Hello. Thanks for your work on M2.
I have a question about how the data is batched.
Are batches grouped by the length of the captions/image features? In other words, does each batch contain only captions of similar length? For example, a batch of 5 might have caption lengths [16, 16, 16, 16, 16], meaning that all captions in that batch have the same length. Does your code do this? (I'm asking because the original TensorFlow implementation of the Transformer does this, so I'm wondering whether it is important and whether it affects performance.)
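To make the question concrete, here is a minimal sketch of the kind of length-grouped batching I mean. It is plain Python and not taken from the M2 code; `bucket_batches`, `caption_lengths`, and `batch_size` are hypothetical names I made up for illustration, and I assume the caption lengths are known before batching.

```python
import random
from collections import defaultdict


def bucket_batches(caption_lengths, batch_size, shuffle=True, seed=0):
    """Return batches of sample indices; each batch holds captions of one length.

    Hypothetical helper for illustration only, not part of M2.
    """
    # Collect the indices of all samples that share the same caption length.
    buckets = defaultdict(list)
    for idx, length in enumerate(caption_lengths):
        buckets[length].append(idx)

    rng = random.Random(seed)
    batches = []
    for indices in buckets.values():
        if shuffle:
            rng.shuffle(indices)  # shuffle within a length bucket
        # Cut each bucket into chunks of at most batch_size samples.
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])

    if shuffle:
        rng.shuffle(batches)  # mix batches of different lengths across an epoch
    return batches


lengths = [16, 16, 12, 16, 12, 16, 16, 9]
for batch in bucket_batches(lengths, batch_size=5):
    print(batch, [lengths[i] for i in batch])
```

With this scheme no padding is needed inside a batch, but the last batch of each length can be smaller than `batch_size`.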
Thanks