aimagelab / meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020
BSD 3-Clause "New" or "Revised" License

about batch grouping #6

Closed homelifes closed 4 years ago

homelifes commented 4 years ago

Hello. Thanks for your work on M2. I would like to ask about the data: are the batches grouped by the length of the captions/image features, so that each batch contains captions of similar length? For example, a batch of 5 would have caption lengths [16, 16, 16, 16, 16], meaning all captions in that batch have the same length. Do you do this in your code? (I'm asking because the original TensorFlow implementation of the Transformer does this, so I'm wondering whether it is important and whether it has an effect on performance.) Thanks
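For reference, here is a minimal sketch of the kind of length-based bucketing I mean, written as a PyTorch batch sampler. This is only an illustration of the technique, not code from this repository; the `caption_lengths` input and the `LengthGroupedSampler` name are hypothetical:

```python
import random
from torch.utils.data import Sampler

class LengthGroupedSampler(Sampler):
    """Yields batches of indices whose captions have similar lengths.

    Sorting by length before slicing into batches means each batch is
    (near-)uniform in length, which minimizes padding per batch.
    """

    def __init__(self, caption_lengths, batch_size, shuffle=True):
        self.lengths = caption_lengths  # list[int], one token count per example
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        # Sort example indices by caption length, then slice into batches.
        order = sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])
        batches = [order[i:i + self.batch_size]
                   for i in range(0, len(order), self.batch_size)]
        if self.shuffle:
            random.shuffle(batches)  # shuffle batch order, not batch contents
        for batch in batches:
            yield batch

    def __len__(self):
        return (len(self.lengths) + self.batch_size - 1) // self.batch_size
```

It would be plugged in via `DataLoader(dataset, batch_sampler=LengthGroupedSampler(lengths, batch_size))`, so the question is essentially whether M2 does something equivalent to this or pads mixed-length batches instead.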