Closed: qiaoran-dawnlight closed this issue 3 years ago
Hello,
Thanks for the question. A few weeks ago I was about to remove that part, since it is never used by the training pipeline (which uses fixed-size input images). However, as mentioned in https://github.com/Visual-Behavior/detr-tensorflow/issues/10, it might be a good feature to have an alternative training pipeline with all the images padded as in the original implementation.
If the feature is useful and needs to be implemented, I will go back through that code to check that everything works properly with padded images.
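For reference, here is a minimal sketch of what such a padded batch could look like. It follows the original (PyTorch) DETR approach rather than this repo's TensorFlow code, and `pad_batch` is a hypothetical helper, not an existing function in either codebase:

```python
import torch

def pad_batch(images):
    """Pad variable-size images (C, H, W) to the batch max size and return
    the padded tensor plus a boolean mask marking padded pixels.
    (A sketch of the original DETR approach, not this repo's code.)"""
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    batch = torch.zeros(len(images), images[0].shape[0], max_h, max_w)
    mask = torch.ones(len(images), max_h, max_w, dtype=torch.bool)
    for i, img in enumerate(images):
        c, h, w = img.shape
        batch[i, :, :h, :w] = img
        mask[i, :h, :w] = False  # False = real pixels, True = padding
    return batch, mask
```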
Thibault
Okay, thanks for the explanation. So there is no special reason :) I think non-fixed-size training is actually one of the key features of the original implementation (it may not seem highlighted in the paper), since the authors specifically use the mask for it. This feature would make the project more suitable for other datasets and for training from scratch.
Hi, I noticed you are not using the `key_padding_mask` in the `MultiHeadAttention`. Does that mean the `mask` is not used in the Transformer at all? My guess is that, in the original author's implementation, the `mask` carries the original image size info from before padding, so the transformer knows from the `mask` which part is the real image and which part is padding. But since you have fixed-size inputs, the transformer doesn't need to worry about image padding. If that is the case, why not keep it anyway? Any special reason?
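To illustrate the point, here is a minimal sketch of how a padding mask feeds into attention in the original (PyTorch) implementation. The shapes and the two example image sizes are made up for illustration; only `torch.nn.MultiheadAttention` and its `key_padding_mask` argument are the real API:

```python
import torch
import torch.nn as nn

batch, d_model, heads = 2, 256, 8

# Two images padded to a common H x W; True marks padded pixels.
H, W = 32, 40
mask = torch.zeros(batch, H, W, dtype=torch.bool)
mask[0, :, 30:] = True   # image 0 was 32x30 before padding
mask[1, 24:, :] = True   # image 1 was 24x40 before padding

# Flatten spatial dims into a sequence, as DETR does before the transformer.
src = torch.randn(H * W, batch, d_model)  # (seq, batch, dim)
key_padding_mask = mask.flatten(1)        # (batch, seq)

attn = nn.MultiheadAttention(d_model, heads)
out, _ = attn(src, src, src, key_padding_mask=key_padding_mask)
# Padded positions are excluded as attention keys, so real-image
# content never attends to padding.
```

With fixed-size inputs there is no padding, so dropping the mask changes nothing; it only matters once the pipeline pads variable-size images as above.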