Closed: 1125178969 closed this 5 months ago
Thank you for your attention! In fact, I have tried loading MAE pre-trained weights before, and, as you found, the results are worse than with ImageNet pre-training. However, I am not sure how large a gap you mean by "very poor"; in my recollection, the two pre-training methods differed by about 10 points on RGBNT201. I suspect this is because MAE pre-training captures only structural information, whereas supervised pre-training on true labels may help the model learn higher-level decision information. A similar phenomenon appears in many tasks.
That said, perhaps your MAE pre-trained weights were not loaded correctly, and the learning rate may not have been tuned for this pre-training setting. Your observation is very detailed; I had not noticed that the initial loss with MAE is lower. But I think you should run multiple experiments to confirm that the initial loss is indeed lower, since it may be affected by many random fluctuations.
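One quick way to rule out a silent weight-loading failure is to compare the parameter names in the checkpoint against those the model expects before calling something like `load_state_dict(strict=False)`. Below is a minimal, framework-free sketch; the helper name `check_checkpoint_keys` and the prefixes `"module."`/`"encoder."` (which MAE-style checkpoints often prepend) are my own assumptions, not part of this repository.

```python
def check_checkpoint_keys(model_keys, ckpt_keys,
                          strip_prefixes=("module.", "encoder.")):
    """Report parameter names that would silently fail to load.

    model_keys: names the model expects (e.g. model.state_dict().keys()).
    ckpt_keys:  names found in the checkpoint file.
    MAE-style checkpoints often wrap names with prefixes such as
    "module." or "encoder."; strip them before comparing so the
    mismatch report is meaningful.
    """
    def strip(name):
        for p in strip_prefixes:
            if name.startswith(p):
                return name[len(p):]
        return name

    ckpt = {strip(k) for k in ckpt_keys}
    model = set(model_keys)
    missing = sorted(model - ckpt)      # stay at random init
    unexpected = sorted(ckpt - model)   # ignored checkpoint entries
    return missing, unexpected
```

If `missing` covers most of the backbone, the MAE weights were effectively never loaded, which alone could explain a large accuracy gap.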
In addition, you can try the larger dataset RGBNT100 to see whether the same behavior occurs, since RGBNT201 is quite small and may show large fluctuations. If the initial loss is indeed lower, my guess is that MAE's masking imitates occlusion, a common challenge in ReID, and is therefore helpful in the early stage of learning. However, the explanation for the final results is not yet clear. I hope the above answer helps!
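To check whether the lower initial loss survives random fluctuations, one could log the first-epoch loss over several seeds for each pre-training choice and compare the gap to the spread. A minimal sketch, where the loss lists are hypothetical measurements you would replace with your own logged numbers:

```python
import statistics


def initial_loss_gap(mae_losses, imagenet_losses):
    """Compare initial losses collected over several random seeds.

    Returns the mean gap (ImageNet minus MAE) divided by the
    combined-sample standard deviation; a ratio well above ~2
    suggests the difference is not just seed noise.
    """
    gap = statistics.mean(imagenet_losses) - statistics.mean(mae_losses)
    spread = statistics.stdev(mae_losses + imagenet_losses)
    return gap / spread if spread > 0 else float("inf")
```

For example, `initial_loss_gap([5.0, 5.1, 4.9], [6.0, 6.1, 5.9])` gives a clearly separated gap, while nearly overlapping runs give a ratio below 1.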
Thank you for your answer; it is helpful, and your work has been useful for my understanding of multi-modal ReID.
The training loss at the start is smaller than when loading ImageNet pre-trained weights, but the validation-set accuracy is much lower.