I succeeded in reproducing your results on your dataset.
However, when it comes to my own traffic light dataset, which contains many small objects,
the localization heads start to diverge.
I have tried lowering the learning rate to 1e-7 or less.
The classification head converges and works on the test sets,
but the localization heads diverge at loss=5 for the four localizers (loss=NaN after that) and predict nonsense on the test sets.
Any suggestions? Thanks in advance.