encounter1997 / SFA

Official Implementation of "Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers"
Apache License 2.0

Bad results in the task of city2city_foggy #4

Closed pmz1997 closed 2 years ago

pmz1997 commented 2 years ago

Thanks for the amazing work. I have a problem when I run the demo on the city2city_foggy task: the mAP is only 21% after 24 epochs, which is much lower than the result in the paper. It seems that the model didn't converge. Since I only have one GPU, I didn't use distributed training. I simply ran main_da.py --hda 1 --cmt --with_box_refine --two_stage, and I changed the lr and lr_backbone to 5e-5 and 5e-6 because my batch_size is 1. I also tried lr = 1e-4 and lr_backbone = 1e-5, but the result is still bad. I wonder whether a pretrained model is used in Deformable DETR.
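For reference, the full single-GPU invocation described above would look roughly like the following; the --batch_size, --lr, and --lr_backbone flag names are assumptions carried over from Deformable DETR's argument parser (which SFA builds on), so verify them against main_da.py:

```shell
# Single-GPU run as described above. Flag names assumed from Deformable DETR;
# verify with `python main_da.py --help`.
python main_da.py --hda 1 --cmt --with_box_refine --two_stage \
    --batch_size 1 --lr 5e-5 --lr_backbone 5e-6
```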

encounter1997 commented 2 years ago

Hi, thank you for your interest.

1. Following existing domain adaptive object detection methods, we adopted AP50 as the evaluation metric. AP (averaged over IoU thresholds 0.50:0.95) is a much stricter metric than AP50, so it will naturally be lower. Please track the AP50 number (the IoU=0.50 line in the COCO evaluation summary) for a fair comparison.
2. All our experiments use a batch size of 4, so I would suggest trying SFA with the same batch size; a batch size of 1 is probably too small for training detection transformers (see the launch sketch below).
3. Both SFA and the source-only Deformable DETR are trained with an ImageNet-1K pre-trained ResNet-50 backbone. The transformer part, however, is trained from scratch, as was done in the original Deformable DETR.
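A minimal sketch of reaching an effective batch size of 4, assuming SFA keeps Deformable DETR's torch.distributed.launch entry point (the launcher and flag names are assumptions; adjust them to your setup):

```shell
# 4 GPUs x per-GPU batch size 1 = effective batch size 4. Launcher and flag
# names assumed from Deformable DETR; with enough memory, --batch_size 4 on a
# single GPU would be the non-distributed equivalent.
python -m torch.distributed.launch --nproc_per_node=4 --use_env \
    main_da.py --hda 1 --cmt --with_box_refine --two_stage --batch_size 1
```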

Hope these answers help you.

Pandaxia8 commented 2 years ago

Single-GPU training with batch size 1 doesn't work well, similar to Deformable DETR. You should try a larger batch size, but a larger batch size on a single GPU may run out of CUDA memory.
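One generic workaround for that trade-off (a standard PyTorch pattern, not part of this repo) is gradient accumulation: keep the per-iteration batch size at 1 but only step the optimizer every few iterations. A minimal sketch, assuming a Deformable-DETR-style loop where the criterion returns a dict of loss terms; the function name here is hypothetical:

```python
def train_with_grad_accum(model, criterion, data_loader, optimizer, accum_steps=4):
    """Generic gradient-accumulation sketch (not SFA's own code): approximates
    an effective batch size of `accum_steps` with per-iteration batch size 1."""
    model.train()
    optimizer.zero_grad()
    for i, (samples, targets) in enumerate(data_loader):
        outputs = model(samples)
        loss_dict = criterion(outputs, targets)
        # Scale the loss so the accumulated gradients match a true larger batch.
        loss = sum(loss_dict.values()) / accum_steps
        loss.backward()  # gradients accumulate until the optimizer step
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Note that accumulation only mimics the gradient statistics of a larger batch; batch-dependent layers such as BatchNorm still see one image at a time, so results may not exactly match true batch-size-4 training.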