Megvii-BaseDetection / DenseTeacher

DenseTeacher: Dense Pseudo-Label for Semi-supervised Object Detection

coco-p1 training diverges #13

Open HaojieYuu opened 2 years ago

HaojieYuu commented 2 years ago

I tried to reproduce the results under the coco-p1 configuration, but training diverged after 40k steps and I got only 14% mAP, which is far lower than the reported 19.64%. Could you help me, please?

ZRandomize commented 2 years ago

Training with such a small amount of supervision is sensitive to hyper-parameters; please try a batch size of 8 and a logits loss weight of 3.
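For reference, a minimal sketch of where those two overrides might live in a cvpods-style config dict; the key name for the logits loss weight below is an assumption, so check the actual coco-p1 `config.py` in the repo for the real field:

```python
# Hedged sketch of the suggested overrides in a cvpods-style config dict.
# DISTILL_LOGITS_WEIGHT is a hypothetical key standing in for the repo's
# actual "logits weight" setting -- verify against the coco-p1 config.py.
_config_dict = dict(
    SOLVER=dict(
        IMS_PER_BATCH=8,  # total batch size across all GPUs
    ),
    SEMI=dict(
        DISTILL_LOGITS_WEIGHT=3.0,  # weight on the dense logits (distillation) loss
    ),
)
```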

ZRandomize commented 2 years ago

Just corrected the config in the latest commit.

HaojieYuu commented 2 years ago

Thanks for the reply, I will try the latest code.

HaojieYuu commented 2 years ago

I just tried the latest config, and I added IMS_PER_DEVICE=1 to avoid the assert below:

```python
def adjust_config(cfg):
    base_world_size = int(cfg.SOLVER.IMS_PER_BATCH / cfg.SOLVER.IMS_PER_DEVICE)
    # Batch size, learning rate and max_iter in the original config are for 8 GPUs
    assert base_world_size == 8, "IMS_PER_BATCH/DEVICE in config file is used for 8 GPUs"
```
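For context, a hedged illustration of why that override silences the assert when running on fewer than 8 GPUs (the values are an example of the workaround, not a recommendation):

```python
# With IMS_PER_BATCH=8, setting IMS_PER_DEVICE=1 makes
# IMS_PER_BATCH / IMS_PER_DEVICE == 8, so adjust_config's assert passes
# even on a single GPU -- at the cost of one image per device.
IMS_PER_BATCH = 8
IMS_PER_DEVICE = 1
assert IMS_PER_BATCH // IMS_PER_DEVICE == 8  # matches the 8-GPU assumption
```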

But the training still diverged after about 40k steps. I got a higher result, 16% mAP, but it's still much lower than 19.64%. I noticed that coco-p1 doesn't use multi-scale training; will that influence the final result?

ZRandomize commented 2 years ago

Indeed... Thanks for the correction, I'll fix it. Multi-scale training affects performance a lot. To align with previous works like Unbiased Teacher, use

```python
SUPERVISED=(WeakAug, dict(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style="choice")),
```

or, for higher performance,

```python
SUPERVISED=(WeakAug, dict(short_edge_length=(640, 800), max_size=1333, sample_style="range")),
```
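To make the difference between the two settings concrete, here is a minimal, self-contained sketch of what `sample_style` means for the short-edge resize augmentation, mirroring the usual detectron2/cvpods semantics rather than this repo's exact implementation:

```python
import random

# "choice" picks one size from a fixed menu of scales; "range" samples
# uniformly between the min and max, giving a continuum of scales.
def sample_short_edge(short_edge_length, sample_style):
    if sample_style == "choice":
        # e.g. one of 640, 672, 704, 736, 768, 800
        return random.choice(short_edge_length)
    elif sample_style == "range":
        # e.g. any integer size in [640, 800]
        return random.randint(min(short_edge_length), max(short_edge_length))
    raise ValueError(f"unknown sample_style: {sample_style}")

print(sample_short_edge((640, 672, 704, 736, 768, 800), "choice"))
print(sample_short_edge((640, 800), "range"))
```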

HaojieYuu commented 2 years ago

Thanks for your reply. I get 18.49 mAP now, but it's still about 1 point lower than the score reported in the paper (19.64±0.34). Can this fluctuation be considered normal? Besides, I noticed that both the student model and the teacher model are evaluated twice, but I can't find where the problem is. It can be reproduced with the latest code; could you please help?

ZRandomize commented 2 years ago

Seems it's close; our curve looks like this:

[training curve image]

We evaluate both the teacher and the student every 2k iterations, and we report the performance of the teacher.
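For readers following along, a hedged, self-contained sketch of that evaluation schedule; the class and function names here are illustrative, not DenseTeacher's actual hooks:

```python
EVAL_PERIOD = 2000  # iterations between evaluations

class DualEvalHook:
    """Evaluate teacher and student on the same schedule; report the teacher."""

    def __init__(self, teacher, student, evaluate_fn):
        self.teacher, self.student = teacher, student
        self.evaluate_fn = evaluate_fn  # callable: model -> mAP

    def after_step(self, cur_iter):
        if cur_iter > 0 and cur_iter % EVAL_PERIOD == 0:
            teacher_map = self.evaluate_fn(self.teacher)  # the reported number
            student_map = self.evaluate_fn(self.student)  # logged for monitoring only
            print(f"iter {cur_iter}: teacher mAP {teacher_map:.2f}, "
                  f"student mAP {student_map:.2f}")
```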

HaojieYuu commented 2 years ago

Thanks for your detailed reply! In my setup, inference is carried out 4 times every 2k iterations, not 2: both the teacher and the student models are evaluated twice, which is bizarre. I didn't modify the code; could you reproduce this problem with the official code?