Thanks, we do have some experience with data scaling. First, if your supervised dataset is small and the training schedule is short, consider using a smaller burn_in_step so that the teacher is not already overfitted when SSL begins. To check whether training has converged, look at whether the mAP is stable; a converged mAP curve plateaus rather than keeps rising: [figure: converged mAP curve]
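For reference, a minimal sketch of where these knobs might sit in a cvpods-style config dict. Only burn_in_step and TRAINER.DISTILL.RATIO are named in this thread; the exact nesting, capitalization, and values below are assumptions for illustration, not taken from the repo:

```python
# Hypothetical excerpt of a DenseTeacher-style cvpods config.
_config_dict = dict(
    TRAINER=dict(
        DISTILL=dict(
            RATIO=0.01,          # fraction of dense predictions used as pseudo-foreground
            BURN_IN_STEP=2000,   # reduce for a small sup set / short schedule
        ),
    ),
    SOLVER=dict(
        LR_SCHEDULER=dict(
            MAX_ITER=10000,      # e.g. a shortened 10k-iteration schedule
        ),
    ),
)
```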
Second, TRAINER.DISTILL.RATIO should be changed for different datasets; as discussed in our paper, this hyper-parameter is sensitive to the dataset, and its optimal value can be estimated from the dataset itself. Besides, we did not observe any effect from different unsup data sampling on COCO; it should not matter as long as you have enough unsup data.
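As a rough illustration of what this ratio controls (a minimal sketch of top-k selection over dense teacher scores; the function and variable names are mine, not the repo's actual implementation):

```python
import torch

def select_pseudo_foreground(teacher_scores: torch.Tensor, ratio: float) -> torch.Tensor:
    """Mark the top `ratio` fraction of dense predictions as pseudo-foreground.

    teacher_scores: flattened per-location confidence scores from the
    teacher, shape (N,). Returns a boolean mask of shape (N,).
    """
    k = max(1, int(teacher_scores.numel() * ratio))
    # Indices of the k highest-scoring locations.
    topk_idx = torch.topk(teacher_scores, k).indices
    mask = torch.zeros_like(teacher_scores, dtype=torch.bool)
    mask[topk_idx] = True
    return mask

# Example: with ratio=0.01, roughly 1% of locations become foreground.
scores = torch.rand(100_000)
fg_mask = select_pseudo_foreground(scores, ratio=0.01)
print(fg_mask.sum().item())  # ~1000
```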
Thanks for your sincere advice~ I tried several random seeds for sampling the sets and got some normal results (similar to the COCO experiments). It seems that my small dataset and its imbalanced category distribution lead to extreme sampling sets; for example, the results look normal with seed 5 but strange with seed 1.
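One quick way to check whether a given seed produces an extreme split is to compare the per-class counts of the sampled supervised subset across seeds. A minimal sketch, assuming a simplified (image_id, category_id) annotation list rather than the repo's actual data format:

```python
import random
from collections import Counter

def sup_class_histogram(annotations, sup_fraction=0.1, seed=1):
    """Sample a supervised subset by image and report its per-class counts."""
    image_ids = sorted({img for img, _ in annotations})
    rng = random.Random(seed)
    k = max(1, int(len(image_ids) * sup_fraction))
    sup_ids = set(rng.sample(image_ids, k))
    return Counter(cat for img, cat in annotations if img in sup_ids)

# Compare seeds: under an unlucky seed, a rare class may nearly vanish.
anns = [(i, 0) for i in range(900)] + [(i, 1) for i in range(900, 1000)]
for seed in (1, 5):
    print(seed, sup_class_histogram(anns, sup_fraction=0.05, seed=seed))
```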
I changed some configs and used another dataset. My dataset is relatively small, so I reduced the total iterations to 10k. On the test set, the mAP of DenseTeacher is much lower than that of the corresponding supervised model (unlike in the COCO experiments). There may be several factors (e.g., the burn-in length under my short schedule, the TRAINER.DISTILL.RATIO value, or the sampling of the unsup data), but I do not know which is the main one. Can you give me some advice? Here are some logs.
2022-10-07 21:39:25.569 | INFO | cvpods.utils.dump.events:write:253 - eta: 0:18:44 iter: 38780/40000 total_loss: 1.656 loss_cls: 0.143 loss_box_reg: 0.181 loss_quality: 0.426 distill_loss_logits: 0.421 distill_loss_quality: 0.412 distill_loss_deltas: 0.141 fore_ground_sum: 171.450 time: 0.8847 data_time: 0.0075 lr: 0.002500 max_mem: 5055M
2022-10-07 21:39:34.673 | INFO | cvpods.utils.dump.events:write:253 - eta: 0:18:35 iter: 38790/40000 total_loss: 1.792 loss_cls: 0.115 loss_box_reg: 0.158 loss_quality: 0.406 distill_loss_logits: 0.600 distill_loss_quality: 0.416 distill_loss_deltas: 0.174 fore_ground_sum: 175.650 time: 0.8847 data_time: 0.0084 lr: 0.002500 max_mem: 5055M
2022-10-07 21:39:44.712 | INFO | cvpods.utils.dump.events:write:253 - eta: 0:18:26 iter: 38800/40000 total_loss: 1.694 loss_cls: 0.121 loss_box_reg: 0.164 loss_quality: 0.413 distill_loss_logits: 0.508 distill_loss_quality: 0.416 distill_loss_deltas: 0.156 fore_ground_sum: 163.533 time: 0.8847 data_time: 0.0119 lr: 0.002500 max_mem: 5055M