facebookresearch / adaptive_teacher

This repo provides the source code for "Cross-Domain Adaptive Teacher for Object Detection".

mAP predicted by student_model is sometimes higher than teacher #42

Open firekeepers opened 1 year ago

firekeepers commented 1 year ago

I trained the model on my own dataset, but the AP50 is unstable: I can get very different results with the same parameters. Moreover, the teacher's AP50 is sometimes lower than the student's AP50. Is this phenomenon normal in DAOD?

[image]

sysuzgg commented 1 year ago

> I trained the model on my own dataset, but the AP50 is unstable: I can get very different results with the same parameters. Moreover, the teacher's AP50 is sometimes lower than the student's AP50. Is this phenomenon normal in DAOD?

This may be because the random seed is not fixed. I am also training on my own dataset, and I would like to ask about the training procedure. Step 1: Trainer: baseline, 10k iterations; step 2: Trainer: ateacher, load the model weights from step 1 and continue for another 50k iterations. Are those the right steps?

firekeepers commented 1 year ago

> I trained the model on my own dataset, but the AP50 is unstable: I can get very different results with the same parameters. Moreover, the teacher's AP50 is sometimes lower than the student's AP50. Is this phenomenon normal in DAOD?
>
> This may be because the random seed is not fixed. I am also training on my own dataset, and I would like to ask about the training procedure. Step 1: Trainer: baseline, 10k iterations; step 2: Trainer: ateacher, load the model weights from step 1 and continue for another 50k iterations. Are those the right steps?

I tried fixing the random seeds for torch, numpy, etc., and also fixed the seed in config.py, while keeping the data-loading seed at 0, but the results still differ greatly between runs.
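For reference, a common pattern for pinning every RNG in one place is sketched below. This is generic seeding advice, not code from this repo; the torch/numpy lines are guarded so the sketch runs even where those libraries are absent, and full determinism on GPU additionally requires the cuDNN flags shown (at some speed cost). Note that with multi-worker data loaders, results can still vary unless worker seeds are pinned too.

```python
import os
import random

def seed_everything(seed: int = 0) -> None:
    """Fix the RNGs of Python, numpy, and torch (when available)."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade speed for reproducibility in cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Even with all of this, some CUDA ops (e.g. certain scatter/atomics used in detection heads) are nondeterministic by design, so small run-to-run variance can remain.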

firekeepers commented 1 year ago

> I trained the model on my own dataset, but the AP50 is unstable: I can get very different results with the same parameters. Moreover, the teacher's AP50 is sometimes lower than the student's AP50. Is this phenomenon normal in DAOD?
>
> This may be because the random seed is not fixed. I am also training on my own dataset, and I would like to ask about the training procedure. Step 1: Trainer: baseline, 10k iterations; step 2: Trainer: ateacher, load the model weights from step 1 and continue for another 50k iterations. Are those the right steps?

The training procedure is described in some detail in trainer.py: first, while the iteration count is below burn_in_iter, the model is trained; at the burn-up point the network weights are copied into the teacher model, and after that the student model is trained.

I adjusted the training schedule several times based on how well the network was fitting. Since my dataset has a large domain-distribution gap, a shorter burn-in stage may give higher accuracy.
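After the copy at burn-up, mean-teacher-style methods like this one keep the teacher as an exponential moving average (EMA) of the student, which is also why teacher and student AP can cross early on: the EMA teacher lags the student until the student's weights stabilize. A minimal pure-Python sketch of the update, with plain dicts standing in for state_dicts (the `keep_rate` name and default are illustrative):

```python
from typing import Dict

def ema_update(teacher: Dict[str, float],
               student: Dict[str, float],
               keep_rate: float = 0.9996) -> Dict[str, float]:
    """One teacher update: teacher <- keep_rate * teacher
    + (1 - keep_rate) * student, parameter by parameter."""
    return {k: keep_rate * teacher[k] + (1.0 - keep_rate) * student[k]
            for k in teacher}

def training_phase(it: int, burn_in_iter: int) -> str:
    """Which phase a given iteration falls in (sketch of the
    schedule described above)."""
    if it < burn_in_iter:
        return "burn-in: train student on labeled source only"
    if it == burn_in_iter:
        return "burn-up: copy student weights into teacher"
    return "mutual learning: teacher pseudo-labels target, EMA update"
```

With a keep rate near 1, the teacher changes slowly, so evaluating it right after burn-up can score below the student; the gap typically closes as the EMA accumulates.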

sysuzgg commented 1 year ago

@firekeepers What parameters did you modify to train on your own dataset? I trained my own dataset with the cityscapes yaml config, and the results in metrics.json are about 20% below the faster-RCNN source-only results. That cannot be right; which parameters should I change?

sysuzgg commented 1 year ago

@firekeepers One more question: training with VGG16 works, but when I switch to R101, total_loss, loss_cls, and loss_box_reg all become NaN during training. I tried learning rates of 0.02, 0.002, 0.0002, and 0.00002, and they still all become NaN.
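For NaN losses with a deeper backbone, one common mitigation (generic detectron2 advice, not specific to this repo) is to enable gradient clipping, lengthen the warm-up, and freeze more early backbone stages. These are standard detectron2 solver/model keys; the values below are illustrative, not tuned:

```yaml
SOLVER:
  CLIP_GRADIENTS:
    ENABLED: True
    CLIP_TYPE: "value"
    CLIP_VALUE: 1.0
  WARMUP_ITERS: 2000
MODEL:
  BACKBONE:
    FREEZE_AT: 2
```

It is also worth confirming that the R101 run actually loads ImageNet-pretrained weights and uses the matching PIXEL_MEAN/PIXEL_STD; a mismatch there often produces exactly this immediate-NaN symptom.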

firekeepers commented 1 year ago

> @firekeepers What parameters did you modify to train on your own dataset? I trained my own dataset with the cityscapes yaml config, and the results in metrics.json are about 20% below the faster-RCNN source-only results. That cannot be right; which parameters should I change?

Mainly adjust the length of the burn-in stage; I'm not sure how to tune it for your dataset, though.

sysuzgg commented 1 year ago

> @firekeepers What parameters did you modify to train on your own dataset? I trained my own dataset with the cityscapes yaml config, and the results in metrics.json are about 20% below the faster-RCNN source-only results. That cannot be right; which parameters should I change?
>
> Mainly adjust the length of the burn-in stage; I'm not sure how to tune it for your dataset, though.

OK, thank you.