facebookresearch / unbiased-teacher

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection
https://arxiv.org/abs/2102.09480
MIT License

Parameter modification for smaller batch_size #4

Closed · bei181 closed 3 years ago

bei181 commented 3 years ago

Your work is indeed impressive! But how should I modify the parameters for a smaller batch size (8)?
I tried the default settings to train on the COCO-standard dataset but did not get the results presented in the paper. Thanks a lot!

ycliu93 commented 3 years ago

Hi, I have only tried batch size = 16 (16 labeled and 16 unlabeled images) on 8 GPUs, which gets 19.9 mAP under the 1% setting. Could you share the results of your run?
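
(For readers scaling down, a sketch of what a half-batch launch could look like, assuming the repo's default BASE_LR of 0.01 and the common linear scaling heuristic of halving the learning rate along with the batch; both values should be verified against your checkout. The 10% config discussed below is used as the example:)

```bash
# Sketch: 8 labeled + 8 unlabeled images per batch instead of 16 + 16.
# Assumes the default BASE_LR is 0.01, halved here to 0.005 (linear scaling).
python train_net.py --num-gpus 8 \
    --config configs/coco_supervision/faster_rcnn_R_50_FPN_sup10_run1.yaml \
    SOLVER.IMG_PER_BATCH_LABEL 8 SOLVER.IMG_PER_BATCH_UNLABEL 8 \
    SOLVER.BASE_LR 0.005
```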

bei181 commented 3 years ago

Hi, these are the parameters and results of my experiment:

[screenshot: training parameters and results]

By the way, is there a mistake in the figure? Should an arrow from the teacher model's RPN be added? [screenshot: model diagram]

Thank you!

happyxuwork commented 3 years ago

@bei181 @ycliu93 I used the defaults from this GitHub repo and ran `python train_net.py --num-gpus 8 --config configs/coco_supervision/faster_rcnn_R_50_FPN_sup10_run1.yaml SOLVER.IMG_PER_BATCH_LABEL 16 SOLVER.IMG_PER_BATCH_UNLABEL 16`, but from step 19 onward the following appears: [screenshot: training log]. Is this normal? Should any config be altered?

bei181 commented 3 years ago

> @bei181 @ycliu93 I used the defaults from this GitHub repo and ran `python train_net.py ...`, but from step 19 onward the following appears: [screenshot: training log]. Is this normal? Should any config be altered?

That is not normal; you should set your learning rate lower. I set 0.001.
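
(In command form this is one extra override on the same launch line; `SOLVER.BASE_LR` is the key that appears in the configs quoted later in this thread:)

```bash
# Same run as above, with the base learning rate lowered to 0.001.
python train_net.py --num-gpus 8 \
    --config configs/coco_supervision/faster_rcnn_R_50_FPN_sup10_run1.yaml \
    SOLVER.IMG_PER_BATCH_LABEL 16 SOLVER.IMG_PER_BATCH_UNLABEL 16 \
    SOLVER.BASE_LR 0.001
```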

happyxuwork commented 3 years ago

> That is not normal; you should set your learning rate lower. I set 0.001.

Wow, it works, thanks!

bei181 commented 3 years ago

> Wow, it works, thanks!

Hi, what are your final results? Did you get the AP reported in the paper?

ycliu93 commented 3 years ago

Hi @bei181, using 6+6 images per batch should get much better results than 1.98 mAP. I just found out that Detectron2 updated its weight loading, which made the previous Unbiased Teacher codebase unable to load the pretrained ResNet weights. This is likely why the results are so low.

I just fixed the problem and pushed the commit. Could you update the Unbiased Teacher codebase to the latest version and try again? Let me know if the issue persists. Thanks!

happyxuwork commented 3 years ago

> Hi, what are your final results? Did you get the AP reported in the paper?

I used COCO-standard with 10% labeled and 90% unlabeled data and batch_size = 16, with the rest of the settings the same as yours. I finally got AP: 10.681, AP50: 22.033. Next I will update the codebase as @ycliu93 said and try again.

happyxuwork commented 3 years ago

> I just fixed the problem and pushed the commit. Could you update the Unbiased Teacher codebase to the latest version and try again?

@ycliu93 After updating the Unbiased Teacher codebase to the latest version and using the config file https://github.com/facebookresearch/unbiased-teacher/blob/05dad84c8e1bb44c6fd14706571ab0769143e48d/configs/coco_supervision/faster_rcnn_R_50_FPN_sup10_run1.yaml, I lowered the learning rate to 0.001 to avoid NaN during training and set batch_size = 16 due to memory limitations. When training finished I got AP: 28.12, AP50: 49.692, about a 3-point gap compared to your paper. Do you have any suggestions? The same problem also happens when training on the VOC dataset.

ycliu93 commented 3 years ago

Hi @happyxuwork,

Using a lower learning rate is likely to make the EMA teacher model perform worse than expected. I would keep the original learning rate and first reduce the unsupervised loss weight instead (to 3 or 2).
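
(As a sketch, that suggestion is a single override of `SEMISUPNET.UNSUP_LOSS_WEIGHT`, the key from the config posted below, while `SOLVER.BASE_LR` stays at its default:)

```bash
# Keep the default learning rate; lower only the unsupervised loss weight.
python train_net.py --num-gpus 8 \
    --config configs/coco_supervision/faster_rcnn_R_50_FPN_sup10_run1.yaml \
    SOLVER.IMG_PER_BATCH_LABEL 16 SOLVER.IMG_PER_BATCH_UNLABEL 16 \
    SEMISUPNET.UNSUP_LOSS_WEIGHT 2.0
```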

bei181 commented 3 years ago

> I just fixed the problem and pushed the commit. Could you update the Unbiased Teacher codebase to the latest version and try again?

Hi @ycliu93, I tried your new code and got the following results: [screenshot: evaluation results]. The mAP is much higher than before, thank you! But is there any good way to improve the performance further, since you said you got 19.9 mAP with batch size 16? Here are my TensorBoard records: [screenshots: training curves]

happyxuwork commented 3 years ago

> Using a lower learning rate is likely to make the EMA teacher model perform worse than expected. I would keep the original learning rate and first reduce the unsupervised loss weight instead (to 3 or 2).

Hi @ycliu93, according to https://github.com/facebookresearch/unbiased-teacher/issues/5 and the advice you gave above, I altered my config for training with 100% of VOC07 as labeled data and 100% of VOC12 as unlabeled data. The complete configuration is as follows:

BASE: "../Base-RCNN-FPN.yaml" MODEL: META_ARCHITECTURE: "TwoStagePseudoLabGeneralizedRCNN" WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl" MASK_ON: False RESNETS: DEPTH: 50 PROPOSAL_GENERATOR: NAME: "PseudoLabRPN" RPN: POSITIVE_FRACTION: 0.25 LOSS: "CrossEntropy" ROI_HEADS: NAME: "StandardROIHeadsPseudoLab" LOSS: "FocalLoss" SOLVER: LR_SCHEDULER_NAME: "WarmupMultiStepLR" STEPS: (179990, 179995) MAX_ITER: 180000 IMG_PER_BATCH_LABEL: 32 IMG_PER_BATCH_UNLABEL: 32 BASE_LR: 0.001 INPUT: MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800) MIN_SIZE_TEST: 800 DATALOADER: SUP_PERCENT: 100.0 RANDOM_DATA_SEED: 1 DATASETS: CROSS_DATASET: True TRAIN: ('voc_2007_trainval', 'voc_2012_trainval') TEST: ('voc_2007_test',) TRAIN_LABEL: ('voc_2007_trainval',) TRAIN_UNLABEL: ('voc_2012_trainval',) SEMISUPNET: Trainer: "ubteacher" BBOX_THRESHOLD: 0.7 TEACHER_UPDATE_ITER: 1 BURN_UP_STEP: 2000 EMA_KEEP_RATE: 0.9996 UNSUP_LOSS_WEIGHT: 2.0 TEST: EVAL_PERIOD: 1000 EVALUATOR: "COCOeval"

OUTPUT_DIR: ./output/voc07_sup100_voc12_unlabel
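
(Assuming this file were saved as, say, `configs/voc/voc07_voc12.yaml`, a hypothetical path, the launch would follow the same pattern as the COCO runs above:)

```bash
# Hypothetical config path; adjust to wherever the YAML above is saved.
python train_net.py --num-gpus 8 --config configs/voc/voc07_voc12.yaml
```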

Is the above configuration consistent with yours? If possible, could you provide the configuration used in your paper for training with 100% of VOC07 as labeled data and 100% of VOC12 as unlabeled data? Looking forward to your reply.

happyxuwork commented 3 years ago

Besides, could you provide all the configurations for the models trained on VOC, as in Table 3 of your paper? [screenshot: Table 3]

ycliu93 commented 3 years ago

Yes, it is similar to my configuration file for VOC. You could increase the learning rate to 0.01. I will update the VOC config files and model weights once they are ready.
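
(With the config posted above, that change is again a one-key override on the command line; the config path is the same hypothetical one as before:)

```bash
# Raise BASE_LR from 0.001 back to 0.01, as suggested.
python train_net.py --num-gpus 8 --config configs/voc/voc07_voc12.yaml \
    SOLVER.BASE_LR 0.01
```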