Shengcao-Cao / CMT

[CVPR 2023] CMT: Contrastive Mean Teacher for Domain Adaptive Object Detectors
Apache License 2.0

source-only training for AT and PT models #9

Closed by jetyang0729 5 months ago

jetyang0729 commented 7 months ago

I am interested in understanding the methodology and best practices for incorporating source-only training into the training process for these models.

Additionally, I noticed in your paper the reference to vgg16_bn-6c64b313_converted.pth, which appears to be a classification model. If I intend to use your model in other contexts, I am unsure how this checkpoint should be handled. Could you please advise on how to manage it in different settings?

Furthermore, I would appreciate clarification on whether source-only training should be integrated into your faster_rcnn_VGG_cross_city.yaml file, or whether it should be conducted separately to train vgg16_bn-6c64b313_converted.pth.

Your insights and guidance on these matters would be immensely valuable to me. Thank you in advance for your time and consideration. I eagerly await your response.

Shengcao-Cao commented 7 months ago

Hello @jetyang0729 ,

  1. In both AT (Adaptive Teacher) and PT (Probabilistic Teacher), the "source-only training" is called the "burn-in" stage. During this stage, only the labeled source-domain data is used to train the student object detector. For example, you may check the related code here: https://github.com/Shengcao-Cao/CMT/blob/029cd21d0d77252d1309f87b83c05d1752055e56/CMT_AT/adapteacher/engine/trainer.py#L661-L673

  2. vgg16_bn-6c64b313_converted.pth is basically the ImageNet pre-trained VGG model from torchvision. I think I just changed the names of the parameters so that they are compatible with the code in AT.
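
The parameter renaming mentioned in point 2 can be pictured with a minimal sketch. This is not the conversion script used for CMT: the helper `rename_keys`, the toy checkpoint, and the target prefix `vgg_block.features.` are all hypothetical, and the names the AT backbone actually expects have to be read from its code. Plain strings stand in for tensors so the example runs without PyTorch.

```python
def rename_keys(state_dict, prefix_map):
    """Return a copy of state_dict whose key prefixes are rewritten per prefix_map."""
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                # Swap only the matching prefix; keep the rest of the key.
                new_key = new_prefix + key[len(old_prefix):]
                break
        renamed[new_key] = value
    return renamed


# Toy stand-in for a torchvision-style VGG checkpoint (real values are tensors).
torchvision_style = {
    "features.0.weight": "w0",
    "features.0.bias": "b0",
    "classifier.0.weight": "fc",
}

# Hypothetical mapping: the real target names depend on the backbone code.
prefix_map = {"features.": "vgg_block.features."}

converted = rename_keys(torchvision_style, prefix_map)
```

Keys that match no prefix are copied unchanged, so unrelated layers such as the classifier head stay visible and can be dropped explicitly if the detector does not use them.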

Best, Shengcao

jetyang0729 commented 7 months ago

@Shengcao-Cao Hello,

I am very excited to receive answers to these questions from you. From your response, I realize that the training method of AT may be different from what I previously understood about DA models. It seems that AT does not require separate source-only training. Instead, it appears that AT directly conducts both source-only training and Cross-Domain Adaptive training simultaneously. Could you please confirm if my understanding is correct?

If my understanding above is correct, does the final cross-domain improvement come from jointly training the student and teacher models? Also, how are the labels obtained for the unlabeled target-domain data and for the test set?

Thank you very much!

Best wishes, Jintao

Shengcao-Cao commented 7 months ago

Hello @jetyang0729 ,

Source-only training and cross-domain adaptive training do not happen "simultaneously." The "source-only" (or "burn-in") training stage comes before the teacher-student cross-domain training. I believe the term "burn-in" comes from the previous work Unbiased Teacher (https://github.com/facebookresearch/unbiased-teacher/tree/main).

The cross-domain performance improvement indeed comes from the teacher-student mutual learning. In that procedure, the target domain images are unlabeled; the pseudo-labels come from the teacher model's predictions. For the test set, the labels are ground truths, annotated by human annotators.
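
As a rough illustration of the procedure described above, the sketch below shows two ingredients commonly found in mean-teacher style detectors: keeping only confident teacher predictions as pseudo-labels, and updating the teacher as an exponential moving average (EMA) of the student. It is a sketch under assumptions, not the CMT code. The function names, the dictionary layout of a prediction, the threshold of 0.8, and the keep rate of 0.9996 are illustrative values only.

```python
def filter_pseudo_labels(predictions, score_threshold=0.8):
    """Keep teacher predictions confident enough to serve as pseudo-labels."""
    return [p for p in predictions if p["score"] >= score_threshold]


def ema_update(teacher, student, keep_rate=0.9996):
    """Move each teacher weight slightly toward the matching student weight."""
    return {
        name: keep_rate * teacher[name] + (1.0 - keep_rate) * student[name]
        for name in teacher
    }


# Toy teacher output on one unlabeled target-domain image.
teacher_predictions = [
    {"box": (10, 20, 50, 80), "label": "car", "score": 0.93},
    {"box": (5, 5, 15, 15), "label": "person", "score": 0.41},
]
pseudo_labels = filter_pseudo_labels(teacher_predictions)

# Toy scalar "weights"; real models hold tensors keyed by parameter name.
new_teacher = ema_update({"w": 1.0}, {"w": 0.0}, keep_rate=0.9)
```

The student is trained on the pseudo-labels, while the teacher only changes through the EMA step, which is why no pseudo-label files need to be prepared in advance.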

Best, Shengcao

jetyang0729 commented 7 months ago

hello @Shengcao-Cao

Thank you so much for your enlightening response! Your explanation has truly clarified things for me.

Now, I have a clearer understanding of your work:

  1. Train the Unbiased Teacher under 10% COCO-supervision using the following code:
    python train_net.py \
      --num-gpus 8 \
      --config configs/coco_supervision/faster_rcnn_R_50_FPN_sup10_run1.yaml \
      SOLVER.IMG_PER_BATCH_LABEL 16 SOLVER.IMG_PER_BATCH_UNLABEL 16
  2. Then utilize the trained model to predict pseudo-labels for the target domain images.
  3. Without using a pre-trained model, directly employ the source domain images and labels, target domain images and pseudo-labels, and test set with manually annotated ground truth labels.
  4. For comparing the improvement between them, compare the student model from step 3 with the teacher model from step 1.

Could you confirm whether my understanding is correct? Additionally, why is it necessary to set SOLVER.IMG_PER_BATCH_UNLABEL=16 in step 1? Also, do I need to set the backbone of the model to build_vgg_backbone?

Thank you very much for your assistance!

Best regards, Jintao

Shengcao-Cao commented 7 months ago

Hello @jetyang0729 ,

To clarify, you do not need to separately train the student model with source-only supervision in Unbiased Teacher, Adaptive Teacher, Probabilistic Teacher, or our work. The code integrates the whole training process together. Unbiased Teacher was initially proposed for semi-supervised object detection, not domain adaptive object detection. However, the setting is similar, so AT and PT both adopted the framework in their code.

The usage section of the README (https://github.com/Shengcao-Cao/CMT?tab=readme-ov-file#usage) already explains how to train the domain-adaptive object detection model from the very beginning (when you only have the pre-trained backbone). For instance, you may check this configuration file (https://github.com/Shengcao-Cao/CMT/blob/main/CMT_AT/configs/faster_rcnn_VGG_cross_city.yaml) and see how BURN_UP_STEP, IMG_PER_BATCH_LABEL, IMG_PER_BATCH_UNLABEL, TRAIN_LABEL, and TRAIN_UNLABEL are set. In this example, the detector is trained with only labeled data from cityscapes_fine_instance_seg_train for the first 20000 steps. After that, the teacher-student model is built, and they learn from both the labeled data cityscapes_fine_instance_seg_train and the unlabeled data cityscapes_foggy_train to achieve domain-adaptive object detection. The code takes care of the whole procedure, and you do not need to worry about switching models, predicting pseudo-labels, etc.
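
The schedule described above can be summarized as a small phase selector. This is a simplified sketch, not the trainer from the repository: the function `training_phase` and the phase names are hypothetical, and only the 20000-step burn-in length is taken from the example configuration.

```python
def training_phase(iteration, burn_up_step):
    """Return which stage of the single training run a given iteration belongs to."""
    if iteration < burn_up_step:
        # Source-only ("burn-in"): the student sees labeled source data only.
        return "burn_in"
    if iteration == burn_up_step:
        # The teacher is created from the burned-in student at this point.
        return "init_teacher"
    # Afterwards: labeled source data plus pseudo-labeled target data.
    return "mutual_learning"


BURN_UP_STEP = 20000  # value from the cross-city example discussed above

phases = [training_phase(i, BURN_UP_STEP) for i in (0, 19999, 20000, 20001)]
```

Because the switch depends only on the iteration counter, one training command covers both the source-only stage and the adaptation stage.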

Best, Shengcao

jetyang0729 commented 7 months ago

Thanks a lot! @Shengcao-Cao Your response has truly shed light on the matter! Once again, I sincerely appreciate your guidance.

So, for the training of CMT, I don't need to prepare any pseudo-label data. During the registration of unlabeled data, I only need to provide an empty dataset to complete the entire model training. Regarding the performance comparison of DA models, I only need to focus on the performance gap between the student model and the teacher model. Is my understanding correct?

Furthermore, I've recently been experimenting with training detectron2 models on my own data. However, I've noticed that without appropriate pre-trained weights, the model fails to converge. If I use a pre-trained model similar to R50.pkl during the training of CMT, would this affect the comparison of the model's actual performance?

Thank you very much for your guidance. Jintao

Shengcao-Cao commented 7 months ago

Hello @jetyang0729 ,

Yes, I believe your understanding is correct. Regarding the DA performance comparison, we often compare the student before cross-domain training (which is the "source-only" model) with the final teacher model after cross-domain training.

If you need to train on your own data, you may need to adjust models, hyper-parameters, etc. I think it should be fine to use another pre-trained backbone like R50.pkl. Just make sure to make necessary modifications to the code, so that it can correctly handle your own data.

Thanks, Shengcao