HKUST-LongGroup / CFA

[ICCV'2023] Compositional Feature Augmentation for Unbiased Scene Graph Generation
MIT License

Regarding the pre-trained model that extracts the feature augmentations #4

Closed KanghoonYoon closed 9 months ago

KanghoonYoon commented 9 months ago

I really appreciate your work, which has given me a lot of insight. I succeeded in reproducing your results, but I have a few remaining questions (these may be the last!).

First, I downloaded the feature files for the memory bank (e.g., sgcls_body_feature_with_proposal_dict_motf). The file name includes the task 'sgcls'. Should I generate different features for each task (i.e., predcls_feature, sgdet_feature) to perform CFA?

Second, does CFA require a pre-trained SGG model for each task, or just pre-trained Faster R-CNN?

I thought CFA would need a pre-trained Motif model for each of predcls, sgcls, and sgdet to extract the union features. However, in the command below that you provided, `MODEL.PRETRAINED_DETECTOR_CKPT` points to the faster_rcnn directory, which confuses me.

```shell
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10032 --nproc_per_node=1 \
    tools/generate_aug_feature.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
    MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True \
    MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor TEST.IMS_PER_BATCH 1 DTYPE "float16" \
    GLOVE_DIR glove MODEL.PRETRAINED_DETECTOR_CKPT checkpoints/pretrained_faster_rcnn/model_final.pth \
    OUTPUT_DIR exp/motif-precls MIXUP.FEAT_PATH feats TYPE extract_aug
```

Again, I really appreciate your kind and professional responses. Thank you :)

muktilin commented 9 months ago

Thank you for affirming my work. It is not necessary to extract features for each task. I initially used the predcls task to extract features. However, for the other two tasks (sgcls and sgdet), the object category of each proposal is required, so the proposal features and raw object categories were extracted again under the sgcls task. The feature extraction of CFA only needs the pre-trained Faster R-CNN.

KanghoonYoon commented 9 months ago

Thank you for your answer! Now, I understand your point :)

And... could you explain the meaning of the if statement `np.any(np.array(rel_labels[i].cpu() < 0))`?

*(screenshot of the if statement in the training code)*

Because of this if statement, foreground mixup is never performed; only background mixup runs throughout training, since there are no instances with rel_labels < 0. (I ran the code under the sgcls task, following your guidelines.)

This seems strange to me, because foreground mixup is also important to CFA according to your ablation study.

I know that rel_labels < 0 (i.e., rel_labels = -1) is used in the bi-level sampling of [1] to represent dropped triplets. However, your code only oversamples the images and does not drop any triplets.

A clarification of what rel_labels = -1 (or rel_labels < 0) means here would help my understanding.

[1] Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation. CVPR'21.

Once again, thank you for your time and responses. Please understand that my numerous questions stem from a deep curiosity for your great work.

muktilin commented 9 months ago

We do not drop the triplets. We set the relation label to a negative number (-rel_label) to mark the foreground samples selected for mixup: if rel_labels < 0, we apply the mixup operation to that triplet. Sorry, due to the hasty release of the code, I did not write complete code comments.
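For other readers, the sign-flipping scheme described above can be sketched roughly as follows (numpy stands in for the torch tensors, and all names here are illustrative, not the repo's actual variables):

```python
import numpy as np

# Relation labels: 0 = background, >0 = foreground predicate class.
rel_labels = np.array([0, 3, 0, 7, 2])
# Foreground triplets selected for mixup (chosen elsewhere).
mixup_mask = np.array([False, True, False, True, False])

# Mark the selected samples by flipping the sign of their label.
rel_labels[mixup_mask] = -rel_labels[mixup_mask]

# The mixup branch then fires only for the marked samples...
selected = rel_labels < 0
# ...and the original class labels are recovered by negating again.
restored = np.where(selected, -rel_labels, rel_labels)
```

The negative sign thus acts as a temporary flag that survives being passed around with the labels, rather than an indicator that the triplet was dropped.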

KanghoonYoon commented 9 months ago

Thank you for the explanation. However, when I ran the code, I confirmed that no rel_labels < 0 ever appear, so the code never enters the if statement I mentioned above. Also, I could not find the code that sets rel_label to a negative number in the current version of the code.

muktilin commented 9 months ago

During training, self.union_feature_extractor runs twice, at Lines 132 and 135 of relation_head.py; you can print on the first call (Line 132). Besides, most rel_labels are changed at Line 594 of roi_relation_feature_extractors.py (the FG_TAIL operation), so you can also print at Line 592 of that file. The code that sets rel_labels to negative numbers is at Line 250 of visual_genome.py.

KanghoonYoon commented 9 months ago

I traced Line 594 of roi_relation_feature_extractors.py with a breakpoint, but the code never reaches it because there are no rel_labels < 0 at that point.

I have now confirmed that rel_labels < 0 do exist before the first call of self.union_feature_extractor (Line 132). However, after the first call, all rel_labels < 0 become zero. This is because the first call of the feature extractor replaces rel_labels < 0 with zeros (apparently without a deep copy) at Lines 686-690 of roi_relation_feature_extractors.py. Hence, at the second call, which performs the FG_TAIL operation, there are no rel_labels < 0 left to operate on.
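In case it helps, here is a minimal sketch of the aliasing issue described above (numpy standing in for the torch tensors; `extractor_first_call` is a made-up name for illustration):

```python
import numpy as np

def extractor_first_call(rel_labels):
    # Replaces negative labels with 0 *in place* -- no copy is made,
    # so the caller's array is mutated as well.
    rel_labels[rel_labels < 0] = 0
    return rel_labels

labels = np.array([2, -3, 0, -1])
extractor_first_call(labels)          # labels is now [2, 0, 0, 0]
# The FG-mixup branch in the second call finds nothing to do:
assert not np.any(labels < 0)

# With an explicit copy (.clone() for torch tensors), the negative
# marks survive for the second call:
labels = np.array([2, -3, 0, -1])
extractor_first_call(labels.copy())   # caller's array untouched
assert np.any(labels < 0)
```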

muktilin commented 9 months ago

Thank you for reminding me. Due to my carelessness, the deep copy is missing, so only EX_bg and IN are executed; I will fix it later. The PredCls results should be unaffected and do perform FG_TAIL, because no contrastive learning is required there (CONTRA is False). : )

KanghoonYoon commented 9 months ago

Thank you for your time and effort for this discussion. Then, I will close my issue :).

I would be grateful if you could update the script commands for SGDet and PredCls in the README, since I currently cannot tell how to set the config file for the other tasks (e.g., CONTRA=False for PredCls).

I mean this part:

```shell
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10032 --nproc_per_node=1 \
    tools/generate_aug_feature.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
    MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True \
    MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor TEST.IMS_PER_BATCH 1 DTYPE "float16" \
    GLOVE_DIR glove MODEL.PRETRAINED_DETECTOR_CKPT checkpoints/pretrained_faster_rcnn/model_final.pth \
    OUTPUT_DIR exp/motif-precls MIXUP.FEAT_PATH feats TYPE extract_aug
```