Maelic / SGG-Benchmark

A New Benchmark for Scene Graph Generation, targeting real-world applications

YOLOv8 train #28

Open Bobbypower opened 1 month ago

Bobbypower commented 1 month ago

Hello Dr. Maëlic, I am new to the SGG task. When applying your project to my dataset, I was able to obtain some results in predcls mode. However, when I switched to sgdet mode, the mAP remained 0 throughout training and the model seems to have detected nothing. I used YOLOv8 to train on my own dataset.

e2e_relation_yolov8m_squat.txt

Could you please provide me with any insights or suggestions that might help me resolve this issue?

[screenshot: sgdet training metrics, with mAP stuck at 0]

Thank you

Bobbypower commented 1 month ago

python tools/relation_train_net.py --task predcls --config-file "configs\MyPSG\e2e_relation_yolov8m_squat.yaml" SOLVER.IMS_PER_BATCH 8 TEST.IMS_PER_BATCH 1 DTYPE "float16" GLOVE_DIR /mli/v24 TEST.INFORMATIVE False OUTPUT_DIR ./checkpoints/mypsg_squat_1

Maelic commented 1 month ago

Hi, you need to use NMS_THRESH: 0.001 when training for sgdet; if you don't, it is very likely that your object detector won't output anything. You should only raise this variable back for inference with a trained model. The variables MIN_SIZE_TRAIN and MAX_SIZE_TRAIN are also not correct: they should match MIN_SIZE_TEST and MAX_SIZE_TEST, and the resolution used for your YOLOv8 training (by default YOLOv8 trains at 640). Finally, BASE_LR: 0.001 seems very low for a training with only 6 classes; depending on your data distribution you may increase it by a factor of 10 or 20. These are the mistakes I can quickly identify from your config file.
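Putting those fixes together, a minimal sketch of the relevant overrides, assuming the maskrcnn-benchmark-style keys used by this repo's YAML configs (set NMS_THRESH: 0.001 wherever it appears in your e2e_relation_yolov8m_squat.yaml):

  # hedged sketch, not a complete config
  INPUT:
    MIN_SIZE_TRAIN: 640   # match the YOLOv8 training resolution (default 640)
    MAX_SIZE_TRAIN: 640
    MIN_SIZE_TEST: 640
    MAX_SIZE_TEST: 640
  SOLVER:
    BASE_LR: 0.01         # ~10x the original 0.001; tune for your data distribution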

Bobbypower commented 1 month ago

Thank you for your response and for pointing out my mistake. I will try the approach you recommended. I have one more question: I came across a file named e2e_relation_detector_yolov8.yaml. After training with YOLOv8, do I still need to use this file for additional training? I read some of Kaihua's documentation, and it left me a bit confused.

Maelic commented 1 month ago

No, you don't need this file, it is for testing only. If you want to test your YOLOv8-trained model with my codebase, you can do so by using this config file with the https://github.com/Maelic/SGG-Benchmark/blob/main/tools/detector_pretest_net.py test script. But it is not very useful, so you can skip it.
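For example, the invocation would mirror the training command above. A hedged sketch (assuming the --config-file flag follows the pattern of the other tools scripts; the config path and output directory are placeholders for your setup):

python tools/detector_pretest_net.py --config-file "configs/MyPSG/e2e_relation_detector_yolov8.yaml" TEST.IMS_PER_BATCH 1 OUTPUT_DIR ./checkpoints/mypsg_detector_test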

Bobbypower commented 1 month ago

Hello Dr. Maelic, Thank you for your guidance. I have achieved good results on my dataset. I would like to ask: if I proceed with further experiments using a modified YOLOv8, should I change out_channels to be consistent with the modified YOLO structure? I also noticed that feature extraction uses layers 15, 18, and 21 of YOLO; should I ensure consistency there as well? And what about the size of the feature maps? Additionally, I noticed the YOLOPooler in the code files, but it seems to be commented out. I am currently studying the code and examining the implementation details, and so far I haven't found any other places where YOLOv8 is coupled to the rest of the code.

Thank you once again for your support. If I am able to publish a paper in the future, I will definitely cite your great work.

Maelic commented 1 month ago

Hello, Yes, you need to modify out_channels if you change the YOLOv8 version, so that it matches the per-scale channel dimensions of the variant you use.
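For reference, a hedged summary (taken from the standard Ultralytics YOLOv8 architecture definitions, not from the original comment) of the output channels at the extraction layers 15 (P3), 18 (P4) and 21 (P5) for each variant:

  # per-variant output channels at layers [15, 18, 21]
  yolov8n: [64, 128, 256]
  yolov8s: [128, 256, 512]
  yolov8m: [192, 384, 576]
  yolov8l: [256, 512, 512]
  yolov8x: [320, 640, 640]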

The layers 15, 18 and 21 correspond to the outputs of the PAN-FPN module of YOLOv8; you can have an overview of the architecture here. In my opinion, extracting features from these layers, after the last C2f modules and before the Detection Head of YOLOv8, is the most efficient approach. I guess you could change this and extract features from other layers, for instance straight out of the backbone with layers 4, 6 and 9, but I think this would lead to worse results. The PAN-FPN part of YOLOv8 (called the "Neck") is responsible for enhancing the features to represent bounding-box information at 3 different scales (feature maps of 80x80, 40x40 and 20x20 for a 640x640 input), to better detect small, medium and large objects respectively. If you extract features before or after that step, there is a high chance that your representation will be worse. You can read the original FPN paper and the original PAN paper to understand the overall feature-extraction process of YOLOv8.
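To make the correspondence concrete, a hedged summary assuming the default 640x640 input:

  # layer -> stride -> feature map -> object scale served
  layer_15: {stride: 8,  feature_map: 80x80, objects: small}
  layer_18: {stride: 16, feature_map: 40x40, objects: medium}
  layer_21: {stride: 32, feature_map: 20x20, objects: large}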

Because YOLOv8 is slightly different from a classical Faster R-CNN with FPN, I had to change the Pooler a bit. I created a YOLOPooler class to pool features at the fixed scales I just mentioned, 20x20, 40x40 and 80x80 (see https://github.com/Maelic/SGG-Benchmark/blob/73c2ecbc54b2f5ddff834991ce905eae1590e5b2/sgg_benchmark/modeling/poolers.py#L236C1-L242C1). However, this doesn't seem to be better than the original FPN equation (see https://github.com/Maelic/SGG-Benchmark/blob/main/sgg_benchmark/modeling/poolers.py#L40-L43), and I had trouble making it work. Ultimately, I didn't use my YOLOPooler class for this reason.
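For reference, the FPN level-assignment rule mentioned above (from the original FPN paper, which I believe is what the linked LevelMapper lines implement) maps a box of width w and height h to pyramid level

  k = floor(k0 + log2(sqrt(w * h) / 224))

where k0 is the canonical level (typically 4) assigned to boxes around 224x224 px, so larger boxes are pooled from coarser pyramid levels and smaller boxes from finer ones.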