Closed Fedaman-LiFi-ken closed 2 months ago
I would also like to ask: with YOLO training, do the dataset labels contain only entities, and if so, how are the relationships between them trained?
Hello, here is a quick explanation of SGG. It is a multi-step task, usually trained in two stages: first, a dataset with bounding-box annotations is used to train an object detector; second, a scene graph dataset (bounding boxes + relations) is used to train a relation prediction model. Traditionally both stages use the same dataset, with the object detector trained on the bounding boxes that later serve the relation prediction stage. In my implementation, however, I introduce a new backbone (YOLOv8 or YOLO-World) that can be trained separately (using the official ultralytics implementation) on another dataset, as long as the classes match. For YOLOv8, you can have a look at my script that converts the annotations of an SGG-type dataset to YOLO format so that the classes match, but you can also use any other object detection dataset such as COCO. For YOLO-World you don't even need a dataset: you can just use my other script to encode the label classes with CLIP.
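To make the conversion step more concrete, here is a minimal sketch of turning SGG-style boxes into YOLO label lines. It is not the script mentioned above: the function names, the `label`/`bbox` keys, and the assumption that boxes are stored as absolute `[x1, y1, x2, y2]` pixels are all hypothetical. The only fixed part is the YOLO label layout, one line per object with `class_id x_center y_center width height`, all normalized to the image size.

```python
def xyxy_to_yolo(box, img_w, img_h):
    """Convert an absolute [x1, y1, x2, y2] box to normalized YOLO (xc, yc, w, h)."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2.0 / img_w
    yc = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return xc, yc, w, h


def to_yolo_lines(objects, class_names, img_w, img_h):
    """Build the lines of one YOLO label file from a list of annotated objects.

    `objects` is assumed (hypothetically) to be a list of dicts with a
    "label" (class name) and a "bbox" ([x1, y1, x2, y2] in pixels).
    The class id is the index of the label in `class_names`, so the same
    list must be used for the detector and for the SGG dataset.
    """
    class_to_id = {name: i for i, name in enumerate(class_names)}
    lines = []
    for obj in objects:
        cls_id = class_to_id[obj["label"]]
        xc, yc, w, h = xyxy_to_yolo(obj["bbox"], img_w, img_h)
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines
```

Each image gets one text file containing these lines. Note that the relations are not written anywhere here: the detector is trained on entities only, and the relations are used later by the relation prediction stage.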
Now, regarding the relation prediction dataset (scene graph annotations): the easiest way to build your own dataset is to use a tool such as this one to annotate your custom images. In the process, make sure you use the same object classes as for your object detection training. The more images you have, the better your model will be; around 10,000 images will give good results, but you can start with 1,000.
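Since the object classes have to agree between the two datasets, a quick sanity check before training can save time. The sketch below is only an illustration: the function name and the idea of passing two plain lists of class names are assumptions, not part of the codebase.

```python
def check_class_consistency(detector_classes, sgg_classes):
    """Compare the class names used by the detector and by the SGG dataset.

    Returns the names that appear in the scene graph annotations but that the
    detector was never trained on, and the detector classes the SGG dataset
    does not use.
    """
    det, sgg = set(detector_classes), set(sgg_classes)
    return {
        "missing_in_detector": sorted(sgg - det),
        "unused_by_sgg": sorted(det - sgg),
    }


# Example with made-up class lists.
report = check_class_consistency(
    ["person", "cup", "table"],
    ["person", "cup", "chair"],
)
print(report)
```

Anything listed under `missing_in_detector` is a class the relation model would expect but the detector cannot produce, so it is worth fixing before annotating more images.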
Finally, you can use this codebase to train an SGG model for relation prediction. To do so, you will have to modify the path_catalog.py file to add your custom dataset and make the corresponding changes to your .yaml config file (mainly changing the keys ROI_BOX_HEAD.NUM_CLASSES and ROI_RELATION_HEAD.NUM_CLASSES).
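As a rough illustration of how the two config values relate to a custom dataset, the sketch below derives them from the lists of object and predicate names. The helper name is hypothetical, and whether the counts should include an extra slot for a background / "no relation" class is an assumption here: check how the existing configs in the codebase count classes before copying the numbers.

```python
def count_classes(object_classes, predicate_classes, include_background=True):
    """Derive the two NUM_CLASSES values from the dataset's class lists.

    `include_background` is an assumption: when True, one extra slot is added
    to each count for a background object / "no relation" predicate. Set it to
    False if the class lists already contain such an entry.
    """
    extra = 1 if include_background else 0
    return {
        "ROI_BOX_HEAD.NUM_CLASSES": len(object_classes) + extra,
        "ROI_RELATION_HEAD.NUM_CLASSES": len(predicate_classes) + extra,
    }


# Example with made-up classes: 3 objects and 2 predicates.
values = count_classes(["person", "cup", "table"], ["holding", "on"])
for key, value in values.items():
    print(f"{key}: {value}")
```

The printed values are the ones to write under the corresponding keys in the .yaml config file.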
Good luck and let me know how it goes!
Closing due to inactivity
Hello, I'm a beginner in SGG. Could you give a more detailed explanation of the steps for dataset labeling and training, and of how to use the graph visualization code? I'd like to know exactly how you train.