KaihuaTang / Scene-Graph-Benchmark.pytorch

A new codebase for popular Scene Graph Generation methods (2020). Visualization & scene graph extraction on custom images/datasets are provided. It's also a PyTorch implementation of the paper “Unbiased Scene Graph Generation from Biased Training” (CVPR 2020).

Use my own object detection model instead of Faster-RCNN #201

Open Li-XD-Pro opened 9 months ago

Li-XD-Pro commented 9 months ago

❓ Questions and Help

Thanks for your code. I want to know how to use my own object detection model instead of Faster-RCNN?

lhn712836 commented 6 months ago

Have you solved this problem? I also want to use my own object detection model.

Maelic commented 4 months ago

Hi everyone, it is fairly easy to do: you just have to remove the RPN module and replace it with your own, i.e. change the RPN init line to the initialization of your own object detector. For instance, if using YOLOv8 it could be something like self.yolo_detector = YOLO(cfg, your_parameters...). Then you will have to load the YOLO weights somewhere with model.yolo_detector.load(your_weights_path), and finally change the RPN module forward call accordingly. A sketch of the swap is given below.
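A minimal sketch of what that substitution could look like, assuming this repo's maskrcnn_benchmark-style build_backbone/build_roi_heads helpers and the ultralytics YOLO class; the wrapper class name, weight file, and constructor arguments are illustrative, not part of the codebase:

```python
# Hypothetical sketch of swapping the RPN for a YOLOv8 detector inside a
# GeneralizedRCNN-style model. The wrapper class name, YOLO import, and
# weight file are assumptions for illustration only.
import torch
from ultralytics import YOLO  # assumed external dependency
from maskrcnn_benchmark.modeling.backbone import build_backbone
from maskrcnn_benchmark.modeling.roi_heads.roi_heads import build_roi_heads


class GeneralizedRCNNWithYOLO(torch.nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.backbone = build_backbone(cfg)       # feature extractor, kept as-is
        # self.rpn = build_rpn(cfg, self.backbone.out_channels)  # original RPN, removed
        self.yolo_detector = YOLO("yolov8m.pt")   # your own detector instead
        self.roi_heads = build_roi_heads(cfg, self.backbone.out_channels)
```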

You'll then need to post-process the output of your detector (in this case YOLOv8) into the same format as the original proposals, i.e. the BoundingBox class (BoxList) from this repo.

In my approach, I did something like this (I use the proposals as targets since I'm performing SGG in PredCls mode):

```python
# Inside the model's forward pass; `images` is a maskrcnn_benchmark ImageList
# and BoxList comes from maskrcnn_benchmark.structures.bounding_box.
features = self.backbone(images.tensors)

# Forward pass of the detector. Be careful: you'll also need to adapt the
# input format to whatever your detector expects.
results = self.yolo_detector(images)
boxes = results[0].boxes.xyxy.cpu().numpy().astype(int)
class_names = results[0].names
classes = [class_names[c] for c in results[0].boxes.cls.cpu().numpy().astype(int)]

# Original image size (ImageList stores sizes as (height, width)).
w, h = images.image_sizes[0][1], images.image_sizes[0][0]

# Wrap the detections in a BoxList and resize them to the resolution the
# original RPN proposals use.
targets = BoxList(boxes, (w, h), mode="xyxy")
targets = targets.resize((800, 600))

# Map class names to Visual Genome label indices and attach them as a field.
target_classes = [STATS["label_to_idx"][str(c)] for c in classes]
targets.add_field("labels", torch.Tensor(target_classes))

if self.roi_heads:
    # PredCls: feed the detections as both proposals and targets.
    x, result, detector_losses = self.roi_heads(features, targets, targets, logger)
```
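For context, the STATS["label_to_idx"] mapping used above can be built from the Visual Genome dictionary file shipped with this benchmark; the exact path below is an assumption, adjust it to your dataset layout:

```python
# Possible way to build STATS: load the VG label dictionaries used by this
# benchmark (file name and path are assumptions; match them to your setup).
import json

with open("datasets/vg/VG-SGG-dicts-with-attri.json") as f:
    STATS = json.load(f)  # includes "label_to_idx" and "idx_to_label"
```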

The resize to (800, 600) is there because the original RPN outputs proposals at that resolution.

Note that this process doesn't change the feature extractor, only the bounding-box regression and classification heads, so you will still have to train and load a Faster-RCNN to extract the ROI features (which is done in the box_features_extractor class). A rough sketch of reusing those weights is given below.
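One way to reuse a pretrained Faster-RCNN checkpoint for the backbone and box head while skipping the removed RPN could look like this; the checkpoint path, the `model` variable, and the "rpn." key prefix are assumptions about how the weights are named:

```python
# Rough sketch (not this repo's checkpointer): copy backbone and ROI box-head
# weights from a pretrained Faster-RCNN checkpoint into the modified model,
# dropping the RPN keys that no longer exist. `model`, the checkpoint path,
# and the "rpn." prefix are assumptions.
import torch

ckpt = torch.load("checkpoints/pretrained_faster_rcnn/model_final.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # some checkpoints nest weights under "model"
filtered = {k: v for k, v in state_dict.items() if not k.startswith("rpn.")}
missing, unexpected = model.load_state_dict(filtered, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```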

Hope this helps