SydCaption / SAAT

MIT License

How do I get the roi_feat for custom video data? #6

Closed dcahn12 closed 4 years ago

dcahn12 commented 4 years ago

In your misc/extract_feats_roi.py code, coco_demo.run_on_opencv_image() returns three values: result, top_preds and top_roi_feats.

But in the maskrcnn code you linked, only one value (result) is returned, as shown below.

image

How do I get the roi_feat for custom video data?

SydCaption commented 4 years ago

Hi, you can modify related functions, e.g.

    def run_on_opencv_image(self, image, If_draw=True):
        """
        Arguments:
            image (np.ndarray): an image as returned by OpenCV

        Returns:
            result (np.ndarray): a copy of the image with the detections drawn on it
            top_predictions (BoxList): the detected objects. Additional information
                of the detection properties can be found in the fields of
                the BoxList via `prediction.fields()`
            top_roi_feats (Tensor): the RoI features of the detected objects,
                in the same order as top_predictions
        """
        # Original: predictions = self.compute_prediction(image)
        # compute_prediction must be modified to also return the RoI features.
        predictions, roi_feats = self.compute_prediction(image)
        top_predictions, top_roi_feats = self.select_top_predictions(predictions, roi_feats)

        result = image.copy()

        if self.show_mask_heatmaps:
            result = self.create_mask_montage(result, top_predictions)
            return result, top_predictions, top_roi_feats
        result = self.overlay_boxes(result, top_predictions)
        if self.cfg.MODEL.MASK_ON:
            result = self.overlay_mask(result, top_predictions)
        if self.cfg.MODEL.KEYPOINT_ON:
            result = self.overlay_keypoints(result, top_predictions)
        result = self.overlay_class_names(result, top_predictions)

        return result, top_predictions, top_roi_feats

and

    # Original: def select_top_predictions(self, predictions):
    def select_top_predictions(self, predictions, roi_feats):
        """
        Select only predictions which have a `score` > self.confidence_threshold,
        and return them, together with their RoI features, in descending
        order of score

        Arguments:
            predictions (BoxList): the result of the computation by the model.
                It should contain the field `scores`.
            roi_feats (Tensor): the RoI features, one row per prediction.

        Returns:
            prediction (BoxList): the detected objects. Additional information
                of the detection properties can be found in the fields of
                the BoxList via `prediction.fields()`
            roi_feats (Tensor): the RoI features of the returned detections,
                in the same order
        """
        scores = predictions.get_field("scores")
        # Apply the same mask to the boxes and the features so they stay aligned.
        keep = torch.nonzero(scores > self.confidence_threshold).squeeze(1)
        predictions = predictions[keep]
        roi_feats = roi_feats[keep]
        scores = predictions.get_field("scores")
        _, idx = scores.sort(0, descending=True)

        return predictions[idx], roi_feats[idx]

@dcahn12
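The important detail in `select_top_predictions` is that the same `keep` mask and the same sort order are applied to both the predictions and the RoI features, so row i of one still describes row i of the other. Below is a minimal, framework-free sketch of that idea, using plain Python lists in place of the `BoxList` and the feature tensor. The function name `select_top` and the sample values are illustrative only and are not part of maskrcnn-benchmark.

```python
def select_top(scores, feats, threshold):
    """Keep entries with score > threshold, sorted by descending score.

    `scores` and `feats` are parallel sequences: feats[i] belongs to scores[i].
    Returns the filtered scores and features in the same (aligned) order.
    """
    if len(scores) != len(feats):
        raise ValueError("scores and feats must have the same length")
    # Indices that pass the confidence threshold.
    keep = [i for i, s in enumerate(scores) if s > threshold]
    # Sort the surviving indices by score, highest first.
    keep.sort(key=lambda i: scores[i], reverse=True)
    return [scores[i] for i in keep], [feats[i] for i in keep]


# Hypothetical data: three detections with 2-d features.
scores = [0.2, 0.9, 0.75]
feats = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
top_scores, top_feats = select_top(scores, feats, threshold=0.7)
```

If the features were filtered or sorted separately from the boxes, the two would silently fall out of alignment, which is why both are indexed with the same index list here.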

dcahn12 commented 4 years ago

Thank you for your quick response!

There are still some points that I don't understand :(

First, as you can see in the pictures below, I just added the x value returned by self.model() to the return value of compute_prediction(). Is this the right way to get roi_feats?

The code for compute_prediction() image

The code for model (GeneralizedRCNN) image

Second, when I check the bbox values in the h5 file, they are not in the range 0 to 1, as shown in the picture below.

image

If you could give me more details on extracting the roi_feat and bbox information, I would appreciate it very much. :)

SydCaption commented 4 years ago

Sorry, I do not quite understand your problem. Do you mean that you could not get the right proposals for a given video? Just like the following?

The bbox values are not scaled by the size of the image frame, so they are not in [0, 1].

dcahn12 commented 4 years ago

Ah, I found that the bbox values in the file you provided (msrvtt_foi_box.h5) are in [0, 1], so I assumed they were normalized, as you can see in the picture below. image

But, as you saw, when I extracted the bbox values myself, they were not in [0, 1] :( Could you check this problem once more?

And, as I said above, is the method I used for extracting roi_feat right?

SydCaption commented 4 years ago

Hi, I think it's right. Btw, you can normalize the bbox to [0, 1] by dividing it by the width and height of the image (a single frame), e.g.

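    # top_preds is a BoxList; its size is the (width, height) of the frame
    (w, h) = top_preds.size
    bbox = top_preds.bbox.clone()
    # assuming xyxy mode: columns 0 and 2 are x coordinates, 1 and 3 are y coordinates
    w_id, h_id = [0, 2], [1, 3]
    bbox[:, w_id] = bbox[:, w_id]/w
    bbox[:, h_id] = bbox[:, h_id]/h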

@dcahn12
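For readers who want to check the arithmetic without a model at hand, here is a small self-contained sketch of the same normalization on plain Python tuples. It assumes boxes are given as (x1, y1, x2, y2) in pixels; the helper name `normalize_boxes` and the sample frame size are hypothetical.

```python
def normalize_boxes(boxes, width, height):
    """Scale (x1, y1, x2, y2) pixel boxes to [0, 1] by the frame size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # x coordinates are divided by the width, y coordinates by the height.
    return [
        (x1 / width, y1 / height, x2 / width, y2 / height)
        for (x1, y1, x2, y2) in boxes
    ]


# Hypothetical 640x480 frame with two boxes.
boxes = [(0.0, 0.0, 320.0, 240.0), (160.0, 60.0, 640.0, 480.0)]
normalized = normalize_boxes(boxes, width=640, height=480)
```

A box that spans the whole frame maps to (0, 0, 1, 1), so any value outside [0, 1] after this step suggests the wrong frame size was used.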

dcahn12 commented 4 years ago

Hi, thank you! The bbox can be extracted with this method.

But I don't know how to extract roi_feat from the mask_rcnn code linked in this repo. Where in the mask_rcnn code do you extract the roi_feat value? Could you check it?

qizhust commented 4 years ago

Hi, it's in maskrcnn_benchmark/modeling/detector/generalized_rcnn.py, the file you posted. @dcahn12

dcahn12 commented 4 years ago

Hi, but the dimension of the provided roi feature in msrvtt_roi_feat.h5 is [num_obj, 1000], while the feature dimension in maskrcnn_benchmark/modeling/detector/generalized_rcnn.py is [num_obj, 14, 14], which does not match the provided file. :( So I am confused about which feature I should use for roi_feat.

SydCaption commented 4 years ago

You can check the size once again, which should be [num_obj, 1024]. Please refer to modeling/roi_heads/box_head/roi_box_feature_extractors.py for more details.

dcahn12 commented 4 years ago

Thank you very much!!! 👍