facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

How to obtain the Bounding Box Co-ordinates of any predicted Object in the Image #1519

Closed · vim5818 closed this 4 years ago

vim5818 commented 4 years ago

Hello all, I would like to get the coordinates of the bounding box of a particular predicted object in an image. For example, in the link below, the image has different objects detected by Detectron2, such as cyclists, a bottle, a person, etc. Detectron2 image at source

What output I am expecting

I would like to get the coordinates of the bounding boxes of the two water bottles fixed on the bicycle frame, and maybe store them in a text file for later use, or print them to understand which bounding box coordinates correspond to which object. Since there are many objects in a single image, I would like to print the list of detected objects along with their bounding box coordinates.

Thank you in advance.

ppwwyyxx commented 4 years ago

See tutorial: https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5

vim5818 commented 4 years ago

@ppwwyyxx Thank you for the reply; I have checked the tutorial on Google Colab. In the section "Run a pre-trained detectron2 model", I am able to visualise the bounding box information. However, I do not see such a variable or line of code in my cloned detectron2 repository. After a complete search across the different executable files and folders, I don't see the exact lines of code mentioned in the Colab tutorial.

Please support. Thank you.

ppwwyyxx commented 4 years ago

The tutorial shows how to "print the list of objects detected along with the co-ordinates of Bounding Box", as you asked: https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5#scrollTo=7d3KxiHO_0gb

Tutorials show how a user can use detectron2, so the content does not need to be part of the repository.

vim5818 commented 4 years ago

(Screenshot attached: Screenshot 2020-06-03 at 10.45.23 AM)

vim5818 commented 4 years ago

@ppwwyyxx Thank you once again. I realise I should have framed my query more clearly; let me restate it:

  1. I want to use detectron2 locally on my laptop, without Google Colab, just as I would run anything else on my PC.
  2. I follow the instructions to set up the dependencies and requirements.
  3. I would like to run/execute detectron2 to make predictions on my locally stored images, and print/see the same bounding box information for the detected objects and their corresponding class assignments as shown in the Google Colab tutorial.

I need support with point 3.

ppwwyyxx commented 4 years ago

The code will run on a PC if you write the code in a python file on the PC and execute the python file.
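
For later readers, here is a minimal sketch of such a python file, adapted from the Colab tutorial. The image path and score threshold are placeholders, and the model zoo config name is just one example choice:

    # Minimal local-inference sketch (not an official detectron2 script).
    import cv2
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # confidence threshold
    cfg.MODEL.DEVICE = "cpu"  # drop this line if a GPU is available

    predictor = DefaultPredictor(cfg)
    outputs = predictor(cv2.imread("my_image.jpg"))  # placeholder path

    instances = outputs["instances"].to("cpu")
    print(instances.pred_classes)  # class indices of the detections
    print(instances.pred_boxes)    # one (x1, y1, x2, y2) row per detection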

vim5818 commented 4 years ago

@ppwwyyxx The project contains a lot of source files like visualizer.py and box_regression.py, but it is unclear which file actually produces the final bounding box output after detection. I would like to know if there is any file from which I can extract the same information as in the Colab notebook; maybe I can work it out from there.

ppwwyyxx commented 4 years ago

No file in the repository gives the coordinates of bounding boxes by itself. The code in the Colab shows how to get the coordinates of bounding boxes.

kenny1323 commented 4 years ago

@deeplearner93 Hi. This is just an example. Detectron2 has the file /detectron2/demo/predictor.py, which is called by the file /detectron2/demo/demo.py. We will invoke /detectron2/demo/demo.py to do the test. https://github.com/facebookresearch/detectron2/tree/master/demo

PART1

STEP1. Open the file /detectron2/demo/predictor.py

STEP2. Edit the function run_on_image(self, image) as follows. The last instruction in run_on_image is return predictions, vis_output. Inside the if "instances" in predictions: branch (so that instances is always defined), just before that return, add the following print instructions:

    print(instances)
    print(instances.pred_boxes)
    print(instances.pred_boxes[0])

OUTPUT AND EXPLANATION

I got these outputs.

A) OUTPUT OF print(instances)

    Instances(num_instances=4, image_height=360, image_width=640, fields=[pred_boxes, scores, pred_classes, pred_masks])

Explanation: this output tells me there are 4 detected boxes.

B) OUTPUT OF print(instances.pred_boxes)

    Boxes(tensor([[289.3555,  17.8171, 451.1482, 347.6050],
                  [382.5501,  14.9712, 635.7133, 231.8446],
                  [467.1654,  66.3414, 611.7201, 226.0997],
                  [ 22.4782,   3.7928, 428.1484, 254.6716]]))

Explanation: this output gives me the coordinates of the detected boxes. In particular, the first box (instances.pred_boxes[0]) has its top-left point at (x, y) = (289.3555, 17.8171) and its bottom-right point at (x, y) = (451.1482, 347.6050).

C) OUTPUT OF print(instances.pred_boxes[0])

    Boxes(tensor([[289.3555, 17.8171, 451.1482, 347.6050]]))

Explanation: with this command, I just print the coordinates of the first box (instances.pred_boxes[0]).
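
In other words, each row of pred_boxes is (x1, y1, x2, y2). A small sketch, under the same setup as above, of pulling the first box out as plain python floats:

    # Sketch: unpack the first predicted box into plain python floats.
    x1, y1, x2, y2 = instances.pred_boxes.tensor[0].tolist()
    print(x1, y1, x2, y2)  # approx. 289.3555 17.8171 451.1482 347.6050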

PART2 SEE ALSO

A) https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
B) https://github.com/facebookresearch/detectron2/issues/356

PART3 This is my code. Basically, I have added 3 print instructions to the function run_on_image in https://github.com/facebookresearch/detectron2/blob/master/demo/predictor.py (inside the instances branch, so they only run when instance predictions exist).

START CODE

FILE /detectron2/demo/predictor.py

FUNCTION run_on_image(self, image)

    def run_on_image(self, image):
        vis_output = None
        predictions = self.predictor(image)
        # Convert image from OpenCV BGR format to Matplotlib RGB format.
        image = image[:, :, ::-1]
        visualizer = Visualizer(image, self.metadata, instance_mode=self.instance_mode)
        if "panoptic_seg" in predictions:
            panoptic_seg, segments_info = predictions["panoptic_seg"]
            vis_output = visualizer.draw_panoptic_seg_predictions(
                panoptic_seg.to(self.cpu_device), segments_info
            )
        else:
            if "sem_seg" in predictions:
                vis_output = visualizer.draw_sem_seg(
                    predictions["sem_seg"].argmax(dim=0).to(self.cpu_device)
                )
            if "instances" in predictions:
                instances = predictions["instances"].to(self.cpu_device)
                vis_output = visualizer.draw_instance_predictions(predictions=instances)
                # Print the detections here, inside this branch, so that
                # `instances` is guaranteed to be defined.
                print(instances)
                print(instances.pred_boxes)
                print(instances.pred_boxes[0])
        return predictions, vis_output

END CODE

PART4 To test my code, I run these commands in the bash shell.

COMMAND1:

    cd /000myfiles/anacondadir1/detectron2/demo

COMMAND2:

    python3 demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input my_image.jpg --opts MODEL.DEVICE cpu MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl &

vim5818 commented 4 years ago

@kenny1323

Wow!! Thank you very much!! All "bow" to your work.

Warday commented 4 years ago

Hi, I have a problem. In my case I want the box coordinates as individual values, because I need to extract the detected object from the main image. I can get all the coordinates, as below:

    Boxes(tensor([[2054.7739, 287.8489, 2595.0151, 728.5417]], device='cuda:0'))

But I have not been able to save each element as an individual value (x1=2054.7739, y1=287.8489, ...). I need each element to crop the image and keep only the detected object. I tried to convert the box element to a list (.tolist()), but that didn't work. Any help?

kenny1323 commented 4 years ago

@Warday Hi. Here you can find my directory /detectron2/demo: https://github.com/kenny1323/detectron2_ken

PART1 About the box extraction: I have added 2 files.

1) cp demo.py extract_person_box.py
2) cp predictor.py extract_person_box_core.py

I have edited extract_person_box.py and extract_person_box_core.py in the following way.

The file extract_person_box.py is basically the same as demo.py; there are only a few differences. The file extract_person_box_core.py has a new block of code tagged START_BOXES_EXTRACTION. Inside extract_person_box_core.py, search in particular for the crop instruction.

You should read the file readme.txt too. https://github.com/kenny1323/detectron2_ken/blob/master/README.txt

BOX EXTRACTION EXAMPLE

BASH COMMANDS

    F="/SUPERDIR1"/allfile/1.png
    cd /000myfiles/anacondadir1/detectron2/demo
    python3 extract_person_box.py --config-file /000myfiles/anacondadir1/detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input $F --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl & sleep 3

PART2 About the mask extraction: I have added 2 files.

1) cp demo.py extract_mask.py
2) cp predictor.py extract_mask_core.py

I have edited extract_mask.py and extract_mask_core.py in the following way.

The file extract_mask.py is basically the same as demo.py; there are only a few differences. The file extract_mask_core.py has a new block of code tagged START_MASK_EXTRACTION. The image /detectron2/demo/000028.jpg._out1.png is an example of mask extraction; basically, the alpha channel of every pixel of the mask is set to zero. url_image: https://github.com/kenny1323/detectron2_ken/blob/master/000028.jpg._out1.png

MASK EXTRACTION

BASH COMMAND

F="/SUPERDIR1"/allfile/1000.png; cd /000myfiles/anacondadir1/detectron2/demo python3 extract_mask_cumulative.py --config-file /000myfiles/anacondadir1/detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input $F --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl & sleep 3

Post scriptum. About the image 000028.jpg._out1.png, you should invert the transparency, namely: for any pixel with alpha channel 0, change it to alpha = 255; and for any pixel with alpha channel not 0, change it to alpha = 0. A sketch of this inversion follows.
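
A minimal sketch of that inversion with numpy and PIL, assuming the PNG has an RGBA layout; the file names are placeholders:

    # Sketch: invert the alpha channel of an RGBA PNG.
    import numpy as np
    from PIL import Image

    img = np.array(Image.open("000028.jpg._out1.png").convert("RGBA"))
    alpha = img[:, :, 3]
    img[:, :, 3] = np.where(alpha == 0, 255, 0).astype(np.uint8)  # 0 -> 255, else -> 0
    Image.fromarray(img).save("000028_inverted.png")  # placeholder output name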

hszkf commented 4 years ago

Hi @Warday ,

Based on the image @deeplearner93 attached to this issue,

you can just do something like:

    output_pred_boxes = outputs["instances"].pred_boxes
    # Boxes is iterable; each item is a 1x4 tensor (x1, y1, x2, y2)
    for box in output_pred_boxes:
        print(box.cpu().numpy())

and you will get the individual bounding boxes with ease.
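
Relatedly, since the original question also asked for the list of detected objects, here is a sketch of pairing each box with a human-readable class name via COCO metadata; the dataset name assumes the standard COCO setup:

    # Sketch: print each detection's class name next to its box.
    from detectron2.data import MetadataCatalog

    metadata = MetadataCatalog.get("coco_2017_val")  # COCO class names
    instances = outputs["instances"].to("cpu")
    for box, cls in zip(instances.pred_boxes, instances.pred_classes):
        print(metadata.thing_classes[int(cls)], box.numpy())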

Warday commented 4 years ago

Thanks kenny1323, by reading the source code from extract_mask_core.py I could extract each box. Thanks elmonisch, I will check which is faster. I did:

    Box = outputs["instances"].pred_boxes
    a = Box.tensor.cpu()
    a = a.numpy()

and then navigated each box. Thanks for both answers.
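
For anyone with the same goal, a small sketch of the cropping step under those assumptions, where image is the original numpy array passed to the predictor and a is the (N, 4) array above:

    # Sketch: crop the first detection out of the original image.
    x1, y1, x2, y2 = a[0].round().astype(int)
    crop = image[y1:y2, x1:x2]  # boxes are (x1, y1, x2, y2); numpy indexes rows first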

barakullah commented 3 years ago

Hey @kenny1323! I want to get the bounding boxes of a person re-identification system. Can you help?

sushmasuresh28 commented 3 years ago

@kenny1323 , Thanks a lot for your explanation here.

I have been trying to understand what the output of print(outputs["instances"].pred_boxes) represents. I now know that it is the coordinates of the detected boxes. But why are these coordinates decimals (float values)? Why are they not whole numbers?

Normally we would have coordinates starting at (0,0) in the top-left corner of the image, and the next pixel would be (0,1) in (x,y) format. But, as shown by @deeplearner93 here, we obtain values like (126.6035, 244.8977). Why is this the case?

@kenny1323, @hszkf, and @ppwwyyxx - if you are aware of the reason, please help me get a better understanding of this.

sarahdorich commented 3 years ago

@sushmasuresh28 I'm having the same confusion. Were you able to get an answer to your question elsewhere? Using the values from pred_boxes does not let me crop out the objects; if they were truly coordinates, I should be able to use them for cropping the detected objects.

kenny1323 commented 3 years ago


@sushmasuresh28 I think Detectron2 internally works this way: it uses several algorithms and models to make several estimations (predictions) of several areas. A number like 126.6035 is basically the average result.

For example, assume Detectron2 makes 3 estimations: 127, 126, 127. The average value is 126.66667.

@sarahdorich

To use the number 126.6035 to crop the image, you should probably convert it to an integer:

    x = round(126.6035)

Now x is 127.

To crop the image, I use PIL. https://stackoverflow.com/questions/9983263/how-to-crop-an-image-using-pil
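
A small sketch of that crop with PIL, reusing the first box printed earlier in this thread; the image path is a placeholder:

    # Sketch: crop one detected box out of the input image with PIL.
    from PIL import Image

    im = Image.open("my_image.jpg")  # placeholder path
    x1, y1, x2, y2 = 289.3555, 17.8171, 451.1482, 347.6050  # one pred_boxes row
    cropped = im.crop((round(x1), round(y1), round(x2), round(y2)))  # (left, upper, right, lower)
    cropped.save("crop0.png")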