agentmorris / MegaDetector

MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.
MIT License

Bounding Box - what are (x, y) ? #75

Closed agentmorris closed 1 year ago

agentmorris commented 1 year ago

I need clarification on the convention used for the bounding box parameters [x, y, w, h] in the JSON.

I am currently using the dataset downloaded from the link below: https://lilablobssc.blob.core.windows.net/caltechcameratraps/eccv_18_all_images_sm.tar.gz

I assumed x, y to be the center of the bounding box. However, when I try to normalize, I get values > 1 and sometimes < 0.

Please find the attached spreadsheet for reference: train_normalized.xlsx

Here's how I went about the calculation, taking image_id = '586fde22-23d2-11e8-a6a3-ec086b02610b' as an example.

Bounding box values as given in the JSON:

x_center = 1254.455625
y_center = 591.602857142857
w = 175.361428571429
h = 156.157053571429

I want to normalize these values and bring them within [0, 1].

# Image width and height values
# Found these values using imagesize
# imW, imH = imagesize.get(src_path)
imW = 1024
imH = 747

# top-left coordinate of the box
xmin = x_center - w/2 = 1166.77491071429
xmin_normal = xmin/imW = 1.13942862374442  # > 1

I am now guessing that (x, y) is one of the corners of the bounding box. Can you please clarify?


Issue cloned from Microsoft/CameraTraps, original issue posted by ra9hur on Oct 13, 2022.

agentmorris commented 1 year ago

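All of the .json files for camera trap datasets on LILA are in COCO Camera Traps format. Bounding boxes are [x,y,w,h], where x and y are the top-left corner of the box and w and h are its width and height, all in absolute pixel coordinates with the origin at the top-left of the image.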

Let us know if you find anything that isn't consistent with that.
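For anyone who wants to normalize these boxes, here is a minimal sketch, assuming the usual COCO convention that [x, y] is the top-left corner in absolute pixels. The function name `normalize_coco_bbox` is just illustrative. Note that x = 1254 is already larger than the 1024-pixel width measured in the question, so the box cannot fit in that image under any convention; that suggests the annotations refer to a different image size than the downsized file. The sketch therefore assumes you divide by the `width` and `height` stored with the image entry in the JSON rather than the size of the file on disk. The 2048 x 1494 values below are an assumption (twice the measured 1024 x 747), not something confirmed in this thread.

```python
def normalize_coco_bbox(bbox, image_width, image_height):
    """Convert a COCO-style [x, y, w, h] box (top-left corner, absolute pixels)
    to normalized [x_center, y_center, w, h], each relative to the image size."""
    x, y, w, h = bbox
    x_center = (x + w / 2) / image_width
    y_center = (y + h / 2) / image_height
    return [x_center, y_center, w / image_width, h / image_height]


# Box values quoted in the question, read as [x_min, y_min, w, h].
bbox = [1254.455625, 591.602857142857, 175.361428571429, 156.157053571429]

# ASSUMPTION: dimensions of the image the annotations were made on.
# Prefer the width/height fields from the JSON image entry if present.
image_width, image_height = 2048, 1494

normalized = normalize_coco_bbox(bbox, image_width, image_height)
print(normalized)
```

If the normalized values still fall outside [0, 1] when using the dimensions from the JSON, that would be worth reporting as an inconsistency in the annotations.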

Hope that helps!


(Comment originally posted by agentmorris)