agentmorris / MegaDetector

MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.
MIT License

Datasets only with bounding boxes for download #73

Closed agentmorris closed 1 year ago

agentmorris commented 1 year ago

At the outset, thanks for sharing this huge dataset for download.

Using CameraTraps/lila/get_lila_category_counts.py, I could identify all available datasets, their classes, and the corresponding counts.

I am primarily interested in object detection (detecting animals) and am looking for datasets with bounding boxes. From the output, I identified the following datasets with bounding boxes.

  1. Caltech Camera Traps_bbox
  2. NACTI_bbox
  3. WCS Camera Traps_bbox
  4. Snapshot Serengeti_bbox
  5. SWG Camera Traps_bbox

I was hoping to download only these datasets using CameraTraps/lila/download_lila_subset.py, so I made the following changes to the script.

datasets_of_interest = ['Missouri Camera Traps','ENA24','Caltech Camera Traps','SWG Camera Traps']

datasets_of_interest = ['Caltech Camera Traps_bbox','NACTI_bbox']

species_of_interest = ['red_fox','fox','grey fox','red fox','leopard_cat']

species_of_interest = ['opossum','squirrel','rabbit','rodent','raccoon','deer','coyote','bobcat','cat','bird','skunk','dog','fox','lizard','person']

Unfortunately, the script could not find ds_name: Caltech Camera Traps_bbox in metadata_table. I am attaching a text file (dataset search.txt) with the contents of metadata_table. I assume the dictionary only includes URLs for the classification datasets and not for the object detection datasets.

I need your advice on how to extract only the object detection datasets. Are there any specific scripts that I should be using?

Alternatively, instead of script changes, could you please share their URLs and the corresponding JSON files? That would be a great help.

Thanks again !!


Issue cloned from Microsoft/CameraTraps, original issue posted by ra9hur on Oct 05, 2022.

agentmorris commented 1 year ago

There's not exactly a list of datasets that contain bounding boxes, but lila_common.py refers to this CSV file, which lists the image and metadata URLs for all camera trap datasets on LILA. It has the following columns:

name,image_base_url,metadata_url,bbox_url

The "bbox_url" column is populated only when there is a separate .json file for bounding boxes. E.g., ENA24 also contains bounding boxes, but there is not a separate .json file for bounding boxes. Same with Island Conservation Camera Traps and Channel Islands Camera Traps.
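As a rough sketch of how that column could be used, the snippet below parses CSV text with those four columns and splits dataset names by whether "bbox_url" is populated. The sample rows and example.com URLs are made up for illustration, and the function name is hypothetical; it is not part of lila_common.py.

```python
import csv
import io


def split_by_bbox_url(csv_text):
    """Return (with_bbox_file, without_bbox_file) lists of dataset names."""
    with_bbox, without_bbox = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An empty bbox_url only means there is no *separate* bounding box
        # file; boxes may still be present in the main metadata file.
        if (row.get("bbox_url") or "").strip():
            with_bbox.append(row["name"])
        else:
            without_bbox.append(row["name"])
    return with_bbox, without_bbox


# Made-up rows for illustration; the real values come from the CSV file itself.
sample_csv = """name,image_base_url,metadata_url,bbox_url
Dataset A,https://example.com/a/images,https://example.com/a/meta.json.zip,https://example.com/a/bbox.json.zip
Dataset B,https://example.com/b/images,https://example.com/b/meta.json.zip,
"""

with_bbox, without_bbox = split_by_bbox_url(sample_csv)
print("separate bbox file:", with_bbox)
print("no separate bbox file:", without_bbox)
```

Datasets in the second list are not necessarily box-free, so they would still need the annotation-level check.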

So the "right" way to look for bounding boxes is to download and read all the .json files listed in that file, enumerate the annotations, and see which annotations have 'bbox' fields. But depending on how robust a solution you want, it may be just as fast to open all the dataset home pages from the list of camera trap datasets and just manually see which pages say "bounding boxes", and copy the URLs.
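The annotation check could look roughly like the sketch below. It assumes each metadata .json file has already been downloaded and unzipped, and that it has a top-level "annotations" list in which boxed annotations carry a 'bbox' field. The function name and the sample data are hypothetical.

```python
import json


def count_bbox_annotations(metadata):
    """Return (annotations with a 'bbox' field, total annotations)."""
    annotations = metadata.get("annotations", [])
    n_with_bbox = sum(1 for ann in annotations if "bbox" in ann)
    return n_with_bbox, len(annotations)


# Made-up metadata for illustration; a real file would be read with
# json.load() from the downloaded .json file instead.
sample_json = """
{
  "annotations": [
    {"id": "a1", "image_id": "img1", "category_id": 1, "bbox": [10, 20, 100, 50]},
    {"id": "a2", "image_id": "img2", "category_id": 0}
  ]
}
"""

sample_metadata = json.loads(sample_json)
n_with_bbox, n_total = count_bbox_annotations(sample_metadata)
print(f"{n_with_bbox} of {n_total} annotations have bounding boxes")
```

A dataset would count as having bounding boxes when the first number is greater than zero.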

FYI the above file only refers to camera trap data; depending on your application, you may also be interested in the following non-camera-trap datasets that include bounding boxes:

Hope that helps!

-Dan


(Comment originally posted by agentmorris)

agentmorris commented 1 year ago

Thanks, Dan, for clarifying! I will start with the CSV file that you mentioned.


(Comment originally posted by ra9hur)