dbolya / tide

A General Toolbox for Identifying Object Detection Errors
https://dbolya.github.io/tide
MIT License
704 stars 115 forks

OpenImages support would be great #4

Open rodrigob opened 4 years ago

rodrigob commented 4 years ago

Would be great to also support the OpenImages dataset. (15M boxes over 600 categories; 2.7M instance segmentations over 350 categories)

This dataset was part of the RVC 2020 challenge and had its own Kaggle competitions in 2019.

dbolya commented 4 years ago

Thanks for the suggestion!

I don't have much experience with OpenImages, so would it be possible for you to implement a dataset driver for it as a pull request? You can use the COCO one as reference: https://github.com/dbolya/tide/blob/49a5d2a4aeb56795e93a3ed7cc7e6d25757bb4c1/tidecv/datasets.py#L60-L107

garyfanhku commented 3 years ago

The official toolkit from the RVC competition might be helpful: https://github.com/ozendelait/rvc_devkit/tree/master/objdet

Specifically, it downloads the original CSV annotations from OID (V5|V6) and uses https://github.com/bethgelab/openimages2coco to convert them into COCO instance/segmentation format. The converted file can then be passed to TIDE in place of a COCO annotation file.

The steps to produce a working solution seem to be as follows:

  1. Remove the lines that download the OID images, which are not needed here (https://github.com/ozendelait/rvc_devkit/blob/c986717abc24eba99a259e203a9ce4e182b2124e/objdet/download_oid_boxable.sh#L21), then run download_oid_boxable.sh.
  2. Modify the line https://github.com/ozendelait/rvc_devkit/blob/c986717abc24eba99a259e203a9ce4e182b2124e/objdet/convert_oid_coco.sh#L32 to match the change in https://github.com/bethgelab/openimages2coco, where convert.py has been renamed to convert_annotations.py, then run the conversion script.
  3. For bounding box annotations, it seems a dummy 'segmentation' field is needed; alternatively, one can fork TIDE and make the necessary adjustments. For mask annotations, openimages2coco uses 'segments_info' as the field name (https://github.com/bethgelab/openimages2coco/blob/8991d9bccbd3d91f32b87f04dab60b2a61cb608e/utils.py#L238), so that needs to be converted as well.
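For the box case in step 3, a minimal sketch of adding the dummy field could look like the following. The helper names `add_dummy_segmentation` and `convert_file` are hypothetical. Using the box outline as a polygon is only one possible dummy value, and defaulting 'iscrowd' to 0 is an extra assumption for files that lack that field. I have not checked this against real converted OID output.

```python
import json


def add_dummy_segmentation(coco: dict) -> dict:
    """Fill in a placeholder 'segmentation' for box-only COCO annotations.

    Assumption: each annotation has a COCO-style 'bbox' of [x, y, width, height].
    The placeholder is the box outline written as a single polygon.
    Annotations that already have a 'segmentation' are left unchanged.
    """
    for ann in coco.get("annotations", []):
        x, y, w, h = ann["bbox"]
        # Polygon corners: top-left, top-right, bottom-right, bottom-left.
        ann.setdefault("segmentation", [[x, y, x + w, y, x + w, y + h, x, y + h]])
        # Assumption: treat annotations without 'iscrowd' as regular objects.
        ann.setdefault("iscrowd", 0)
    return coco


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a COCO-format JSON file, patch it, and write the result."""
    with open(src_path, "r") as src:
        coco = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(add_dummy_segmentation(coco), dst)
```

Calling `convert_file` on the converted box annotations writes a patched copy and leaves the original file untouched.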

Then simply substitute the OID path:

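from tidecv import TIDE, datasets

# path_to_oid_converted: COCO-format annotation JSON produced by the conversion above
# path_to_preds: predictions in COCO result format
tide = TIDE()
tide.evaluate(
    datasets.COCO(path_to_oid_converted),
    datasets.COCOResult(path_to_preds),
    mode=TIDE.BOX,  # Use TIDE.MASK for masks
)
tide.summarize()  # Summarize the results as tables in the console
tide.plot()       # Show a summary figure. Specify a folder and it'll output a png to that folder.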