Why you need this feature:
It would help both us and customer projects to have dataset statistics and analyses beyond what datasetinsights currently offers. These would make it easier to compare different versions of a dataset, and to compare synthetic and real datasets that share similar annotations, object classes, or target tasks.
Describe the solution you'd like:
We propose adding the following dataset statistics and analyses:
- [ ] bounding box heatmaps per object class
- [ ] object counts over the entire dataset and per image
- [ ] size distribution of each object class's bounding boxes, both as raw pixel counts and normalized w.r.t. image size
- [ ] size distribution of each object class's segmentation masks, both as raw pixel counts and normalized w.r.t. image size and w.r.t. bounding box size
- [ ] semantic/instance segmentation heatmaps per object class
- [ ] dataset-level colour descriptors (e.g., colour histograms)
- [ ] colour-level analysis of each object instance (within its bbox)
- [ ] colour-level analysis of each object instance, comparing pixels inside the object mask with pixels outside it (the region inside the instance's bbox that is not part of the mask, i.e., the background pixels)
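To make several of these items concrete, here is a minimal NumPy sketch. It assumes a hypothetical flat annotation format of `(image_id, class_name, x, y, w, h)` tuples in pixel coordinates, a single fixed image size, and boolean per-instance masks. The function names (`object_counts`, `bbox_heatmap`, `bbox_sizes`, `mask_colour_stats`, `colour_histogram`) are illustrative only and are not part of datasetinsights.

```python
import numpy as np
from collections import Counter, defaultdict

# Hypothetical annotation record (NOT the datasetinsights schema):
# (image_id, class_name, x, y, w, h), with box coordinates in pixels.


def object_counts(annotations):
    """Count objects per class over the whole dataset and per image."""
    per_class = Counter(a[1] for a in annotations)
    per_image = Counter(a[0] for a in annotations)
    return per_class, per_image


def bbox_heatmap(annotations, class_name, width, height):
    """Accumulate how often each pixel is covered by a box of one class."""
    heat = np.zeros((height, width), dtype=np.int64)
    for _, cls, x, y, w, h in annotations:
        if cls != class_name:
            continue
        # Clip the box to the image so partially visible boxes are handled.
        x0, y0 = max(int(x), 0), max(int(y), 0)
        x1, y1 = min(int(x + w), width), min(int(y + h), height)
        heat[y0:y1, x0:x1] += 1
    return heat


def bbox_sizes(annotations, width, height):
    """Per-class box areas: raw pixel counts and fractions of the image area."""
    areas = defaultdict(list)
    for _, cls, _, _, w, h in annotations:
        areas[cls].append(w * h)
    image_area = width * height
    return {
        cls: (np.array(a), np.array(a) / image_area) for cls, a in areas.items()
    }


def mask_colour_stats(image, mask, box):
    """Mean colour inside the object mask vs. background pixels in its bbox."""
    x, y, w, h = box
    crop = image[y:y + h, x:x + w].reshape(-1, image.shape[-1])
    inside = mask[y:y + h, x:x + w].astype(bool).reshape(-1)
    return crop[inside].mean(axis=0), crop[~inside].mean(axis=0)


def colour_histogram(image, bins=8):
    """Per-channel histogram of an 8-bit image, shape (channels, bins)."""
    return np.stack([
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(image.shape[-1])
    ])
```

Summing `colour_histogram` over all images would give a dataset-level descriptor, and dividing a class heatmap by its number of boxes turns it into a spatial frequency map that can be compared across datasets of the same image size.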
Anything else you would like to add:
It may help to keep human-specific statistics separate from general multi-object statistics, since human-centric datasets may include keypoint annotations that other datasets lack (e.g., COCO keypoints vs. COCO instances).
Stand-alone Jupyter notebooks in datasetinsights could also be useful for providing finer-grained, task-specific analysis.