Deci-AI / data-gradients

Computer Vision dataset analysis
Apache License 2.0

DataGradients

DataGradients is an open-source, Python-based library for computer vision dataset analysis.

Extract valuable insights from your datasets and get comprehensive reports effortlessly.

🔍 Detect Common Data Issues

💡 Extract Insights for Better Model Design

🎯 Reduce Guesswork for Hyperparameters

🛠 Capabilities

Non-exhaustive list of supported features.

📘 Deep Dive into Data Profiling
Puzzled by some dataset challenges while using DataGradients? We've got you covered.
Enrich your understanding with this 🎓 free online course. Dive into dataset profiling, confront its complexities, and harness the full potential of DataGradients.

Example of pages from the Report


Example of specific features

Check out the pre-computed dataset analysis for a deeper dive into reports.

Installation

You can install DataGradients with pip:

pip install data-gradients

Quick Start

Prerequisites

Before you proceed with DataGradients, make sure you have a training set, a validation set, and the class names of your dataset.

Example

from torchvision.datasets import CocoDetection

train_data = CocoDetection(...)
val_data = CocoDetection(...)
class_names = ["person", "bicycle", "car", "motorcycle", ...]
# OR
# class_names = {0: "person", 1:"bicycle", 2:"car", 3: "motorcycle", ...}
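The two `class_names` forms shown above carry the same information when class ids are consecutive integers starting at 0. As a minimal sketch, assuming list positions are the class ids, a list can be normalized into the dictionary form like this. The helper `to_id_mapping` is hypothetical and not part of DataGradients.

```python
from typing import Dict, List, Union

ClassNames = Union[List[str], Dict[int, str]]


def to_id_mapping(class_names: ClassNames) -> Dict[int, str]:
    """Return an id -> name mapping from either form.

    Hypothetical helper for illustration; assumes list positions are the class ids.
    """
    if isinstance(class_names, dict):
        return dict(class_names)
    return dict(enumerate(class_names))


as_list = ["person", "bicycle", "car", "motorcycle"]
as_dict = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle"}

# Both forms describe the same classes.
print(to_id_mapping(as_list) == to_id_mapping(as_dict))
```

The dictionary form is the one to reach for when class ids are not contiguous, since a list cannot express gaps.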

Good to Know - DataGradients will try to find out how the dataset returns images and labels.

  • If something cannot be automatically determined, you will be asked to provide some extra information through a text input.
  • In some extreme cases, the process will crash and invite you to implement a custom dataset extractor.

Heads up - DataGradients provides a few out-of-the-box dataset/dataloader implementations. You can find more dataset implementations in PyTorch or SuperGradients.
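If none of the ready-made datasets fits, a small wrapper is often enough. The sketch below assumes that an object exposing `__len__` and `__getitem__` and returning `(image, label)` pairs qualifies as a dataset iterable. `InMemoryDataset` and the nested-list "images" are hypothetical stand-ins; a real pipeline would typically return arrays or tensors.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # toy stand-in for an H x W image


class InMemoryDataset:
    """Hypothetical map-style dataset returning (image, label) pairs."""

    def __init__(self, images: Sequence[Image], labels: Sequence[int]) -> None:
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int) -> Tuple[Image, int]:
        # An IndexError raised here is what ends a plain `for` loop over the dataset.
        return self.images[index], self.labels[index]


train_data = InMemoryDataset(
    images=[[[0, 1], [2, 3]], [[4, 5], [6, 7]]],
    labels=[0, 1],
)

for image, label in train_data:
    print(len(image), label)
```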

Dataset Analysis

You are now ready to go: choose the relevant analyzer for your task and run it over your datasets!

Image Classification

from data_gradients.managers.classification_manager import ClassificationAnalysisManager 

train_data = ...  # Your dataset iterable (torch dataset/dataloader/...)
val_data = ...    # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]

analyzer = ClassificationAnalysisManager(
    report_title="Testing Data-Gradients Classification",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)

analyzer.run()

Object Detection

from data_gradients.managers.detection_manager import DetectionAnalysisManager

train_data = ...  # Your dataset iterable (torch dataset/dataloader/...)
val_data = ...    # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]

analyzer = DetectionAnalysisManager(
    report_title="Testing Data-Gradients Object Detection",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)

analyzer.run()

Semantic Segmentation

from data_gradients.managers.segmentation_manager import SegmentationAnalysisManager 

train_data = ...  # Your dataset iterable (torch dataset/dataloader/...)
val_data = ...    # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]

analyzer = SegmentationAnalysisManager(
    report_title="Testing Data-Gradients Segmentation",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)

analyzer.run()
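The three snippets above differ only in the manager class they import. When the task is only known at run time, the choice can be made by name, as in the sketch below. The `MANAGERS` table and the `load_manager` helper are hypothetical conveniences, not part of DataGradients; the module paths and class names are the ones used in the snippets above.

```python
import importlib
from typing import Dict, Tuple

# task name -> (module path, class name), copied from the imports shown above
MANAGERS: Dict[str, Tuple[str, str]] = {
    "classification": (
        "data_gradients.managers.classification_manager",
        "ClassificationAnalysisManager",
    ),
    "detection": (
        "data_gradients.managers.detection_manager",
        "DetectionAnalysisManager",
    ),
    "segmentation": (
        "data_gradients.managers.segmentation_manager",
        "SegmentationAnalysisManager",
    ),
}


def load_manager(task: str):
    """Import and return the analysis manager class for `task` (hypothetical helper)."""
    if task not in MANAGERS:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(MANAGERS)}")
    module_name, class_name = MANAGERS[task]
    # The import happens lazily, so data-gradients is only required at this point.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Usage, with the same arguments as in the snippets above:
# manager_cls = load_manager("detection")
# manager_cls(report_title=..., train_data=..., val_data=..., class_names=...).run()
```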

Example

You can test the segmentation analysis tool with the following example, which does not require you to download any additional data.

Report

Once the analysis is done, the path to your PDF report will be printed. Examples of pre-computed reports are listed in the Pre-computed Dataset Analysis section.

Feature Configuration

The feature configuration allows you to run the analysis on a subset of features or adjust the parameters of existing features. If you are interested in customizing this configuration, you can check out the documentation on that topic.

Dataset Extractors

Ensuring Comprehensive Dataset Compatibility

DataGradients infers most dataset formats automatically; however, some specifics, such as nested annotation structures or unusual annotation formats, may call for a tailored approach.

To address this, DataGradients offers extractors that extend compatibility to a wider range of dataset formats.

For implementation details, see the Dataset Extractors Documentation.
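As a rough illustration of the kind of problem an extractor solves, the sketch below flattens a nested detection annotation into one row per box. Everything in it is an assumption made for the example: the sample layout, the `[class_id, x, y, w, h]` row order, and the function names. The signature DataGradients actually expects is described in the Dataset Extractors Documentation.

```python
from typing import Any, Dict, List, Tuple

# (image, nested annotation): a hypothetical sample layout
Sample = Tuple[Any, Dict[str, Any]]


def extract_image(sample: Sample) -> Any:
    """Return the image part of a sample."""
    image, _ = sample
    return image


def extract_labels(sample: Sample) -> List[List[float]]:
    """Flatten nested annotations into [class_id, x, y, w, h] rows."""
    _, annotation = sample
    rows = []
    for obj in annotation["objects"]:
        x, y, w, h = obj["bbox"]
        rows.append([float(obj["category_id"]), float(x), float(y), float(w), float(h)])
    return rows


sample = (
    "image-placeholder",  # stands in for real pixel data
    {
        "objects": [
            {"category_id": 2, "bbox": [10, 20, 30, 40]},
            {"category_id": 0, "bbox": [5, 5, 15, 25]},
        ]
    },
)

print(extract_labels(sample))
# [[2.0, 10.0, 20.0, 30.0, 40.0], [0.0, 5.0, 5.0, 15.0, 25.0]]
```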

Pre-computed Dataset Analysis

Example notebook on Colab

Detection

Common Datasets
- [COCO](https://dgreports.deci.ai/detection/COCO/Report.pdf)
- [VOC](https://dgreports.deci.ai/detection/VOC/Report.pdf)

[Roboflow 100](https://universe.roboflow.com/roboflow-100?ref=blog.roboflow.com) Datasets
- [4-fold-defect](https://dgreports.deci.ai/detection/RF100_4-fold-defect/Report.pdf)
- [abdomen-mri](https://dgreports.deci.ai/detection/RF100_abdomen-mri/Report.pdf)
- [acl-x-ray](https://dgreports.deci.ai/detection/RF100_acl-x-ray/Report.pdf)
- [activity-diagrams-qdobr](https://dgreports.deci.ai/detection/RF100_activity-diagrams-qdobr/Report.pdf)
- [aerial-cows](https://dgreports.deci.ai/detection/RF100_aerial-cows/Report.pdf)
- [aerial-pool](https://dgreports.deci.ai/detection/RF100_aerial-pool/Report.pdf)
- [aerial-spheres](https://dgreports.deci.ai/detection/RF100_aerial-spheres/Report.pdf)
- [animals-ij5d2](https://dgreports.deci.ai/detection/RF100_animals-ij5d2/Report.pdf)
- [apex-videogame](https://dgreports.deci.ai/detection/RF100_apex-videogame/Report.pdf)
- [apples-fvpl5](https://dgreports.deci.ai/detection/RF100_apples-fvpl5/Report.pdf)
- [aquarium-qlnqy](https://dgreports.deci.ai/detection/RF100_aquarium-qlnqy/Report.pdf)
- [asbestos](https://dgreports.deci.ai/detection/RF100_asbestos/Report.pdf)
- [avatar-recognition-nuexe](https://dgreports.deci.ai/detection/RF100_avatar-recognition-nuexe/Report.pdf)
- [axial-mri](https://dgreports.deci.ai/detection/RF100_axial-mri/Report.pdf)
- [bacteria-ptywi](https://dgreports.deci.ai/detection/RF100_bacteria-ptywi/Report.pdf)
- [bccd-ouzjz](https://dgreports.deci.ai/detection/RF100_bccd-ouzjz/Report.pdf)
- [bees-jt5in](https://dgreports.deci.ai/detection/RF100_bees-jt5in/Report.pdf)
- [bone-fracture-7fylg](https://dgreports.deci.ai/detection/RF100_bone-fracture-7fylg/Report.pdf)
- [brain-tumor-m2pbp](https://dgreports.deci.ai/detection/RF100_brain-tumor-m2pbp/Report.pdf)
- [cable-damage](https://dgreports.deci.ai/detection/RF100_cable-damage/Report.pdf)
- [cables-nl42k](https://dgreports.deci.ai/detection/RF100_cables-nl42k/Report.pdf)
- [cavity-rs0uf](https://dgreports.deci.ai/detection/RF100_cavity-rs0uf/Report.pdf)
- [cell-towers](https://dgreports.deci.ai/detection/RF100_cell-towers/Report.pdf)
- [cells-uyemf](https://dgreports.deci.ai/detection/RF100_cells-uyemf/Report.pdf)
- [chess-pieces-mjzgj](https://dgreports.deci.ai/detection/RF100_chess-pieces-mjzgj/Report.pdf)
- [circuit-elements](https://dgreports.deci.ai/detection/RF100_circuit-elements/Report.pdf)
- [circuit-voltages](https://dgreports.deci.ai/detection/RF100_circuit-voltages/Report.pdf)
- [cloud-types](https://dgreports.deci.ai/detection/RF100_cloud-types/Report.pdf)
- [coins-1apki](https://dgreports.deci.ai/detection/RF100_coins-1apki/Report.pdf)
- [construction-safety-gsnvb](https://dgreports.deci.ai/detection/RF100_construction-safety-gsnvb/Report.pdf)
- [coral-lwptl](https://dgreports.deci.ai/detection/RF100_coral-lwptl/Report.pdf)
- [corrosion-bi3q3](https://dgreports.deci.ai/detection/RF100_corrosion-bi3q3/Report.pdf)
- [cotton-20xz5](https://dgreports.deci.ai/detection/RF100_cotton-20xz5/Report.pdf)
- [cotton-plant-disease](https://dgreports.deci.ai/detection/RF100_cotton-plant-disease/Report.pdf)
- [csgo-videogame](https://dgreports.deci.ai/detection/RF100_csgo-videogame/Report.pdf)
- [currency-v4f8j](https://dgreports.deci.ai/detection/RF100_currency-v4f8j/Report.pdf)
- [digits-t2eg6](https://dgreports.deci.ai/detection/RF100_digits-t2eg6/Report.pdf)
- [document-parts](https://dgreports.deci.ai/detection/RF100_document-parts/Report.pdf)
- [excavators-czvg9](https://dgreports.deci.ai/detection/RF100_excavators-czvg9/Report.pdf)
- [farcry6-videogame](https://dgreports.deci.ai/detection/RF100_farcry6-videogame/Report.pdf)
- [fish-market-ggjso](https://dgreports.deci.ai/detection/RF100_fish-market-ggjso/Report.pdf)
- [flir-camera-objects](https://dgreports.deci.ai/detection/RF100_flir-camera-objects/Report.pdf)
- [furniture-ngpea](https://dgreports.deci.ai/detection/RF100_furniture-ngpea/Report.pdf)
- [gauge-u2lwv](https://dgreports.deci.ai/detection/RF100_gauge-u2lwv/Report.pdf)
- [grass-weeds](https://dgreports.deci.ai/detection/RF100_grass-weeds/Report.pdf)
- [gynecology-mri](https://dgreports.deci.ai/detection/RF100_gynecology-mri/Report.pdf)
- [halo-infinite-angel-videogame](https://dgreports.deci.ai/detection/RF100_halo-infinite-angel-videogame/Report.pdf)
- [hand-gestures-jps7z](https://dgreports.deci.ai/detection/RF100_hand-gestures-jps7z/Report.pdf)
- [insects-mytwu](https://dgreports.deci.ai/detection/RF100_insects-mytwu/Report.pdf)
- [leaf-disease-nsdsr](https://dgreports.deci.ai/detection/RF100_leaf-disease-nsdsr/Report.pdf)
- [lettuce-pallets](https://dgreports.deci.ai/detection/RF100_lettuce-pallets/Report.pdf)
- [liver-disease](https://dgreports.deci.ai/detection/RF100_liver-disease/Report.pdf)
- [marbles](https://dgreports.deci.ai/detection/RF100_marbles/Report.pdf)
- [mask-wearing-608pr](https://dgreports.deci.ai/detection/RF100_mask-wearing-608pr/Report.pdf)
- [mitosis-gjs3g](https://dgreports.deci.ai/detection/RF100_mitosis-gjs3g/Report.pdf)
- [number-ops](https://dgreports.deci.ai/detection/RF100_number-ops/Report.pdf)
- [paper-parts](https://dgreports.deci.ai/detection/RF100_paper-parts/Report.pdf)
- [paragraphs-co84b](https://dgreports.deci.ai/detection/RF100_paragraphs-co84b/Report.pdf)
- [parasites-1s07h](https://dgreports.deci.ai/detection/RF100_parasites-1s07h/Report.pdf)
- [peanuts-sd4kf](https://dgreports.deci.ai/detection/RF100_peanuts-sd4kf/Report.pdf)
- [peixos-fish](https://dgreports.deci.ai/detection/RF100_peixos-fish/Report.pdf)
- [people-in-paintings](https://dgreports.deci.ai/detection/RF100_people-in-paintings/Report.pdf)
- [pests-2xlvx](https://dgreports.deci.ai/detection/RF100_pests-2xlvx/Report.pdf)
- [phages](https://dgreports.deci.ai/detection/RF100_phages/Report.pdf)
- [pills-sxdht](https://dgreports.deci.ai/detection/RF100_pills-sxdht/Report.pdf)
- [poker-cards-cxcvz](https://dgreports.deci.ai/detection/RF100_poker-cards-cxcvz/Report.pdf)
- [printed-circuit-board](https://dgreports.deci.ai/detection/RF100_printed-circuit-board/Report.pdf)
- [radio-signal](https://dgreports.deci.ai/detection/RF100_radio-signal/Report.pdf)
- [road-signs-6ih4y](https://dgreports.deci.ai/detection/RF100_road-signs-6ih4y/Report.pdf)
- [road-traffic](https://dgreports.deci.ai/detection/RF100_road-traffic/Report.pdf)
- [robomasters-285km](https://dgreports.deci.ai/detection/RF100_robomasters-285km/Report.pdf)
- [secondary-chains](https://dgreports.deci.ai/detection/RF100_secondary-chains/Report.pdf)
- [sedimentary-features-9eosf](https://dgreports.deci.ai/detection/RF100_sedimentary-features-9eosf/Report.pdf)
- [shark-teeth-5atku](https://dgreports.deci.ai/detection/RF100_shark-teeth-5atku/Report.pdf)
- [sign-language-sokdr](https://dgreports.deci.ai/detection/RF100_sign-language-sokdr/Report.pdf)
- [signatures-xc8up](https://dgreports.deci.ai/detection/RF100_signatures-xc8up/Report.pdf)
- [smoke-uvylj](https://dgreports.deci.ai/detection/RF100_smoke-uvylj/Report.pdf)
- [soccer-players-5fuqs](https://dgreports.deci.ai/detection/RF100_soccer-players-5fuqs/Report.pdf)
- [soda-bottles](https://dgreports.deci.ai/detection/RF100_soda-bottles/Report.pdf)
- [solar-panels-taxvb](https://dgreports.deci.ai/detection/RF100_solar-panels-taxvb/Report.pdf)
- [stomata-cells](https://dgreports.deci.ai/detection/RF100_stomata-cells/Report.pdf)
- [street-work](https://dgreports.deci.ai/detection/RF100_street-work/Report.pdf)
- [tabular-data-wf9uh](https://dgreports.deci.ai/detection/RF100_tabular-data-wf9uh/Report.pdf)
- [team-fight-tactics](https://dgreports.deci.ai/detection/RF100_team-fight-tactics/Report.pdf)
- [thermal-cheetah-my4dp](https://dgreports.deci.ai/detection/RF100_thermal-cheetah-my4dp/Report.pdf)
- [thermal-dogs-and-people-x6ejw](https://dgreports.deci.ai/detection/RF100_thermal-dogs-and-people-x6ejw/Report.pdf)
- [trail-camera](https://dgreports.deci.ai/detection/RF100_trail-camera/Report.pdf)
- [truck-movement](https://dgreports.deci.ai/detection/RF100_truck-movement/Report.pdf)
- [tweeter-posts](https://dgreports.deci.ai/detection/RF100_tweeter-posts/Report.pdf)
- [tweeter-profile](https://dgreports.deci.ai/detection/RF100_tweeter-profile/Report.pdf)
- [underwater-objects-5v7p8](https://dgreports.deci.ai/detection/RF100_underwater-objects-5v7p8/Report.pdf)
- [underwater-pipes-4ng4t](https://dgreports.deci.ai/detection/RF100_underwater-pipes-4ng4t/Report.pdf)
- [uno-deck](https://dgreports.deci.ai/detection/RF100_uno-deck/Report.pdf)
- [valentines-chocolate](https://dgreports.deci.ai/detection/RF100_valentines-chocolate/Report.pdf)
- [vehicles-q0x2v](https://dgreports.deci.ai/detection/RF100_vehicles-q0x2v/Report.pdf)
- [wall-damage](https://dgreports.deci.ai/detection/RF100_wall-damage/Report.pdf)
- [washroom-rf1fa](https://dgreports.deci.ai/detection/RF100_washroom-rf1fa/Report.pdf)
- [weed-crop-aerial](https://dgreports.deci.ai/detection/RF100_weed-crop-aerial/Report.pdf)
- [wine-labels](https://dgreports.deci.ai/detection/RF100_wine-labels/Report.pdf)
- [x-ray-rheumatology](https://dgreports.deci.ai/detection/RF100_x-ray-rheumatology/Report.pdf)

Segmentation

- [COCO](https://dgreports.deci.ai/segmentation/COCO/Report.pdf)
- [Cityscapes](https://dgreports.deci.ai/segmentation/Cityspace/Report.pdf)
- [VOC](https://dgreports.deci.ai/segmentation/VOC/Report.pdf)

Community

Click here to join our Discord Community

License

This project is released under the Apache 2.0 license.