Dataset Insights

Unity Dataset Insights is a Python package for downloading, parsing and analyzing synthetic datasets generated using the Unity Perception package.

Installation

Datasetinsights is published to PyPI. You can install it by running the following command in a supported Python environment:

pip install datasetinsights
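To confirm that the install succeeded, one option is to look up the installed version with Python's standard library. This is a minimal sketch; `installed_version` is a hypothetical helper name used for illustration and is not part of datasetinsights.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("datasetinsights")
    print(found if found else "datasetinsights is not installed")
```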

Getting Started

Dataset Statistics

We provide a sample notebook to help you load synthetic datasets generated using the Perception package and visualize dataset statistics. We plan to support other sample Unity projects in the future.

Load Datasets

The Unity Perception package provides datasets under this schema. The datasetinsights package also provides convenient Python modules to parse datasets.

For example, you can load AnnotationDefinitions into a Python dictionary by providing the corresponding annotation definition ID:

from datasetinsights.datasets.unity_perception import AnnotationDefinitions

# dest is the local root directory of the dataset (see Download Datasets)
annotation_def = AnnotationDefinitions(data_root=dest, version="my_schema_version")
definition_dict = annotation_def.get_definition(def_id="my_definition_id")

Similarly, for MetricDefinitions:

from datasetinsights.datasets.unity_perception import MetricDefinitions

metric_def = MetricDefinitions(data_root=dest, version="my_schema_version")
definition_dict = metric_def.get_definition(def_id="my_definition_id")

The Captures table provides the collection of simulation captures and annotations. You can load these records directly as a Pandas DataFrame:

from datasetinsights.datasets.unity_perception import Captures

captures = Captures(data_root=dest, version="my_schema_version")
captures_df = captures.filter(def_id="my_definition_id")

The Metrics table can store simulation metrics for a capture or annotation. You can also load these records as a Pandas DataFrame:

from datasetinsights.datasets.unity_perception import Metrics

metrics = Metrics(data_root=dest, version="my_schema_version")
metrics_df = metrics.filter_metrics(def_id="my_definition_id")

Download Datasets

You can download the datasets using the download command:

datasetinsights download --source-uri=<xxx> --output=$HOME/data

The download command supports HTTP(S) and GCS sources.

Alternatively, you can download a dataset directly from the Python interface.

GCSDatasetDownloader can download a dataset from GCS locations.

from datasetinsights.io.downloader import GCSDatasetDownloader

source_uri = "gs://url/to/file.zip"  # or "gs://url/to/folder"
dest = "~/data"
downloader = GCSDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)

HTTPDatasetDownloader can download a dataset from any HTTP(S) URL.

from datasetinsights.io.downloader import HTTPDatasetDownloader

source_uri = "http://url.to.file.zip"
dest = "~/data"
downloader = HTTPDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)
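If a script has to handle both kinds of source, the downloader can be chosen from the URI scheme. The sketch below uses only the standard library and returns the class name as a string so that it runs without datasetinsights installed. The mapping and the `downloader_name_for` helper are assumptions made for illustration, not part of the package's API.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to the downloader classes named above.
# The class names come from this README; the mapping itself is an assumption.
DOWNLOADER_BY_SCHEME = {
    "gs": "GCSDatasetDownloader",
    "http": "HTTPDatasetDownloader",
    "https": "HTTPDatasetDownloader",
}


def downloader_name_for(source_uri: str) -> str:
    """Return the name of the downloader class that matches the URI scheme."""
    scheme = urlparse(source_uri).scheme.lower()
    try:
        return DOWNLOADER_BY_SCHEME[scheme]
    except KeyError:
        raise ValueError(f"Unsupported source URI scheme: {scheme!r}") from None


if __name__ == "__main__":
    for uri in ("gs://url/to/file.zip", "http://url.to.file.zip"):
        print(uri, "->", downloader_name_for(uri))
```

In real code, the returned name would be replaced by the class imported from datasetinsights.io.downloader.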

Convert Datasets

If you are interested in converting the synthetic dataset to COCO format for annotations that COCO supports, you can run the convert command:

datasetinsights convert -i <input-directory> -o <output-directory> -f COCO-Instances

or

datasetinsights convert -i <input-directory> -o <output-directory> -f COCO-Keypoints

You will need to provide the 2D bounding box definition ID in the synthetic dataset. We currently support only 2D bounding box and human keypoint annotations for the COCO format.
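When the conversion is driven from a script, the command line can be assembled in Python before it is run. The following is a minimal sketch that uses only the flags and format names shown above. `build_convert_command` is a hypothetical helper and the directory names are placeholders.

```python
import shlex

# Format names taken from the two convert commands shown above.
SUPPORTED_FORMATS = ("COCO-Instances", "COCO-Keypoints")


def build_convert_command(input_dir: str, output_dir: str, fmt: str) -> list:
    """Build the argument list for `datasetinsights convert` (does not run it)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    return ["datasetinsights", "convert", "-i", input_dir, "-o", output_dir, "-f", fmt]


if __name__ == "__main__":
    # Placeholder directories; replace them with real paths.
    cmd = build_convert_command("data/synthetic", "data/coco", "COCO-Instances")
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) to execute the conversion.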

Docker

You can use the pre-built Docker image unitytechnologies/datasetinsights to interact with datasets.

Documentation

You can find the API documentation on readthedocs.

Contributing

Please let us know if you encounter a bug by filing an issue. To learn more about making a contribution to Dataset Insights, please see our Contribution page.

License

Dataset Insights is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Citation

If you find this package useful, consider citing it using:

@misc{datasetinsights2020,
    title={Unity {D}ataset {I}nsights Package},
    author={{Unity Technologies}},
    howpublished={\url{https://github.com/Unity-Technologies/datasetinsights}},
    year={2020}
}