Dataset Converters is a conversion toolset between different object detection and instance segmentation annotation formats.
It was written and is maintained by deep learning developers at Intelligent Security Systems to simplify research.
Many different dataset annotation formats exist for object detection and instance segmentation.
This repository contains a set of scripts that simplify conversion between those formats. We rely on the COCO format as the main representation.
Please cd to the DatasetConverters folder and then type
pip install -r requirements.txt
This will install the required dependencies.
To convert between different dataset formats, we provide a script called convert.py.
For example, suppose you have the ADE20K dataset and want to convert it into COCO format.
To do so, type
python convert.py --input-folder <path_to_folder_ADE20K_2016_07_26> --output-folder <output_path> \
--input-format ADE20K --output-format COCO --copy
Note. The same command can be written more briefly as
python convert.py -i <path_to_folder_ADE20K_2016_07_26> -o <output_path> -I ADE20K -O COCO --copy
Note. The --copy argument copies the images into the output folder. On Linux you can pass --symlink instead to create symbolic links to the original images.
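The difference between the two options can be illustrated with a minimal sketch. The helper name `place_image` and its `mode` parameter are assumptions made for illustration; this is not the code used by convert.py.

```python
import os
import shutil


def place_image(src, dst, mode="copy"):
    """Put the image at `src` into the output dataset at `dst`.

    mode="copy"    duplicates the file (works on every platform).
    mode="symlink" creates a symbolic link pointing at the original file.
    Hypothetical helper, not part of Dataset Converters.
    """
    os.makedirs(os.path.dirname(os.path.abspath(dst)), exist_ok=True)
    if mode == "copy":
        shutil.copy(src, dst)
    elif mode == "symlink":
        # Link to the absolute path so the link stays valid regardless of
        # the directory it is read from.
        os.symlink(os.path.abspath(src), dst)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
```

Copying costs disk space but gives a self-contained output folder; symbolic links are cheap but break if the original images are moved or deleted.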
You are ready to use ADE20K in frameworks with COCO input format.
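Before feeding the result to a framework, it can help to sanity-check the produced annotation file. The sketch below relies only on the standard top-level keys of a COCO annotation file (`images`, `annotations`, `categories`); the function name `summarize_coco` is an assumption, and the name of the JSON file inside the output folder is not specified here, so pass whatever path the conversion produced.

```python
import json
from collections import Counter


def summarize_coco(annotation_path):
    """Return basic counts from a COCO-style annotation file."""
    with open(annotation_path, "r", encoding="utf-8") as f:
        coco = json.load(f)

    # A COCO detection/segmentation file has these three top-level lists.
    for key in ("images", "annotations", "categories"):
        if key not in coco:
            raise KeyError(f"missing top-level key: {key}")

    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    per_category = Counter(
        id_to_name.get(a["category_id"], "<unknown>")
        for a in coco["annotations"]
    )
    return {
        "images": len(coco["images"]),
        "annotations": len(coco["annotations"]),
        "categories": len(coco["categories"]),
        "per_category": dict(per_category),
    }
```

An `<unknown>` entry in `per_category` would indicate annotations that refer to a category id missing from the `categories` list.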
For the full list of supported conversions, please refer to the Supported conversions section.
If you have multiple annotations converted to COCO format, we provide the script merge_json_datasets.py to merge them.
Suppose you have the COCO and Pascal VOC segmentation datasets in COCO format and want to merge the dog and horse annotations from them.
This is how merge_json_datasets.py can serve that purpose:
python merge_json_datasets.py -d <coco_images_folder> -a <coco_annotations.json> --ids 18 19 \
-d <vocsegm_images_folder> -a <vocsegm_annotations.json> --ids 12 13 \
--output-ids 1 2 -o <output_dir> -n dog horse
In this example two datasets are merged, but the number is not limited. You can merge as many datasets and classes in COCO format as you need.
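For each dataset in COCO format, one should provide the following arguments, as in the example above:
-d for the images folder
-a for the annotations JSON file
--ids for the ids of the categories to take from that dataset
After all datasets are specified with this pattern, the output is specified with the following arguments:
--output-ids for the category ids in the merged dataset
-o for the output directory
-n for the names of the merged categories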
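The id remapping behind such a merge can be sketched on in-memory COCO dictionaries. This is an illustration under stated assumptions, not the implementation of merge_json_datasets.py: the function name `merge_coco`, its parameters, the re-numbering of image and annotation ids, and the choice to drop images without any selected annotation are all assumptions.

```python
def merge_coco(datasets, output_ids, names):
    """Merge selected categories from several COCO-style dicts.

    datasets:   list of (coco_dict, ids) pairs; ids[k] is the category id in
                that dataset that maps to output_ids[k].
    output_ids: category ids to use in the merged result.
    names:      category names for the merged result, aligned with output_ids.
    """
    merged = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for i, n in zip(output_ids, names)],
    }
    next_image_id = 1
    next_ann_id = 1
    for coco, ids in datasets:
        if len(ids) != len(output_ids):
            raise ValueError("each dataset needs one id per output id")
        remap = dict(zip(ids, output_ids))
        # Keep only annotations of the requested categories.
        kept = [a for a in coco["annotations"] if a["category_id"] in remap]
        used_images = {a["image_id"] for a in kept}
        # Re-number images so ids from different datasets cannot collide.
        image_remap = {}
        for img in coco["images"]:
            if img["id"] in used_images:
                image_remap[img["id"]] = next_image_id
                merged["images"].append({**img, "id": next_image_id})
                next_image_id += 1
        for a in kept:
            merged["annotations"].append({
                **a,
                "id": next_ann_id,
                "image_id": image_remap[a["image_id"]],
                "category_id": remap[a["category_id"]],
            })
            next_ann_id += 1
    return merged
```

With the ids from the command above, the call would look like `merge_coco([(coco, [18, 19]), (voc, [12, 13])], [1, 2], ["dog", "horse"])`, where `coco` and `voc` are the loaded annotation dictionaries.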
In this section we list all of the supported formats and their conversions.
We welcome community contributions to the Dataset Converters.
If you want to add a new dataset converter, please note that we expect
The core files, which are key to understanding the implementation process, are the following:
Converter.py
ConverterBase.py
converters.py
formats.py
A new converter is a subclass of the ConverterBase class with the _run method overridden and its conversion format added to the list formats.
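The shape of such a subclass can be sketched as follows. The `ConverterBase` below is a hypothetical stand-in written so that the example runs on its own; the real class in ConverterBase.py may have a different constructor, a different `_run` signature, and a different way of storing `formats`, so check that file before copying anything. The names `MyFormat2COCOConverter` and `MYFORMAT2COCO` are also invented for illustration.

```python
class ConverterBase:
    """Hypothetical stand-in for the repository's ConverterBase."""

    # Conversion formats this converter supports (assumed to be strings).
    formats = []

    def __call__(self, input_folder, output_folder, conversion):
        if conversion not in self.formats:
            raise ValueError(f"unsupported conversion: {conversion}")
        return self._run(input_folder, output_folder, conversion)

    def _run(self, input_folder, output_folder, conversion):
        raise NotImplementedError


class MyFormat2COCOConverter(ConverterBase):
    """Illustrative converter from an invented format to COCO."""

    # Register the conversion this class handles.
    formats = ["MYFORMAT2COCO"]

    def _run(self, input_folder, output_folder, conversion):
        # A real converter would read the annotations from input_folder and
        # write images plus a COCO JSON file to output_folder. Here we only
        # return an empty COCO skeleton to show the expected structure.
        return {"images": [], "annotations": [], "categories": []}
```

Keeping the format check in the base class means each subclass only has to declare `formats` and implement `_run`.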