cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Combine Export or "Dump" multiple Jobs into one .zip with train/test/val splits #791

Closed Sparrowtech closed 3 years ago

Sparrowtech commented 4 years ago

WORKFLOW WORKAROUNDS: We've created individual "Jobs" to represent different classes of objects (e.g. "car, truck, van, helicopter, airplane, etc."), largely due to CVAT's difficulty loading very large datasets. Each CVAT Job represents ~2500 images and tends to be around 1 GB in size between the images and annotations. Currently there are ~60 different Jobs or classes of objects, 60 GB, and ~150,000 images.

Routinely we create specific datasets (10-20 object classes or "Jobs"), which requires a lot of heavy lifting after export: merging tfrecords or xml files into one file or into batches, not to mention splitting into train/test/val sets. I know there are a lot of tools out there to help with pre-processing, and we currently employ many.

It would be ideal to be able to choose "Car, Airplane, Helicopter, Bus, ... etc." from the dashboard to EXPORT INTO ONE TASK... AND to choose the ratio of images to be split into train/test/val sets, e.g. 70% train, 15% test, 15% val, resulting in .zip file(s) with images-annotations or tfrecords created. No extra processing for randomizing: just extract the split % from each Job and combine them, e.g. into "Train", ensuring well-balanced classes rather than relying on a later, unknown function, which is just a random exercise.
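To make the request concrete, here is a minimal sketch of the per-Job split described above: each Job is shuffled and cut by the same ratios, and the pieces are then combined so every class keeps its share in each subset. The function names, the file names, and the fixed seed are illustrative assumptions; this is not CVAT code.

```python
import random
from typing import Dict, List, Sequence


def split_job(items: Sequence[str], ratios: Dict[str, float],
              seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle one Job's images and cut them into subsets by ratio."""
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    result: Dict[str, List[str]] = {}
    start = 0
    cumulative = 0.0
    for name, ratio in ratios.items():
        # Cut at cumulative boundaries so the subsets never overlap
        # and together cover every image exactly once.
        cumulative += ratio
        end = round(len(shuffled) * cumulative)
        result[name] = shuffled[start:end]
        start = end
    return result


def split_jobs(jobs: Dict[str, Sequence[str]], ratios: Dict[str, float],
               seed: int = 0) -> Dict[str, List[str]]:
    """Split every Job separately, then merge the per-Job subsets."""
    combined: Dict[str, List[str]] = {name: [] for name in ratios}
    for job_name in sorted(jobs):
        for subset, items in split_job(jobs[job_name], ratios, seed).items():
            combined[subset].extend(items)
    return combined


if __name__ == "__main__":
    # Hypothetical Jobs; real ones would list image paths per class.
    jobs = {
        "car": [f"car_{i}.jpg" for i in range(100)],
        "truck": [f"truck_{i}.jpg" for i in range(40)],
    }
    ratios = {"train": 0.70, "test": 0.15, "val": 0.15}
    for subset, items in split_jobs(jobs, ratios).items():
        print(subset, len(items))
```

Because the ratios are applied per Job before combining, a small class is not crowded out of "val" or "test" the way it can be with a single random split over the merged data.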

Thanks!

nmanovic commented 4 years ago

@Sparrowtech, we are going to introduce projects, where you can group all similar tasks and then export images + annotations for the whole project. Would that work for you?


Sparrowtech commented 4 years ago

That would be great! Looking forward to the update. It would also be great to have the ability to export the images/annotations not only to one file but also, optionally, as Train, Test, & Validation sets. Thanks!

Sparrowtech commented 4 years ago

@zhiltsov-max, I see that there has been some activity and a "Release" that has been made on this request. I'm not super familiar with how Releases are made available from GitHub, whether for production or beta. Really simple question... Is this something I can have access to today, or is it being embedded into another release down the road? Please advise if you don't mind. Thanks!

zhiltsov-max commented 4 years ago

@nmanovic, please answer here.

nmanovic commented 4 years ago

@Sparrowtech, the feature will be available in Release 1.0.0 (~ end of February next year). Within a week or two, the first prototype will be merged into the develop branch. It would help if you could test the implementation and confirm that it is useful for you. We don't recommend using develop in production, but internally we use it for our own tasks.

Does that answer your question?

Sparrowtech commented 4 years ago

Yes, thank you. I will look for the feature in the development branch over the next few weeks.

nmanovic commented 4 years ago

Let's keep the issue open until it is resolved.

zhiltsov-max commented 4 years ago

Currently, it is possible with Datumaro:

  1. Export all desired tasks in Datumaro format and unpack the archives
  2. Check the readme in the downloaded files
  3. Run the following commands:
     datum project create -o proj
     datum source add path -p proj -f datumaro_project <path_to_the_unpacked_archive1>
     datum source add path -p proj -f datumaro_project <path_to_the_unpacked_archive2>
     ...
     datum project transform -p proj -t random_split [-- -s subset1:ratio1 etc.]
     datum project export -f tf_detection_api -p <path_to_transform_result> -- --save-images
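With many unpacked archives, typing the commands above by hand gets tedious. The sketch below only assembles the same command lines for any number of archives; it does not run them. The commands and flags are copied from the steps above and may differ in other Datumaro versions. Repeating `-s` once per subset is an assumption about what "etc." stands for, and the function name, paths, and ratios are hypothetical.

```python
import shlex
from typing import Dict, List, Sequence


def build_datum_commands(archives: Sequence[str], splits: Dict[str, float],
                         export_path: str,
                         project_dir: str = "proj") -> List[List[str]]:
    """Assemble the datum command lines from step 3 as argument lists."""
    commands = [["datum", "project", "create", "-o", project_dir]]

    # One "source add" per unpacked CVAT export.
    for archive in archives:
        commands.append(["datum", "source", "add", "path", "-p", project_dir,
                         "-f", "datumaro_project", archive])

    # Assumption: one "-s name:ratio" pair per subset after the "--".
    transform = ["datum", "project", "transform", "-p", project_dir,
                 "-t", "random_split", "--"]
    for name, ratio in splits.items():
        transform += ["-s", f"{name}:{ratio}"]
    commands.append(transform)

    # export_path stands in for <path_to_transform_result> above.
    commands.append(["datum", "project", "export", "-f", "tf_detection_api",
                     "-p", export_path, "--", "--save-images"])
    return commands


if __name__ == "__main__":
    cmds = build_datum_commands(
        archives=["exports/car", "exports/truck"],  # hypothetical paths
        splits={"train": 0.7, "test": 0.15, "val": 0.15},
        export_path="path/to/transform_result",  # hypothetical path
    )
    for cmd in cmds:
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before pasting them into a shell or passing each list to `subprocess.run`.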

zhiltsov-max commented 4 years ago

Keeping open as a request for:

shaojun commented 3 years ago

When can we expect to have the project export feature in the UI? I have seen similar requests for years. Thanks.

zhiltsov-max commented 3 years ago

Done in #3365