UniversalDataTool / udt-format

A simple universal data description format for datasets, tailored for interfacing with humans.
https://universaldatatool.com
MIT License

BREAKING: Switch to "Samples-style" #4

Closed: seveibar closed this issue 4 years ago

seveibar commented 4 years ago

This will hopefully be the only big breaking change we need to make. When I originally came up with this format, I thought it would be a good idea to separate taskData and taskOutput to make it clear what work was performed and what the input data was. In practice, I don't see a need for this distinction, and it makes indexing a pain: a sample's input and its output live in two parallel arrays that have to be kept aligned by position. This change removes taskData and taskOutput and introduces a single samples array in which each entry contains both its input data and its output data. That makes the JSON more consistent with the CSV representation and with the representations used by other libraries.

"output" is not quite a perfect name, but neither is "label". "annotations" is slightly more fitting, but it is confusing for image classification, where the output is just a string. In short, I think "output" is general enough to cover all the cases.

// OLD WAY
{
  "taskData": [
     { "imageUrl": "https://..." },
     { "imageUrl": "https://..." }
  ],
  "taskOutput": [
     "cat",
     "dog"
  ]
}
// NEW WAY
{
  "samples": [
     {
        "imageUrl": "https://...",
        "output": "cat"
     },
     {
        "imageUrl": "https://...",
        "output": "dog"
     }
  ]
}
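Existing datasets in the old layout can be converted by zipping taskData and taskOutput by index. The sketch below is not part of the udt-format project; the function name `migrate_to_samples` is hypothetical, and it assumes that taskOutput is position-aligned with taskData, that it may be shorter or contain null entries for unlabeled samples (those samples simply get no "output" key), and that any other top-level keys should be carried over unchanged.

```python
import json
from typing import Any, Dict, List


def migrate_to_samples(old: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: convert a taskData/taskOutput dataset to samples-style.

    Assumption: taskOutput[i] is the output for taskData[i]; missing or null
    outputs mean the sample is unlabeled and gets no "output" key.
    """
    task_data: List[Dict[str, Any]] = old.get("taskData") or []
    task_output: List[Any] = old.get("taskOutput") or []
    if len(task_output) > len(task_data):
        raise ValueError("taskOutput has more entries than taskData")

    samples = []
    for i, data in enumerate(task_data):
        sample = dict(data)  # copy input fields (e.g. imageUrl) without mutating the original
        if i < len(task_output) and task_output[i] is not None:
            sample["output"] = task_output[i]
        samples.append(sample)

    # Keep any other top-level keys as-is; drop the two arrays being replaced.
    new = {k: v for k, v in old.items() if k not in ("taskData", "taskOutput")}
    new["samples"] = samples
    return new


if __name__ == "__main__":
    old_dataset = {
        "taskData": [
            {"imageUrl": "https://example.com/1.jpg"},
            {"imageUrl": "https://example.com/2.jpg"},
        ],
        "taskOutput": ["cat", "dog"],
    }
    print(json.dumps(migrate_to_samples(old_dataset), indent=2))
```

Copying each taskData entry and attaching "output" to it mirrors the NEW WAY example directly, so a sample's input and output can be read from a single index instead of two.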