HumanSignal / label-studio-converter

Tools for converting Label Studio annotations into common dataset formats
https://labelstud.io/

CSV output should use simple field values for labels (not convoluted JSON) #59

Open tomasohara opened 2 years ago

tomasohara commented 2 years ago

Unfortunately, the CSV output exported by Label Studio stores the label as JSON. See dog-example-project-35-at-2021-10-07-18-48-22cb3c67.csv. This makes it hard to review the data in spreadsheets.

Instead, the label should be extracted as a simple string value, as the other converters do (e.g., CONLL). In addition, each annotation should be on a separate line: in the example above, 15 distinct annotations are packed into a single line!
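A minimal sketch of the requested layout, assuming the usual Label Studio JSON export structure (a list of tasks, each with `data` and `annotations[].result[]`, where span labels sit in `value["labels"]`). The function names `flatten_tasks` and `to_flat_csv` and the output columns are hypothetical, not part of label-studio-converter:

```python
import csv
import io


def flatten_tasks(tasks, text_key="text"):
    """Yield one flat dict per (span, label) pair instead of one JSON blob per task.

    Assumes the Label Studio JSON export layout; `text_key` is a hypothetical
    name for the task field holding the source text.
    """
    for task in tasks:
        source = task.get("data", {}).get(text_key, "")
        for annotation in task.get("annotations", []):
            for result in annotation.get("result", []):
                value = result.get("value", {})
                # Emit one row per label so every cell holds a plain string.
                # Results without "labels" (e.g. Choices) are skipped here.
                for label in value.get("labels", []):
                    yield {
                        "task_id": task.get("id"),
                        "text": source,
                        "span": value.get("text", ""),
                        "start": value.get("start"),
                        "end": value.get("end"),
                        "label": label,
                    }


def to_flat_csv(tasks):
    """Serialize the flattened rows as CSV text with a header line."""
    fields = ["task_id", "text", "span", "start", "end", "label"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(flatten_tasks(tasks))
    return buf.getvalue()


# Usage: one task with two labeled spans becomes two CSV rows.
tasks = [{
    "id": 1,
    "data": {"text": "My dog Rex barked."},
    "annotations": [{"result": [
        {"type": "labels", "value": {"start": 3, "end": 6, "text": "dog", "labels": ["Animal"]}},
        {"type": "labels", "value": {"start": 7, "end": 10, "text": "Rex", "labels": ["Name"]}},
    ]}],
}]
out = to_flat_csv(tasks)
print(out)
```

Repeating the source text on every row trades file size for rows that can be reviewed on their own in a spreadsheet.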

For the expected output see the attached desired-dog-example-project-35-at-2021-10-07-18-49-22cb3c67.csv.

Note that this is not a feature request: I was baffled when I found out about this behavior. For example, why bother having a CSV format if the important part must be processed with a JSON utility?!

makseq commented 2 years ago

@tomasohara Originally we implemented the CSV export for Choices, not for NER labels. Unlike Choices, labels are written to the CSV as-is, without any preprocessing. Maybe it's better to disable CSV export altogether for everything that is not Choices, or we should add preprocessing for labels too.
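One possible shape for "preprocessing for labels too", sketched under the assumption that each result dict carries `type` and `value` keys as in the JSON export: collapse each result into a plain string by type, and fall back to JSON (roughly today's behavior) for anything unhandled. The helper `simplify_value` and its `span=label` output format are hypothetical, not the converter's actual code:

```python
import json


def simplify_value(result):
    """Return a spreadsheet-friendly string for one annotation result.

    Assumes Label Studio result dicts with "type" and "value" keys.
    """
    value = result.get("value", {})
    kind = result.get("type")
    if kind == "choices":
        # Choices: list of selected options, e.g. ["Dog"] -> "Dog"
        return ", ".join(value.get("choices", []))
    if kind == "labels":
        # Labels: keep the span text next to its label(s), e.g. "Rex=Name"
        labels = ", ".join(value.get("labels", []))
        return f"{value.get('text', '')}={labels}"
    # Unhandled types: fall back to JSON rather than silently dropping data.
    return json.dumps(value)


print(simplify_value({"type": "choices", "value": {"choices": ["Dog"]}}))
print(simplify_value({"type": "labels",
                      "value": {"start": 7, "end": 10, "text": "Rex", "labels": ["Name"]}}))
```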

tomasohara commented 2 years ago

OK, thanks for the clarification. The changes are minimal, as shown in the following comparison of the existing convert_to_csv with my convert_to_flat_csv: _convert_to_csv_flat-diff-8oct21.

Here are the original and revised functions: _convert_to_csv.txt and _convert_to_csv_flat.txt.

Should I make a pull request? I would implement both behaviors in the same function, with the new behavior governed by an environment variable (e.g., FLATTENED_CSV_ANNOTATIONS).
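A sketch of the proposed switch, assuming the flag is read from the environment as suggested. FLATTENED_CSV_ANNOTATIONS comes from the comment above; `use_flat_csv`, `export_csv`, the writer callables, and the accepted truthy strings are hypothetical choices for illustration:

```python
import os

# Hypothetical set of values that turn the flattened output on.
TRUTHY = {"1", "true", "yes", "on"}


def use_flat_csv(environ=os.environ):
    """Return True when FLATTENED_CSV_ANNOTATIONS requests flattened CSV output."""
    return environ.get("FLATTENED_CSV_ANNOTATIONS", "").strip().lower() in TRUTHY


def export_csv(tasks, write_nested, write_flat, environ=os.environ):
    """Dispatch to the flat writer or the existing nested-JSON writer.

    `write_nested` and `write_flat` are hypothetical callables standing in
    for the original and revised conversion functions.
    """
    writer = write_flat if use_flat_csv(environ) else write_nested
    return writer(tasks)


print(export_csv([], lambda t: "nested", lambda t: "flat",
                 {"FLATTENED_CSV_ANNOTATIONS": "1"}))
```

Leaving the flag off by default keeps the current CSV output unchanged for existing users; passing `environ` explicitly makes the dispatch easy to test without touching the real process environment.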

tomasohara commented 2 years ago

Sorry, I closed it by accident when adding the diff listing. Therefore, I re-opened it.

makseq commented 2 years ago

Yep, pull request would be great!