HumanSignal / label-studio-converter

Tools for converting Label Studio annotations into common dataset formats
https://labelstud.io/

CONLL Conversion drops NER labels from token classification #254

Open MattHag opened 1 year ago

MattHag commented 1 year ago

There is a bug in the conversion from .json to .conll: many token classification labels are lost. I discovered it when exporting NER-labeled data as .conll from Label Studio.

Example to replicate

The same error occurs when converting the exported JSON directly with the converter, using the files in my example_data.zip:

label-studio-converter export -i export_json.json -c label_studio_config.xml -o output_dir -f CONLL2003

The provided JSON export contains 3 "-hdmi" and 3 "-displayport" labels. The converted CONLL file no longer contains any "-displayport" label.
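To check the label counts on both sides, a small script along these lines may help. It is only a sketch and makes two assumptions: that the JSON export is a list of tasks whose span labels sit under annotations -> result -> value -> labels, and that the CONLL file has one token per line with a B-/I-/O tag in the last column. The function names are my own and are not part of label-studio-converter.

```python
import json
import sys
from collections import Counter


def count_json_labels(tasks):
    """Count span labels in a parsed Label Studio JSON export (a list of tasks).

    Assumes each labeled span is stored at
    task["annotations"][i]["result"][j]["value"]["labels"].
    """
    counts = Counter()
    for task in tasks:
        for annotation in task.get("annotations", []):
            for result in annotation.get("result", []):
                for label in result.get("value", {}).get("labels", []):
                    counts[label] += 1
    return counts


def count_conll_labels(conll_text):
    """Count entities in CoNLL text by counting B- tags in the last column."""
    counts = Counter()
    for line in conll_text.splitlines():
        parts = line.split()
        # Skip blank sentence separators and the document header line.
        if not parts or parts[0] == "-DOCSTART-":
            continue
        tag = parts[-1]
        if tag.startswith("B-"):
            counts[tag[2:]] += 1
    return counts


def missing_labels(json_counts, conll_counts):
    """Return {label: (json_count, conll_count)} where the CoNLL count is lower."""
    return {
        label: (n, conll_counts.get(label, 0))
        for label, n in json_counts.items()
        if conll_counts.get(label, 0) < n
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_labels.py <export.json> <converted.conll>
    with open(sys.argv[1], encoding="utf-8") as f:
        json_counts = count_json_labels(json.load(f))
    with open(sys.argv[2], encoding="utf-8") as f:
        conll_counts = count_conll_labels(f.read())
    print(missing_labels(json_counts, conll_counts))
```

Pass the exported JSON and the CONLL file written to output_dir as the two arguments. Any label printed with a lower second number was dropped, fully or partly, during conversion.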

Versions

label-studio: 1.8.2.post1
label-studio-converter: 0.0.57
macOS: 14.0