HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Is there any template for annotating dataset for Coreference Resolution? #3493

Open Tanmay98 opened 1 year ago

Tanmay98 commented 1 year ago

Hi, I tried annotating text with the NER template and the relation extraction template. My only problem is that when I export the data in JSON format I get both the NER annotations and the relations, but when I export in CoNLL format I lose the relations.

Is there any way I can get annotations for training a coref model? Thanks in advance.

makseq commented 1 year ago

Hi, do you have an example of what the CoNLL export should look like in this case?

Tanmay98 commented 1 year ago

Hi, sure. This is a link to a file in CoNLL format: https://github.com/dbamman/litbank/blob/master/coref/conll/1023_bleak_house_brat.conll

And this is the link to its raw text file: https://github.com/dbamman/litbank/blob/master/coref/tsv/1023_bleak_house_brat.txt

Tanmay98 commented 1 year ago

Also, I wrote a script to do the conversion. It produced a CoNLL-format file from the JSON input, but spaCy is still giving me a format error, although as far as I can tell it should have worked. Let me know if you want to look at it!

Tanmay98 commented 1 year ago

UPDATE: Actually, the data now loads perfectly using just the script.
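
The script itself was not posted in this thread. As a rough illustration only, here is a minimal sketch of how such a conversion could work. It is not the author's script and not part of label-studio-converter. It assumes the JSON export stores each labeled span as a result of type `labels` with `id` and `value.start` / `value.end` character offsets (end exclusive), and each relation as a result of type `relation` with `from_id` and `to_id`. It also assumes whitespace tokenization and a simplified three-column output (token index, token, coref tag), not the full CoNLL column set used in the LitBank file linked above. The function names `coref_clusters` and `to_conll_coref` are hypothetical.

```python
import re
from collections import defaultdict


def coref_clusters(results):
    """Group labeled spans into clusters by following relation links (union-find)."""
    # Assumed export layout: span results carry "id" and "value" with offsets.
    spans = {r["id"]: r["value"] for r in results if r.get("type") == "labels"}
    parent = {span_id: span_id for span_id in spans}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Every relation merges the clusters of its two endpoints.
    for r in results:
        if r.get("type") == "relation":
            src, dst = r.get("from_id"), r.get("to_id")
            if src in parent and dst in parent:
                parent[find(src)] = find(dst)

    groups = defaultdict(list)
    for span_id, value in spans.items():
        groups[find(span_id)].append((value["start"], value["end"]))

    # Number clusters by the position of their first mention.
    # Spans without any relation end up as single-mention clusters.
    return [sorted(g) for g in sorted(groups.values(), key=min)]


def to_conll_coref(text, results):
    """Return simplified rows: token index, token, coref tag (tab-separated)."""
    # Whitespace tokenization is a stand-in for a real tokenizer.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    column = [[] for _ in tokens]

    for cluster_id, mentions in enumerate(coref_clusters(results)):
        for start, end in mentions:
            # Tokens whose character range overlaps the mention.
            covered = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
            if not covered:
                continue
            first, last = covered[0], covered[-1]
            if first == last:
                column[first].append(f"({cluster_id})")
            else:
                column[first].append(f"({cluster_id}")
                column[last].append(f"{cluster_id})")

    return [
        f"{i}\t{tok}\t{'|'.join(tags) or '-'}"
        for i, ((tok, _, _), tags) in enumerate(zip(tokens, column))
    ]


if __name__ == "__main__":
    # Hand-written example data, not taken from a real export.
    text = "The old man said he was tired"
    results = [
        {"id": "a", "type": "labels",
         "value": {"start": 0, "end": 11, "text": "The old man", "labels": ["PER"]}},
        {"id": "b", "type": "labels",
         "value": {"start": 17, "end": 19, "text": "he", "labels": ["PER"]}},
        {"type": "relation", "from_id": "b", "to_id": "a", "direction": "right"},
    ]
    print("\n".join(to_conll_coref(text, results)))
```

Union-find is used so that chains of pairwise relations (A to B, B to C) collapse into one cluster regardless of the order or direction in which the links were drawn. A real converter would need the target tool's exact column layout and tokenizer.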

makseq commented 1 year ago

It would be awesome if you could contribute this script to our converter ;-)

Our CoNLL converter code is here: https://github.com/heartexlabs/label-studio-converter/blob/master/label_studio_converter/converter.py#L446