ManivannanMurugavel / spacy-ner-annotator

Train Spacy ner with custom dataset
https://medium.com/@manivannan_data/how-to-train-ner-with-custom-training-data-using-spacy-188e0e508c6
182 stars 111 forks

Convert to Spacy format #4

Open obtic-sorbonne opened 5 years ago

obtic-sorbonne commented 5 years ago

Hi! Does your tool convert files generated by WebAnno (UIMA binary CAS, UIMA CAS JSON, UIMA CAS XMI, CoNLL, TSV3) to the spaCy NER training data format? Thank you in advance!

ManivannanIkomet commented 5 years ago

Hi. Once the annotation is completed in WebAnno, download the annotated JSON. You can then use the Python script to convert it to spaCy format.
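For readers following along, here is a minimal sketch of what such a conversion might look like. It assumes the annotated JSON is a list of records, each with a text field and a list of `[start, end, label]` entity spans. Only the `entities` key appears in this thread (in the traceback further down); the `content` key, the function name `to_spacy_format`, and the sample data are assumptions made for illustration, not the repository's confirmed code.

```python
import json


def to_spacy_format(records):
    """Turn annotation records into spaCy v2-style NER training tuples."""
    train_data = []
    for record in records:
        # Each entity is assumed to be [start_char, end_char, label].
        ents = [tuple(entity) for entity in record["entities"]]
        # "content" is an assumed key name for the annotated text.
        train_data.append((record["content"], {"entities": ents}))
    return train_data


# Hypothetical sample input, not taken from the thread.
sample = json.loads(
    '[{"content": "Paris is in France",'
    ' "entities": [[0, 5, "LOC"], [12, 18, "LOC"]]}]'
)
print(to_spacy_format(sample))
```

Each resulting item has the shape `(text, {"entities": [(start, end, label), ...]})`, which is the training format used by spaCy v2.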

obtic-sorbonne commented 5 years ago

Thank you for your answer! I am a beginner in this field, so all these details are new to me. I exported the annotated sentences (in Arabic) from WebAnno in UIMA CAS JSON format (see attachment). When I run your convert_spacy_train_data.py script on my data, I get this error:

Traceback (most recent call last):
  File "D:\OBVIL\Bureau\spacy-ner-annotator-master\convert_spacy_train_data.py", line 12, in <module>
    ents = [tuple(entity) for entity in data['entities']]
TypeError: string indices must be integers

Can you advise, please?

CURATION_USER.zip
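A note on the traceback above: this `TypeError` generally means that `data` is a string rather than a dict. One plausible cause, assuming the script loops over the loaded JSON (the script itself is not shown in this thread), is that the top level of the exported file is a JSON object rather than a list of records, so iterating over it yields its string keys. The sketch below reproduces that situation with hypothetical data; the key names and both helper functions are illustrative assumptions.

```python
import json

# Hypothetical export whose top level is an object, not a list of records.
exported = json.loads('{"_context": {}, "_views": {}}')


def first_entities(train):
    """Mimic a loop that expects a list of dicts with an 'entities' key."""
    for data in train:           # iterating a dict yields its keys (strings)
        return data["entities"]  # indexing a str with a str raises TypeError


try:
    first_entities(exported)
except TypeError as err:
    print("TypeError:", err)


def looks_like_annotator_output(train):
    """Check the shape assumed above: a list of dicts that each have 'entities'."""
    return isinstance(train, list) and all(
        isinstance(record, dict) and "entities" in record for record in train
    )


print(looks_like_annotator_output(exported))
```

If that check fails on the exported file, the file is probably in a different layout from the one the conversion script expects and would need to be restructured first.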