Open abrarsharif66 opened 1 week ago
{
  "classes": ["SOFTWARE_NAME", "JOB_TYPE", "EDUCATION", "UNIVERSITY", "DEGREE", "YEARS_OF_EXPERIENCE", "STATE", "CITY", "COUNTRY", "PROGRAMING_CONCEPT", "COMPANY_NAME", "PROGRAMMING_LANGUAGE", "FRAMEWORKS", "SOFT_SKILLS", "JOB_TITLE", "NAME", "EMAIL", "PH.NO"],
  "annotations": [
    ["Zixuan Wu zixwu@ucdavis.edu", {"entities": [[0, 9, "NAME"], [10, 27, "EMAIL"]]}],
    ["1363 Briones Ct | Pleasanton, CA 94588 | (510) 676-7461", {"entities": [[41, 55, "PH.NO"]]}]
  ]
}
How to reproduce the behaviour
I used the following code to convert my JSON annotations to spaCy's binary format. When I validate the result with spaCy's debug command, I get a whitespace error. How can I resolve this?
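import spacy
from spacy.tokens import DocBin
from tqdm import tqdm

# Assumed setup, not shown in the original snippet: a blank English pipeline
# and an empty DocBin. TRAIN_DATA is the JSON above, loaded as a dict.
nlp = spacy.blank("en")
db = DocBin()

for text, annot in tqdm(TRAIN_DATA['annotations']):
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in annot["entities"]:
        # With "contract", None means no whole tokens fit inside these offsets
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents
    db.add(doc)

db.to_disk("train_data.spacy")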
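If the whitespace error comes from entity offsets that start or end on whitespace (an assumption, since the two sample annotations above look clean and the offending rows are not shown), one possible workaround is to shrink each offset pair before passing it to `doc.char_span`. The sketch below is plain Python with no spaCy dependency; `trim_entity_offsets` and `clean_entities` are hypothetical helper names, not part of spaCy.

```python
def trim_entity_offsets(text, start, end):
    """Shrink [start, end) so the span neither starts nor ends on whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


def clean_entities(text, entities):
    """Return trimmed (start, end, label) tuples, dropping spans that become empty."""
    cleaned = []
    for start, end, label in entities:
        new_start, new_end = trim_entity_offsets(text, start, end)
        if new_start < new_end:  # skip spans that were whitespace only
            cleaned.append((new_start, new_end, label))
    return cleaned


# Hypothetical annotation whose offsets include a leading and a trailing space
text = "Phone:  (510) 676-7461 \n"
entities = [[7, 23, "PH.NO"]]
cleaned = clean_entities(text, entities)
print(cleaned, repr(text[cleaned[0][0]:cleaned[0][1]]))
```

In the conversion loop, this would mean iterating over `clean_entities(text, annot["entities"])` instead of `annot["entities"]`. It only addresses whitespace at the span edges; spans that still fail to align with token boundaries would continue to be skipped.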
Info about spaCy
spaCy version: 3.7.5
Platform: Linux-6.1.85+-x86_64-with-glibc2.35
Python version: 3.10.12
Pipelines: en_core_web_lg (3.7.1), en_core_web_sm (3.7.1)