Hi!
I tried training a custom Named Entity Recognition model using spaCy, but despite multiple attempts, I keep getting a warning that there are misaligned entities in the training data I created.
import spacy
from spacy.training import Example
import random

nlp = spacy.load('en_core_web_sm')

training_data = [
    ("Hello from India", {"entities": [(11, 15, "GPE")]})
]

# Disable every pipeline component except the NER
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
nlp.disable_pipes(*other_pipes)

optimizer = nlp.create_optimizer()
losses = {}

for i in range(10):  # 10 is the epoch value
    random.shuffle(training_data)
    for text, annotation in training_data:
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotation)
        nlp.update([example], sgd=optimizer, losses=losses)
And this is the warning it generates:
Warning (from warnings module):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/spacy/training/iob_utils.py", line 141
    warnings.warn(
UserWarning: [W030] Some entities could not be aligned in the text "Hello from India" with entities "[(11, 15, 'GPE')]". Use `spacy.training.offsets_to_biluo_tags(nlp.make_doc(text), entities)` to check the alignment. Misaligned entities ('-') will be ignored during training.
The entity "India" starts at index 11 and ends at index 15, yet spaCy doesn't recognise it as a token. Any help is appreciated.
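For reference, here is a quick way to sanity-check the offsets with plain Python, without going through spaCy at all. This is only a sketch and rests on one assumption: that the end offset in an entity tuple is exclusive, the same way a Python slice end is, so the text covered by (start, end) is text[start:end]. The helper name span_text is made up for illustration and is not part of spaCy.

```python
# Sanity check for character offsets, using plain Python slicing only.
# Assumption: entity tuples are (start_char, end_char, label) with an
# exclusive end_char, so the covered text is text[start_char:end_char].

def span_text(text, start, end):
    """Return the substring covered by a (start, end) character span."""
    return text[start:end]


text = "Hello from India"
entities = [(11, 15, "GPE")]

# Print what each annotated span actually covers.
for start, end, label in entities:
    print(label, repr(span_text(text, start, end)))  # GPE 'Indi'

# The last character of "India" sits at index 15, so an exclusive end
# offset has to be one past it, which here is len(text) == 16.
print(repr(span_text(text, 11, len(text))))  # 'India'
```

If the first print shows 'Indi' rather than 'India', the span stops in the middle of a token, which would be consistent with the W030 warning; under the same assumption, (11, 16, "GPE") would cover the whole word. The `spacy.training.offsets_to_biluo_tags` call named in the warning is the spaCy-side version of this check.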