explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Issue with training a new custom label #3760

Closed sohilsshah91 closed 5 years ago

sohilsshah91 commented 5 years ago

#2573 Reference: Issue pertaining to custom Named Entity Recognition Label with spaCy

I have referred to issue #2573 above and to spaCy's documentation on adding a custom NER label. My training data is in the following format, as shown in spaCy's documentation: train_data = [('The Prostate Cancer is a killer', {'entities': [(4, 9, 'Custom_Label_Name')]}), ('The Prostate Cancer Ovarian Cancer too', {'entities': [(4, 19, 'Custom_Label_Name'), (20, 34, 'Custom_Label_Name')]})]. The code is exactly the same as the shared example apart from the changed label name. However, I am getting the following error: KeyError: "[E022] Could not find a transition with the name 'U-Indication' in the NER model."

What could I be doing wrong?

Code for reference is as below:

`
import plac
import random
from pathlib import Path
import spacy
from spacy.util import minibatch, compounding

Label = "Indications"


@plac.annotations(
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    new_model_name=("New model name for model meta.", "option", "nm", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int),
)
def main(model=None, new_model_name="indication", output_dir=None, n_iter=10):
    """Set up the pipeline and entity recognizer, and train the new entity."""
    random.seed(0)
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # create blank Language class
        print("Created blank 'en' model")

    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe("ner")

    ner.add_label(Label)  # add new entity label to entity recognizer
    # Adding extraneous labels shouldn't mess anything up
    #ner.add_label("VEGETABLE")
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.resume_training()
    move_names = list(ner.move_names)
    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
    with nlp.disable_pipes(*other_pipes):  # only train NER
        sizes = compounding(1.0, 4.0, 1.001)
        # batch up the examples using spaCy's minibatch
        for itn in range(n_iter):
            random.shuffle(train_data)
            batches = minibatch(train_data, size=sizes)
            losses = {}
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
            print("Losses", losses)

    # test the trained model
    test_text = "1.1 Adjuvant Ovarian Cancer: Injection, USP and Adriamycin (DOXOrubicin HCl) for Injection, USP is indicated as a component of multi-agent adjuvant chemotherapy for treatment of women with axillary lymph node involvement following resection of primary breast cancer"
    doc = nlp(test_text)
    print("Entities in '%s'" % test_text)
    for ent in doc.ents:
        print(ent.label_, ent.text)
`

BreakBB commented 5 years ago

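The problem is the naming of your label. I could reproduce the error with Label = "Indications" in the code and "Indication" in the training data; note the missing "s" at the end. Only the label passed to ner.add_label is known to the model, so it has no transition for the differently spelled label in the annotations, which is what E022 reports. Use exactly the same string in both places to fix it.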
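A mismatch like this can be caught before training by comparing the labels used in the annotations with the labels that were registered. The following is only a minimal sketch in plain Python: the helper name `find_unregistered_labels` and the sample data are hypothetical, chosen to mirror the "Indications" / "Indication" mismatch described above.

```python
def find_unregistered_labels(train_data, registered_labels):
    """Return labels used in train_data that are missing from registered_labels."""
    used = {
        label
        for _text, annotations in train_data
        for _start, _end, label in annotations.get("entities", [])
    }
    return sorted(used - set(registered_labels))


# Hypothetical sample data: the annotations say "Indication",
# while the script registered "Indications".
train_data = [
    ("The Prostate Cancer is a killer", {"entities": [(4, 19, "Indication")]}),
]
registered = ["Indications"]  # the strings passed to ner.add_label

missing = find_unregistered_labels(train_data, registered)
print(missing)  # ['Indication']
```

In the training script, the registered labels would presumably be read from the entity recognizer after the `ner.add_label` calls (for example from `ner.labels`, assuming that property lists the added labels in your spaCy version). A non-empty result means training would hit the E022 error.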

sohilsshah91 commented 5 years ago

Thank you. It solved my issue.

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.