explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Can't replace NER pipe in an existing model. Works in 2.2.4 but crashes in 2.3 #5702

Closed: eaporetsky closed this issue 4 years ago

eaporetsky commented 4 years ago

How to reproduce the behaviour

Hello,

I am trying to replace the NER pipe in spaCy's en_core_web_md with my own. I don't need any of the built-in entity types, but I would like to keep the trained tagger and parser. The docs aren't entirely clear about how to do this, so here is what I do (using the ANIMAL example from the training docs):

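import spacy

LABEL = "ANIMAL"  # new entity label from the training example

nlp = spacy.load("en_core_web_md")
nlp.remove_pipe("ner")  # drop the pretrained NER component
ner = nlp.create_pipe("ner")  # create a blank NER component
nlp.add_pipe(ner, last=True)
ner.add_label(LABEL)
train(TRAIN_DATA)  # my training function; it calls optimizer = nlp.begin_training() with the new entity
nlp.to_disk("en_core_new_ner")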

This all runs fine: the model trains successfully (the losses look normal), and it saves to disk and loads again. However, when I create a Doc with the new model and print its ents, I get this error:

  File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "brain.py", line 192, in main
    doc = brain.nlp(example)
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 446, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))
  File "pipes.pyx", line 398, in spacy.pipeline.pipes.Tagger.__call__
  File "pipes.pyx", line 443, in spacy.pipeline.pipes.Tagger.set_annotations
  File "morphology.pyx", line 292, in spacy.morphology.Morphology.assign_tag_id
ValueError: [E014] Unknown tag ID: 25

My assumption is that calling nlp.begin_training() is also breaking the tagger/parser, even though the saved folder clearly contains the tagger and parser and I disable the other pipes before training. However, the tag_map in the new model looks different from the tag map in en_core_web_md. My training code is almost identical to the training example at https://spacy.io/usage/training#example-new-entity-type
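To pin down how the two tag maps differ, a small comparison helper can be useful. The sketch below works on plain dictionaries; the function name `diff_tag_maps` and the toy values are made up for illustration. In spaCy 2.x the real tag map can, as far as I know, be read from `nlp.vocab.morphology.tag_map`, but treat that attribute path as an assumption and check it against your installed version.

```python
def diff_tag_maps(original, retrained):
    """Return tags that are missing, added, or changed between two tag maps."""
    shared = set(original) & set(retrained)
    return {
        # tags present in the original map but absent after retraining
        "missing": sorted(set(original) - set(retrained)),
        # tags that only appear in the retrained map
        "added": sorted(set(retrained) - set(original)),
        # tags present in both maps but mapped to different values
        "changed": sorted(t for t in shared if original[t] != retrained[t]),
    }


# Toy dictionaries standing in for the two models' tag maps (made-up values).
original_map = {"NN": {"pos": "NOUN"}, "VB": {"pos": "VERB"}, "JJ": {"pos": "ADJ"}}
retrained_map = {"NN": {"pos": "NOUN"}, "VB": {"pos": "X"}}

report = diff_tag_maps(original_map, retrained_map)
print(report)
```

With real models, the two arguments would be the tag maps of the freshly loaded en_core_web_md and of the model reloaded from disk.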

Is there a clean way to keep the tagger and parser from an existing model but replace the NER component? I don't want any of the built-in entities, and I also don't want to have to retrain the tagger and parser.

Your Environment

spaCy version: 2.3.0
Location: /usr/local/lib/python3.7/site-packages/spacy
Platform: Darwin-18.6.0-x86_64-i386-64bit
Python version: 3.7.7

eaporetsky commented 4 years ago

As a very hacky workaround, if I save a copy of en_core_web_md and copy its tagger/parser directories over the ones in my saved model, it works. Obviously I don't want to do it this way and would like to save my model normally, without the hack.
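For anyone who wants to script that copy step, here is a minimal sketch using only the standard library. It assumes each pipeline component lives in its own subdirectory of the saved model folder (e.g. `tagger`, `parser`). The helper name `copy_components`, the directory names `pristine` and `retrained`, and the placeholder file contents are all hypothetical; the demo runs against throwaway temp directories rather than real models.

```python
import shutil
import tempfile
from pathlib import Path


def copy_components(src_model, dst_model, components=("tagger", "parser")):
    """Overwrite component directories in dst_model with those from src_model."""
    src_model, dst_model = Path(src_model), Path(dst_model)
    copied = []
    for name in components:
        src = src_model / name
        if not src.is_dir():
            continue  # the source model has no such component directory
        dst = dst_model / name
        if dst.exists():
            shutil.rmtree(dst)  # remove the retrained model's copy first
        shutil.copytree(src, dst)
        copied.append(name)
    return copied


# Demo on temporary directories with placeholder files (not real model data).
with tempfile.TemporaryDirectory() as tmp:
    pristine = Path(tmp) / "pristine"
    retrained = Path(tmp) / "retrained"
    for root, marker in ((pristine, "good"), (retrained, "broken")):
        (root / "tagger").mkdir(parents=True)
        (root / "tagger" / "tag_map").write_text(marker)
    copied = copy_components(pristine, retrained)
    restored = (retrained / "tagger" / "tag_map").read_text()
    print(copied, restored)
```

In real use, `src_model` would be a saved copy of en_core_web_md and `dst_model` the folder written by nlp.to_disk().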

eaporetsky commented 4 years ago

Note: when I run the same code in spaCy 2.2.4 it works fine (and the saved tag map looks OK), but it breaks in 2.3.

adrianeboyd commented 4 years ago

Thanks for the report! This is due to a bug related to tag maps in 2.3.0 (fixed in #5641). As a workaround, copying the tagger directory in the model should work, even if it's not the prettiest solution. I hope 2.3.1 will be ready today; if not, then very soon.
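If a script needs to decide whether to apply the workaround, a simple version guard is one option. The sketch below relies only on what this thread reports: 2.3.0 is affected and 2.2.4 works. The set `AFFECTED_VERSIONS` and the function `needs_tagger_workaround` are hypothetical names, and the set would need updating if other releases turn out to be affected.

```python
# Versions reported as affected in this thread (assumption: only 2.3.0).
AFFECTED_VERSIONS = {"2.3.0"}


def needs_tagger_workaround(version):
    """Return True if the given spaCy version string is a known-affected release."""
    return version.strip() in AFFECTED_VERSIONS


for v in ("2.2.4", "2.3.0"):
    print(v, needs_tagger_workaround(v))
```

In practice the argument would be `spacy.__version__`.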

github-actions[bot] commented 4 years ago

This issue has been automatically closed because it was answered and there was no follow-up discussion.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.