Closed FallakAsad closed 4 years ago
One of the big problems with spaCy v2 has been the system for passing config around. I used environment variables as a quick hack for experimentation, and we ended up with a system where defaults could be injected in many places, sometimes even overriding saved models. So this is certainly a bug, and there's some chance that updating to the latest version will fix it. I'm not sure what's going wrong, though --- thanks for including the cfg, that would've been my first question.
My best guess is that the problem comes in here:
optimizer = nlp.begin_training(component_cfg={"ner": {"conv_depth": 15}})
The `begin_training` method should reinitialise the model weights, so you'll be starting from a new model. I'm not sure whether this is what you intend: if you want to add more layers on top of the pretrained model, you would need to use `nlp.resume_training` instead.
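For reference, a minimal sketch of the difference between the two calls (using a blank pipeline so it's self-contained; in the thread's case the pipeline would come from `spacy.load('de_core_news_md')` instead, and the version check is my addition to cover both the v2 and v3 `add_pipe` APIs):

```python
import spacy

# Build a small pipeline with an NER component. The add_pipe API differs
# between spaCy v2 and v3, so both paths are shown.
nlp = spacy.blank("en")
if spacy.__version__.startswith("2."):
    nlp.add_pipe(nlp.create_pipe("ner"))
else:
    nlp.add_pipe("ner")
nlp.get_pipe("ner").add_label("ORG")

# begin_training() reinitialises the model weights: you start from scratch.
optimizer = nlp.begin_training()

# resume_training() keeps the existing weights and just returns an optimizer,
# which is what you want when fine-tuning a pretrained model.
optimizer = nlp.resume_training()
```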
I think the model you're loading in is not respecting the `conv_depth` setting, but it's still saving the setting in the cfg. The cfg and the model architecture don't match, leaving you with a model you can't load back.
It should be possible to add extra CNN layers to the model, after loading it back, but you'll need to reach into the undocumented internals of the model. We're close to releasing a new version of Thinc that fixes the design problems, and finally includes good docs. It also has a new config system. If you want to try it out, send me an email at matt@explosion.ai .
By the way, adding CNN layers is unlikely to be the most effective option. You'll be better off increasing `token_vector_width`, and possibly installing PyTorch and increasing `bilstm_depth`.
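A hedged sketch of passing such a setting through the v2 `component_cfg` mechanism (the parameter names are v2 internals; `bilstm_depth` additionally requires PyTorch, so only `token_vector_width` is shown, and the value 128 is an arbitrary example):

```python
import spacy

nlp = spacy.blank("de")
if spacy.__version__.startswith("2."):
    nlp.add_pipe(nlp.create_pipe("ner"))
    nlp.get_pipe("ner").add_label("ORG")
    # v2-only: width of the token vector layer; bilstm_depth could be
    # passed here too if PyTorch is installed.
    optimizer = nlp.begin_training(
        component_cfg={"ner": {"token_vector_width": 128}}
    )
else:
    # In v3 these hyperparameters live in the training config instead.
    nlp.add_pipe("ner")
    nlp.get_pipe("ner").add_label("ORG")
    optimizer = nlp.initialize()
```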
Thanks for your answer. I currently have an alternate solution in mind: reinitialize the weights and retrain the model with a larger `conv_depth` on the same dataset that `de_core_news_md` was trained on, then fine-tune it on my data. That's why I tried using the `begin_training()` function. However, I am not sure whether the following two code snippets are equivalent when training the NER component:
nlp = spacy.load('de_core_news_md')
ner = nlp.get_pipe('ner')
optimizer = nlp.begin_training()
# Training code here
And
nlp = spacy.blank('de')
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner)
optimizer = nlp.begin_training()
# Training code here
Does the first snippet retain some pretrained word vectors that will be used in the training of the NER component even after calling `begin_training()`? I am wondering whether there is any benefit to loading the `de_core_news_md` model instead of creating a blank model, given that `begin_training()` is called either way.
Yes, in the first example the pretrained vectors from `de_core_news_md` will be used as input for the training, so you should definitely see a difference.
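A small self-contained check of this point (faking the "pretrained" vectors on a blank pipeline rather than downloading `de_core_news_md`; the word and vector values are arbitrary, and the version check covers both the v2 and v3 `add_pipe` APIs):

```python
import numpy
import spacy

nlp = spacy.blank("de")
# Pretend these are pretrained vectors, like the ones shipped with
# de_core_news_md.
nlp.vocab.set_vector("Haus", numpy.ones((50,), dtype="f"))

if spacy.__version__.startswith("2."):
    nlp.add_pipe(nlp.create_pipe("ner"))
else:
    nlp.add_pipe("ner")
nlp.get_pipe("ner").add_label("ORG")

# begin_training() resets the component weights, but the vocab's vectors
# are untouched and remain available as input features.
optimizer = nlp.begin_training()
assert nlp.vocab.has_vector("Haus")
```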
I tried replicating your original issue with
nlp = English()
ner = nlp.create_pipe("ner")
for _, annotations in TRAIN_DATA:
    for ent in annotations.get("entities"):
        ner.add_label(ent[2])
nlp.add_pipe(ner)
optimizer = nlp.begin_training(component_cfg={"ner": {"conv_depth": 15}})
Then writing that to file with `nlp.to_disk(tmp_dir)` and reading it back in. I have no issues with this, so I can't replicate your original bug.
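The round-trip itself looks roughly like this (a blank English pipeline and a temporary directory stand in for the repro's setup; the label is an arbitrary example):

```python
import tempfile
import spacy

nlp = spacy.blank("en")
if spacy.__version__.startswith("2."):
    nlp.add_pipe(nlp.create_pipe("ner"))
else:
    nlp.add_pipe("ner")
nlp.get_pipe("ner").add_label("ORG")
nlp.begin_training()

# Serialize the pipeline to disk and load it back, as in the repro above.
with tempfile.TemporaryDirectory() as tmp_dir:
    nlp.to_disk(tmp_dir)
    nlp2 = spacy.load(tmp_dir)
```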
It seems likely this got fixed in a new version of spaCy - I tested with 2.2.3. I'll assume for now that this was fixed. If you do upgrade and the problem persists, please feel free to let me know!
Wait, I spoke too soon. I can replicate it when using a model instead of a blank language.
@FallakAsad: is this currently blocking you? Like Matt said, these kinds of things will hopefully become much easier with spaCy 3, which will use the new Thinc...
> I think the model you're loading in is not respecting the `conv_depth` setting, but it's still saving the setting in the cfg. The cfg and the model architecture don't match, leaving you with a model you can't load back.
That's exactly what happened. Because a `Model` was already present, the new `conv_depth` parameter was being ignored (the model was left unchanged), but the parameter did get stored in the internal config dictionary, ultimately resulting in inconsistencies and a crash when performing IO.
PR https://github.com/explosion/spaCy/pull/5078 will prevent storing the new values when they're not used. This prevents the crash. Actually training a larger model needs to be done with a different approach as discussed above.
@svlandeg This is not currently blocking me, thanks.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I am trying to change the `conv_depth` of the pretrained model and then train it and save it. The training goes fine; however, if I load the saved model, I see the following error:
Do you think it is possible to change the `conv_depth` of a pretrained model? Here is the part of the code where I set `conv_depth`.
After saving the model, the cfg looks as follows: