explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Changing conv_depth of pretrained model #4934

Closed FallakAsad closed 4 years ago

FallakAsad commented 4 years ago

I am trying to change the conv_depth of a pretrained model and then train and save it. The training goes fine; however, when I load the saved model, I see the following error:

AttributeError: 'Residual' object has no attribute 'G'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/Services/XXX/__init__.py", line 483, in train_model
    nlp = load_model(modelName)
  File "/src/Services/XXX/__init__.py", line 593, in load_model
    nlp = spacy.load(model_dir + "/model")
  File "/usr/lib64/python3.6/site-packages/spacy/__init__.py", line 27, in load
    return util.load_model(name, **overrides)
  File "/usr/lib64/python3.6/site-packages/spacy/util.py", line 168, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "/usr/lib64/python3.6/site-packages/spacy/util.py", line 211, in load_model_from_path
    return nlp.from_disk(model_path)
  File "/usr/lib64/python3.6/site-packages/spacy/language.py", line 878, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/usr/lib64/python3.6/site-packages/spacy/util.py", line 670, in from_disk
    reader(path / key)
  File "/usr/lib64/python3.6/site-packages/spacy/language.py", line 873, in <lambda>
    p, exclude=["vocab"]
  File "nn_parser.pyx", line 644, in spacy.syntax.nn_parser.Parser.from_disk
ValueError: [E149] Error deserializing model. Check that the config used to create the component matches the model being loaded.

Do you think it is possible to change the conv_depth of a pretrained model? Here is the part of the code where I set conv_depth.

nlp = spacy.load('de_core_news_md')
ner = nlp.get_pipe('ner')
optimizer = nlp.begin_training(component_cfg={"ner": {"conv_depth": 15}})
# Training code here
# Saving model code here

After saving the model, the cfg looks as follows:

{
  "beam_width":1,
  "beam_density":0.0,
  "beam_update_prob":1.0,
  "cnn_maxout_pieces":3,
  "deprecation_fixes":{
    "vectors_name":"de_core_news_md.vectors"
  },
  "nr_class":106,
  "hidden_depth":1,
  "token_vector_width":96,
  "hidden_width":64,
  "maxout_pieces":2,
  "pretrained_vectors":"de_core_news_md.vectors",
  "bilstm_depth":0,
  "conv_depth":15,
  "min_action_freq":30


honnibal commented 4 years ago

One of the big problems with spaCy v2 has been the system for passing config around. I used environment variables as a quick hack for experimentation, and we ended up with a system where defaults could be injected at many places, including sometimes overriding saved models. So this is certainly a bug, and there's some chance updating to the latest version will fix it. I'm not sure what's going wrong though --- thanks for including the cfg, that would've been my first question.

My best guess is that the problem comes in here:

optimizer = nlp.begin_training(component_cfg={"ner": {"conv_depth": 15}})

The begin_training method should reinitialise the model weights, so you'll be starting from a new model. I'm not sure whether this is what you intend: if you want to add more layers on top of the pretrained model, you would need to use nlp.resume_training() instead.
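A minimal sketch of the distinction, assuming the spaCy v2.x API (the model name is just the one from this issue):

import spacy

nlp = spacy.load('de_core_news_md')

# begin_training() re-initialises the component weights, so training starts from scratch
optimizer = nlp.begin_training()

# resume_training() keeps the pretrained weights and just returns an optimizer,
# so further updates continue from the shipped model
optimizer = nlp.resume_training()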

I think the model you're loading in is not respecting the conv depth setting, but it's still saving the setting in the cfg. The cfg and the model architecture don't match, leaving you with a model you can't load back.

It should be possible to add extra CNN layers to the model after loading it back, but you'll need to reach into the undocumented internals of the model. We're close to releasing a new version of Thinc that fixes the design problems, and finally includes good docs. It also has a new config system. If you want to try it out, send me an email at matt@explosion.ai.

By the way, adding CNN layers is unlikely to be the most effective option. You'll be better off increasing token_vector_width, and possibly installing PyTorch and increasing bilstm_depth.
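A rough sketch of that alternative (placeholder values; the keys match the cfg shown above, and bilstm_depth > 0 requires PyTorch to be installed):

import spacy

# create the NER component fresh so the custom config is actually respected
nlp = spacy.blank('de')
nlp.add_pipe(nlp.create_pipe('ner'))

# add NER labels and training data as usual before calling nlp.update()
optimizer = nlp.begin_training(
    component_cfg={"ner": {"token_vector_width": 128, "bilstm_depth": 1}}
)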

FallakAsad commented 4 years ago

Thanks for your answer. I currently have an alternate solution in mind: reinitialize the weights, retrain the model with a larger conv_depth on the same dataset on which de_core_news_md was trained, and then fine-tune it on my data. That's why I tried using the begin_training() function. However, I am not sure whether the following two code snippets are equivalent when training the NER component:

nlp = spacy.load('de_core_news_md')
ner = nlp.get_pipe('ner')
optimizer = nlp.begin_training()
# Training code here

And

nlp = spacy.blank('de')
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner)
optimizer = nlp.begin_training()
# Training code here

Does the first snippet still have pretrained word vectors that will be used when training the NER component, even after calling begin_training()? I am wondering whether there is any benefit to loading the 'de_core_news_md' model instead of creating a blank model, given that begin_training() is called.

svlandeg commented 4 years ago

Yes, in the first example the pretrained vectors from de_core_news_md will be used as input features for training, so you should definitely see a difference.
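A quick way to see the difference (a sketch; exact vector shapes depend on the model version):

import spacy

nlp_md = spacy.load('de_core_news_md')
print(nlp_md.vocab.vectors.shape)     # non-empty, e.g. (N, 300): pretrained vectors

nlp_blank = spacy.blank('de')
print(nlp_blank.vocab.vectors.shape)  # (0, 0): no pretrained vectors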

I tried replicating your original issue with

    from spacy.lang.en import English

    # TRAIN_DATA is the usual list of (text, {"entities": [(start, end, label), ...]}) examples
    nlp = English()
    ner = nlp.create_pipe("ner")
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get("entities"):
            ner.add_label(ent[2])
    nlp.add_pipe(ner)
    optimizer = nlp.begin_training(component_cfg={"ner": {"conv_depth": 15}})

Then I wrote that to file with nlp.to_disk(tmp_dir) and read it back in. I have no issues with this, so I can't replicate your original bug.
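The round trip is roughly this (a sketch; tmp_dir is any writable directory path):

import spacy

nlp.to_disk(tmp_dir)        # serialize the pipeline, including the NER cfg
nlp2 = spacy.load(tmp_dir)  # loading it back worked here without the E149 error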

It seems likely this got fixed in a new version of spaCy - I tested with 2.2.3. I'll assume for now that this was fixed. If you do upgrade and the problem persists, please feel free to let me know!

svlandeg commented 4 years ago

Wait, I spoke too soon. I can replicate it when using a model instead of a blank language.

svlandeg commented 4 years ago

@FallakAsad: is this currently blocking you? Like Matt said, these kinds of things will hopefully become much easier to do with spaCy 3, which will use the new Thinc...

svlandeg commented 4 years ago

I think the model you're loading in is not respecting the conv depth setting, but it's still saving the setting in the cfg. The cfg and the model architecture don't match, leaving you with a model you can't load back.

That's exactly what happened. Because a Model was already present, the new conv_depth parameter was being ignored (the model was left unchanged), but the parameter did get stored in the internal config dictionary, ultimately resulting in inconsistencies and a crash when performing IO.

PR https://github.com/explosion/spaCy/pull/5078 prevents the new values from being stored when they're not actually used, which avoids the crash. Actually training a larger model still needs to be done with a different approach, as discussed above.

FallakAsad commented 4 years ago

@svlandeg This is not currently blocking me, thanks.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.