explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Error while loading spacy model from the pickle file #12062

Closed (devendrasoni18 closed this issue 1 year ago)

devendrasoni18 commented 1 year ago

I am getting the following error while loading a spaCy NER model from a pickle file. The failing call is: self.model = pickle.load(open(model_path, 'rb'))

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2022.2.3\plugins\python-ce\helpers\pydev\pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2022.2.3\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:\Projects\pythonworkspace\invoice_processing_prototype\invoice_data_extractor_notebook.py", line 101, in <module>
    extractor = InvoiceDataExtractor(model_dir_path, input_file_paths[0], config_path)
  File "C:\Projects\pythonworkspace\invoice_processing_prototype\invoicedataextractor.py", line 27, in __init__
    self.spatial_extractor = SpatialExtractor(model_dir_path, config_path)
  File "C:\Projects\pythonworkspace\invoice_processing_prototype\spatialextractor.py", line 54, in __init__
    self.inv_date = Model(f"{self.model_dir_path}\\invoice_date_with_corrected_training_data_and_line_seperator_21_07_2022.pkl",
  File "C:\Projects\pythonworkspace\invoice_processing_prototype\spatialextractor.py", line 34, in __init__
    self.model = pickle.load(open(model_path, 'rb'))
  File "stringsource", line 6, in spacy.pipeline.trainable_pipe.__pyx_unpickle_TrainablePipe
_pickle.PickleError: Incompatible checksums (0x417ddeb vs (0x61fbab5, 0x27e6ee8, 0xbe56bc9) = (cfg, model, name, scorer, vocab))

How to reproduce the behaviour

I trained the NER model with spaCy 3.1.2 and recently upgraded spaCy to the latest 3.4. The error might be caused by a version incompatibility. If so, can someone confirm whether a spaCy NER model trained on version 3.1.2 can be loaded with spaCy 3.4?


polm commented 1 year ago

Models trained in the same major version (like v3) should be able to be loaded in later versions, but because there can be minor inconsistencies, we recommend thorough testing and, if possible, retraining.
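One way to see which spaCy version a saved pipeline was built for is to read the meta.json file that nlp.to_disk writes into the model directory. The following is a minimal sketch, not an official spaCy utility: the function name trained_spacy_version is hypothetical, and it only applies to pipelines saved with to_disk, not to pickle files.

```python
import json
from pathlib import Path


def trained_spacy_version(model_dir):
    """Return the spaCy version requirement recorded in a saved pipeline.

    Hypothetical helper: reads the "spacy_version" entry from the meta.json
    inside a directory created by nlp.to_disk. Returns None if the entry
    is missing.
    """
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    return meta.get("spacy_version")
```

Comparing the returned string with spacy.__version__ in the loading environment gives a quick hint as to whether retraining or extra testing is worth doing.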

You appear to be having a separate issue related to pickle serialization. While pickling should work with models, we recommend the native saving/loading features in most circumstances. So normally you should do this:

import spacy

# where you create the model (nlp is your trained pipeline)
nlp.to_disk("my_model")
# where you load the model
nlp = spacy.load("my_model")

I'm not sure whether the pickle problem is due to the changed spaCy version or something else. I haven't seen that error before, and in fact Googling it only turns up this issue, so it seems rather uncommon.

Is it possible for you to load the model in the original training environment and save it using the above method instead of pickle? If so I'd recommend that approach.
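That conversion could look roughly like the sketch below. It assumes the pickle file contains the whole nlp pipeline object, and that it is run in the original environment (spaCy 3.1.2 here) where the pickle still loads. The function name convert_pickle_to_disk and the example paths are hypothetical.

```python
import pickle
from pathlib import Path


def convert_pickle_to_disk(pickle_path, output_dir):
    """Re-save a pickled pipeline using its own to_disk method.

    Assumption: the unpickled object is a spaCy pipeline (or anything else
    that exposes to_disk). Run this with the same spaCy version that
    created the pickle, otherwise pickle.load may fail as in this issue.
    """
    with open(pickle_path, "rb") as f:
        nlp = pickle.load(f)
    out = Path(output_dir)
    nlp.to_disk(out)  # writes the pipeline in spaCy's native format
    return out


# Example usage (hypothetical paths), run in the original environment:
# convert_pickle_to_disk("invoice_date_model.pkl", "invoice_date_model")
# Afterwards, in the upgraded environment:
# import spacy
# nlp = spacy.load("invoice_date_model")
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.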

polm commented 1 year ago

On review, we can't guarantee that objects pickled with one version of spaCy will always work with another - pickle is just too low-level. For that reason we recommend you use our provided serialization API.

Since this isn't a bug, I'll move it to Discussions, but if you'd like more help just let us know.