explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Pre-trained coreference pipeline incompatible with spaCy > 3.4 #13111

Closed Fohlen closed 8 months ago

Fohlen commented 11 months ago

Dear spaCy team,

We are currently upgrading our workflows to spaCy 3.7.x, and spacy-experimental is part of them. A few weeks ago you already relaxed the hard version constraint in spacy-experimental so that it can be installed with 3.7.x, thanks a lot 🥂 However, it turns out that the pre-trained coreference model still pins an older spaCy version, so it cannot be installed alongside 3.7.x.

How to reproduce the behaviour

When we run:

pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.1/en_coreference_web_trf-3.4.0a2-py3-none-any.whl

We get:

#5 72.70 Collecting en-coreference-web-trf==3.4.0a2
#5 73.25   Downloading https://github.com/explosion/spacy-experimental/releases/download/v0.6.1/en_coreference_web_trf-3.4.0a2-py3-none-any.whl (490.3 MB)
#5 85.16      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 490.3/490.3 MB 8.5 MB/s eta 0:00:00
#5 86.25 Collecting spacy<3.5.0,>=3.3.0 (from en-coreference-web-trf==3.4.0a2)
#5 86.31   Downloading spacy-3.4.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB)

In fact, installing the coreference model downgrades the existing spaCy install, after which pip reports dependency conflicts and leaves the environment inconsistent:

#5 96.36 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
#5 96.36 de-dep-news-trf 3.7.2 requires spacy<3.8.0,>=3.7.0, but you have spacy 3.4.4 which is incompatible.
#5 96.36 en-core-web-trf 3.7.2 requires spacy<3.8.0,>=3.7.0, but you have spacy 3.4.4 which is incompatible.
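The two logs above come down to one version check: the coreference wheel requires `spacy<3.5.0,>=3.3.0`, while the 3.7.2 pipelines require `spacy<3.8.0,>=3.7.0`, and no single spaCy version satisfies both. A minimal sketch of that check, using only the standard library. The helper names `parse_version` and `satisfies` are made up for illustration, and the sketch handles only plain numeric versions; pip itself follows the full PEP 440 rules.

```python
import operator
import re

# Comparison operators supported by this simplified sketch.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def parse_version(text):
    """Turn '3.7.2' into (3, 7, 2). Plain numeric release versions only."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause in `specifier`."""
    installed = parse_version(version)
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound_text = match.groups()
        bound = parse_version(bound_text)
        # Pad with zeros so that (3, 5) and (3, 5, 0) compare as equal.
        width = max(len(installed), len(bound))
        left = installed + (0,) * (width - len(installed))
        right = bound + (0,) * (width - len(bound))
        if not _OPS[op](left, right):
            return False
    return True


coref_pin = "<3.5.0,>=3.3.0"     # from the coreference wheel (log above)
pipeline_pin = "<3.8.0,>=3.7.0"  # from en-core-web-trf 3.7.2 (log above)

print(satisfies("3.4.4", coref_pin))     # True
print(satisfies("3.7.2", coref_pin))     # False
print(satisfies("3.4.4", pipeline_pin))  # False
```

Because the two ranges do not overlap, pip can only satisfy one of them, which is why it reports the conflict after the downgrade.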

Your Environment

Proposed solution

Retrain / republish the pipeline. If you can provide more detailed instructions, we are happy to do this for you.

shadeMe commented 11 months ago

Thanks for reporting this issue! We'll look into it.

In the meantime, you should be able to reinstall the latest version of spaCy after installing the experimental co-ref component: python -m pip install spacy -U. The dependency resolver error can be ignored in this case. Alternatively, you can manually edit the wheel to update the version pin.
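For the second workaround, a rough sketch of what "editing the wheel" could look like, using only the Python standard library. It assumes the pin lives on a `Requires-Dist` line in the wheel's `*.dist-info/METADATA` file, which is where wheels normally declare dependencies. The function names, the sample metadata, and the new range `>=3.3.0,<3.8.0` are illustrative assumptions, not something the spaCy team prescribed. The sketch does not update the wheel's `RECORD` hashes, so tools that verify them may reject the result, and the package's `meta.json` may carry its own spaCy version range, which is left untouched here.

```python
import re
import zipfile


def relax_pin(metadata_text, package, new_spec):
    """Replace the version specifier on the Requires-Dist line for `package`.

    The lookahead keeps 'spacy' from also matching 'spacy-experimental'.
    Environment markers after ';' are left as they are.
    """
    pattern = re.compile(
        rf"^(Requires-Dist:\s*{re.escape(package)})(?![\w.-])[^;\r\n]*",
        flags=re.MULTILINE | re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group(1) + new_spec, metadata_text)


def rewrite_wheel(src, dst, package, new_spec):
    """Copy the wheel at `src` to `dst`, rewriting only the METADATA file.

    Each member is read fully into memory, which is simple but heavy for
    large model files.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.endswith(".dist-info/METADATA"):
                text = data.decode("utf-8")
                data = relax_pin(text, package, new_spec).encode("utf-8")
            # Passing the ZipInfo keeps the member's name and compression type.
            zout.writestr(item, data)


# Made-up metadata to show the text rewrite; only the first line changes.
sample = (
    "Requires-Dist: spacy<3.5.0,>=3.3.0\n"
    "Requires-Dist: spacy-experimental<0.7.0,>=0.6.0\n"
)
print(relax_pin(sample, "spacy", ">=3.3.0,<3.8.0"))

# Hypothetical usage on a downloaded wheel (paths are placeholders):
# rewrite_wheel("coref_original.whl", "coref_relaxed.whl", "spacy", ">=3.3.0,<3.8.0")
```

Relaxing the pin only changes what pip will accept at install time; it does not make the trained pipeline itself compatible with newer spaCy or spacy-transformers versions.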

Fohlen commented 10 months ago

Hi, I updated the version pin of the package and created a new wheel. When I run the following:

import spacy

nlp = spacy.load("en_core_web_trf")
nlp_coref = spacy.load("en_coreference_web_trf")
nlp.add_pipe("coref", source=nlp_coref)
sentences = nlp("my name is theodor and my wife's name is sandra")

I get:

/redacted/venv/lib/python3.10/site-packages/spacy_transformers/layers/hf_shim.py:137: UserWarning: Error loading saved torch state_dict with strict=True, likely due to differences between 'transformers' versions. Attempting to load with strict=False as a fallback...

If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current 'transformers' and 'spacy-transformers' versions. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)
Traceback (most recent call last):
  File "/redacted/run.py", line 6, in <module>
    sentences = nlp("my name is theodor and my wife's name is sandra")
  File "/redacted/venv/lib/python3.10/site-packages/spacy/language.py", line 1054, in __call__
    error_handler(name, proc, [doc], e)
  File "/redacted/venv/lib/python3.10/site-packages/spacy/util.py", line 1704, in raise_error
    raise e
  File "/redacted/venv/lib/python3.10/site-packages/spacy/language.py", line 1049, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
  File "spacy/pipeline/trainable_pipe.pyx", line 56, in spacy.pipeline.trainable_pipe.TrainablePipe.__call__
  File "/redacted/venv/lib/python3.10/site-packages/spacy/util.py", line 1704, in raise_error
    raise e
  File "spacy/pipeline/trainable_pipe.pyx", line 52, in spacy.pipeline.trainable_pipe.TrainablePipe.__call__
  File "/redacted/venv/lib/python3.10/site-packages/spacy_experimental/coref/coref_component.py", line 153, in predict
    scores, idxs = self.model.predict([doc])
  File "/redacted/venv/lib/python3.10/site-packages/thinc/model.py", line 334, in predict
    return self._func(self, X, is_train=False)[0]
  File "/redacted/venv/lib/python3.10/site-packages/thinc/layers/chain.py", line 54, in forward
    Y, inc_layer_grad = layer(X, is_train=is_train)
  File "/redacted/venv/lib/python3.10/site-packages/thinc/model.py", line 310, in __call__
    return self._func(self, X, is_train=is_train)
  File "/redacted/venv/lib/python3.10/site-packages/thinc/layers/chain.py", line 54, in forward
    Y, inc_layer_grad = layer(X, is_train=is_train)
  File "/redacted/venv/lib/python3.10/site-packages/thinc/model.py", line 310, in __call__
    return self._func(self, X, is_train=is_train)
  File "/redacted/venv/lib/python3.10/site-packages/spacy_transformers/layers/trfs2arrays.py", line 40, in forward
    if "last_hidden_state" in trf_data.model_output:
AttributeError: 'DocTransformerOutput' object has no attribute 'model_output'

This suggests that there are incompatibilities between the packages. However, simply running text through the nlp_coref object works, so this is likely an issue with the extensions rather than the model.

adrianeboyd commented 10 months ago

This sample code, which sources only the single coref component, wouldn't work correctly with spaCy v3.6 either. It wouldn't raise an error, but the coref output would be gibberish, because the sourced component doesn't have its own separate transformer component in the pipeline.

This particular error message can definitely be improved, though.

If you want to combine them, see this thread for how to either replace the listeners or run the two pipelines separately on the same doc: https://github.com/explosion/spaCy/discussions/12302#discussioncomment-5080497

svlandeg commented 8 months ago

Assuming the provided workarounds are sufficient, I'll go ahead and close this issue now.

When we move coref into spaCy's core code base, it will be updated whenever we update spaCy, but unfortunately that's not a guarantee we can make right now as long as the component is experimental. We definitely have this on our list to provide better support in the future though!

github-actions[bot] commented 7 months ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.