explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

bus error upon exiting the program after using spacy on mac M1 #13204

Open koder-ua opened 10 months ago

koder-ua commented 10 months ago

How to reproduce the behaviour

On an M1 Mac, any code that uses spacy to parse a doc fails upon exit with the following error (I can only test on my laptop; the same code works fine on a Linux machine):

[1] 73089 bus error

This happens with both the sm and trf models.
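A minimal sketch of how the crash-on-exit could be checked from a script rather than read off the shell message. It runs the one-liner in a fresh interpreter and reports whether that process was killed by a signal. The helper name `classify_exit` is hypothetical, and the snippet assumes spacy and en_core_web_sm are installed in the interpreter being tested.

```python
import signal
import subprocess
import sys


def classify_exit(code: str) -> str:
    """Run `code` in a fresh interpreter and describe how that process ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    if proc.returncode == 0:
        return "clean exit"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exit status {proc.returncode}"


if __name__ == "__main__":
    # Assumption: spacy and en_core_web_sm are installed in this environment.
    snippet = 'import spacy; nlp = spacy.load("en_core_web_sm"); nlp("some text")'
    print(classify_exit(snippet))
```

A bus error at interpreter shutdown would be expected to show up as `killed by SIGBUS`, while a missing package would show up as a plain non-zero exit status.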

Your Environment

Info about spaCy

koder-ua commented 10 months ago

Here is some binary traceback info:

https://gist.github.com/koder-ua/8fd3e3fd795674b01d1ddbeda9400999

adrianeboyd commented 10 months ago

Thanks for the report!

The info provided makes this look specific to the trf model, in particular curated-tokenizers. If you have a minute, could you create a new venv without installing torch and with only the en_core_web_sm model and see if you still get the same error?
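When comparing a failing environment against a clean venv, it may help to record which of the packages mentioned in this thread are actually installed. The sketch below uses the standard-library `importlib.metadata`. The list of names is an assumption based on the packages discussed here, not a complete list of what the trf model pulls in, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Assumption: these are the distributions worth comparing between environments.
SUSPECTS = ("spacy", "torch", "sentencepiece", "curated-tokenizers")


def installed_versions(names=SUSPECTS):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both the failing and the clean venv gives a quick diff of the relevant packages.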

koder-ua commented 10 months ago

@adrianeboyd yep, it seems you're right: on a clean python3.11 with only spacy & en_core_web_sm installed, everything works fine

python3.11 with only spacy and en_core_web_sm

~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
~

python3.11 with pytorch & co

✗ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
[1]    54694 bus error  python -c

Yet just installing the trf model (which also installs torch & co) did not cause the issue to appear:

(python311_clean) ➜  ~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
(python311_clean) ➜  ~ python -c 'import spacy; npl = spacy.load("en_core_web_trf"); npl("some text")'
(python311_clean) ➜  ~

adrianeboyd commented 10 months ago

What happens if you also install sentencepiece in the new venv?

koder-ua commented 10 months ago

All fine

(python311_clean) ➜  ~ pip install sentencepiece
Collecting sentencepiece
  Downloading sentencepiece-0.1.99-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 19.0 MB/s eta 0:00:00
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99
(python311_clean) ➜  ~ python -c 'import spacy; npl = spacy.load("en_core_web_trf"); npl("I have some text")'
(python311_clean) ➜  ~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("I have some text")'
(python311_clean) ➜  ~

adrianeboyd commented 10 months ago

In general this seems to be a known issue related to sentencepiece, which is vendored in curated-tokenizers. I'm not currently sure exactly which conditions are necessary for you to run into it in practice, though.

danieldk commented 9 months ago

I think this is the same issue as https://github.com/google/sentencepiece/issues/579. I am not sure, though, why the sentencepiece library is loaded, since we link sentencepiece statically.

At any rate, the error comes from destructing absl::Flag. However, absl::Flag is not needed for library use of sentencepiece, but it tends to creep back in as a dependency. I'll see if we can remove it in curated-tokenizers, which should avoid conflicts between different versions of sentencepiece.
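One rough way to look into which sentencepiece-related code ends up in the process is to list the Python modules imported after running a pipeline. This is only a sketch: `loaded_modules_matching` is a hypothetical helper, the substrings searched for are assumptions, and it sees only Python-level modules in `sys.modules`, not shared libraries pulled in by other extension modules.

```python
import sys


def loaded_modules_matching(*needles):
    """Return {module name: file path} for imported modules whose name contains a needle."""
    found = {}
    for name, module in list(sys.modules.items()):
        if any(needle in name for needle in needles):
            found[name] = getattr(module, "__file__", None)
    return found


if __name__ == "__main__":
    try:
        import spacy

        # Assumption: en_core_web_sm is installed; spacy.load raises OSError otherwise.
        spacy.load("en_core_web_sm")("some text")
    except (ImportError, OSError) as exc:
        print(f"could not run the pipeline: {exc}")
    hits = loaded_modules_matching("sentencepiece", "curated_tokenizers")
    for name, path in sorted(hits.items()):
        print(name, path)
```

If a standalone `sentencepiece` module shows up alongside `curated_tokenizers`, that would be consistent with two copies of the library living in one process, though it would not by itself prove that this is the cause.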