Open koder-ua opened 10 months ago
Here is some binary tb info
https://gist.github.com/koder-ua/8fd3e3fd795674b01d1ddbeda9400999
Thanks for the report!
The info provided makes this look specific to the trf model, in particular curated-tokenizers. If you have a minute, could you create a new venv without installing torch and with only the en_core_web_sm model and see if you still get the same error?
@adrianeboyd yep, it seems you're right: on a clean python3.11 with only spacy & en_core_web_sm installed, everything works fine
python3.11 with only spacy and en_core_web_sm
~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
~
python3.11 with pytorch & co
✗ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
[1] 54694 bus error python -c
Yet just installing the trf model (which also installs torch & co) did not cause the issue to appear:
(python311_clean) ➜ ~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("some text")'
(python311_clean) ➜ ~ python -c 'import spacy; npl = spacy.load("en_core_web_trf"); npl("some text")'
(python311_clean) ➜ ~
What if you also install sentencepiece in the new venv?
All fine
(python311_clean) ➜ ~ pip install sentencepiece
Collecting sentencepiece
Downloading sentencepiece-0.1.99-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 19.0 MB/s eta 0:00:00
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99
(python311_clean) ➜ ~ python -c 'import spacy; npl = spacy.load("en_core_web_trf"); npl("I have some text")'
(python311_clean) ➜ ~ python -c 'import spacy; npl = spacy.load("en_core_web_sm"); npl("I have some text")'
(python311_clean) ➜ ~
In general this seems to be a known issue related to sentencepiece, which is vendored in curated-tokenizers. I'm not currently sure exactly which conditions are necessary for you to run into it in practice, though.
I think this is the same issue as https://github.com/google/sentencepiece/issues/579 . I am not sure though why the sentencepiece library is loaded. We link sentencepiece statically.
At any rate, the error comes from destructing absl::Flag. However, absl::Flag is not needed for library use of sentencepiece, but it tends to creep back in as a dependency. I'll see if we can remove it in curated-tokenizers, which should avoid conflicts between different versions of sentencepiece.
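To help narrow down which combination of packages triggers the conflict, it may be useful to list what is installed in each venv. The following is only a sketch: the `PACKAGES` list and the `installed_versions` helper are illustrative names, not part of spaCy. It reports only separately installed distributions, so a sentencepiece copy that is statically linked into curated-tokenizers will not appear here.

```python
from importlib import metadata

# Distribution names as published on PyPI; this list is an assumption
# based on the packages mentioned in this thread.
PACKAGES = ["spacy", "curated-tokenizers", "sentencepiece", "torch"]


def installed_versions(names):
    """Return {name: version or None} for each distribution name."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both the failing and the clean venv and comparing the output should show whether a standalone sentencepiece is present alongside curated-tokenizers.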
How to reproduce the behaviour
On M1, any code that uses spaCy to parse a doc fails upon exit with the error below, with both the sm and trf models. I can only test on my laptop; it works fine on a Linux machine.
[1] 73089 bus error
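Because the crash happens at interpreter exit, it can be easier to detect from a parent process than from inside the script itself. Below is a minimal sketch, assuming a POSIX system where a negative return code from `subprocess` means the child was killed by a signal. The helper names `describe_exit` and `run_snippet` are hypothetical and exist only for this illustration.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


def run_snippet(code: str) -> str:
    """Run `code` in a fresh interpreter and describe how it ended."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # Same one-liner as in the report, run in a child interpreter.
    snippet = 'import spacy; nlp = spacy.load("en_core_web_sm"); nlp("some text")'
    print(run_snippet(snippet))
```

If the child dies with a bus error, the description should name SIGBUS; a clean run reports exit code 0, and a missing model or package reports a non-zero exit code instead.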
Your Environment
Info about spaCy