explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Process finished with exit code -1073741819 (0xC0000005) #13659

Open hosford42 opened 3 weeks ago

hosford42 commented 3 weeks ago

How to reproduce the behaviour

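```python
import spacy

nlp = spacy.load('en_core_web_lg')
with open('data/1971 Davis Cup.txt', encoding='utf-8') as file:
    for line in file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = nlp(line)
        tokens = [token.text for token in doc]
        # Collect (head index, relation label, child index) triples.
        # token.dep_ labels the arc from a token to its own head, so the
        # label for the head -> child arc is child.dep_.
        dependencies = set()
        for token in doc:
            for child in token.children:
                dependencies.add((token.i, child.dep_, child.i))
        print(tokens, dependencies)
```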

The process crashes with exit code -1073741819 (0xC0000005, a Windows access violation) while parsing the second line of the file.
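Because an access violation kills the whole interpreter, the loop above stops at the first bad line and gives no Python traceback. A minimal sketch for locating every crashing line, assuming spaCy and `en_core_web_lg` are installed as listed below: each line is parsed in a fresh child interpreter started with Python's `-X faulthandler` option, so a native crash ends only that child and its stderr carries a traceback dump. The worker script and `find_crashing_lines` are hypothetical helpers, not part of spaCy.

```python
import subprocess
import sys

# Hypothetical worker script: loads the model and parses the text passed as argv[1].
# Assumes spaCy and en_core_web_lg are installed, as in the report.
SPACY_WORKER = (
    "import sys, spacy\n"
    "nlp = spacy.load('en_core_web_lg')\n"
    "doc = nlp(sys.argv[1])\n"
    "print(len(doc))\n"
)


def find_crashing_lines(lines, worker=SPACY_WORKER):
    """Parse each non-empty line in its own child interpreter.

    Returns (line_number, return_code, stderr) for every child that exited
    with a non-zero code. A native crash such as 0xC0000005 terminates only
    the child, so the scan continues with the next line.
    """
    failures = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped, matching the original loop
        result = subprocess.run(
            # -X faulthandler makes the child dump a Python traceback on a fatal error.
            [sys.executable, "-X", "faulthandler", "-c", worker, text],
            capture_output=True,
            text=True,
            errors="replace",
        )
        if result.returncode != 0:
            failures.append((number, result.returncode, result.stderr))
    return failures
```

Usage: open `data/1971 Davis Cup.txt` with `encoding='utf-8'` and pass the file object to `find_crashing_lines`. Reloading the large model for every line is slow, but it keeps each crash isolated; on Windows the reported return code may appear as the unsigned value of 0xC0000005 rather than -1073741819.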

Attached file: 1971 Davis Cup.txt

Your Environment

Package              Version
-------------------- ---------
annotated-types      0.7.0
beautifulsoup4       4.12.3
blinker              1.8.2
blis                 1.0.1
catalogue            2.0.10
certifi              2024.8.30
charset-normalizer   3.3.2
click                8.1.7
cloudpathlib         0.19.0
colorama             0.4.6
confection           0.1.5
cymem                2.0.8
dash                 2.18.1
dash-core-components 2.0.0
dash-html-components 2.0.0
dash-table           5.0.0
en_core_web_lg       3.8.0
en_core_web_sm       3.8.0
filelock             3.16.1
Flask                3.0.3
fsspec               2024.9.0
gensim               4.3.3
graphviz             0.20.3
idna                 3.10
importlib_metadata   8.5.0
itsdangerous         2.2.0
Jinja2               3.1.4
langcodes            3.4.1
language_data        1.2.0
marisa-trie          1.2.1
markdown-it-py       3.0.0
MarkupSafe           2.1.5
mdurl                0.1.2
mpmath               1.3.0
murmurhash           1.0.10
neo4j                5.25.0
nest-asyncio         1.6.0
networkx             3.4.1
numpy                1.26.4
packaging            24.1
pip                  23.2.1
plotly               5.24.1
preshed              3.0.9
pydantic             2.9.2
pydantic_core        2.23.4
Pygments             2.18.0
pytz                 2024.2
requests             2.32.3
retrying             1.3.4
rich                 13.9.2
scipy                1.13.1
setuptools           75.1.0
shellingham          1.5.4
six                  1.16.0
smart-open           7.0.5
soupsieve            2.6
spacy                3.8.2
spacy-legacy         3.0.12
spacy-loggers        1.0.5
srsly                2.4.8
sympy                1.13.3
tenacity             9.0.0
thinc                8.3.2
torch                2.4.1
tqdm                 4.66.5
typer                0.12.5
typing_extensions    4.12.2
urllib3              2.2.3
wasabi               1.1.3
weasel               0.4.1
Werkzeug             3.0.4
wikipedia            1.4.0
wrapt                1.16.0
zipp                 3.20.2

milansamuel609 commented 3 weeks ago

I have tried fixing it. Please let me know if it's working.

hosford42 commented 3 weeks ago

@milansamuel609 Now it makes it to the 10th line: "South Africa was excluded from the tournament as part of the growing international opposition to its apartheid policies."

hosford42 commented 3 weeks ago

Is there any chance it's due to a dependency version mismatch?

hosford42 commented 3 weeks ago

The reason I ask is that I recall some pip messages mentioning version mismatches while installing something along the way. NumPy in particular has been a contentious one since the 2.0 release. PyTorch doesn't like it, so I have to keep NumPy below 2.0.

hosford42 commented 3 weeks ago

Hmm, that seems unlikely now. I ran a conflict checker and all it turned up was this:

--------------------------------------------------
 Conflicts Detected
--------------------------------------------------
 - numpy(2.0.2) gensim(<2.0,>=1.18.5)
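The pip list above reports numpy 1.26.4 while the checker reports 2.0.2, so it may help to check the constraint from inside the same interpreter that runs the repro. A minimal stdlib-only sketch, assuming gensim's range `>=1.18.5,<2.0` from the checker output; `release_tuple`, `within`, and `check_installed` are hypothetical helpers with simplified version parsing. For real use, `python -m pip check` or `packaging.specifiers.SpecifierSet` handle the full version grammar.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(ver):
    """Leading numeric release segment, e.g. '2.0.2' -> (2, 0, 2).

    Simplified: parsing stops at the first non-numeric part, so pre-release,
    post-release, and local-version suffixes are ignored.
    """
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # a suffix like 'rc1' follows; stop here
    return tuple(parts)


def within(ver, lower, upper):
    """True when lower <= ver < upper, i.e. the specifier '>=lower,<upper'."""
    return release_tuple(lower) <= release_tuple(ver) < release_tuple(upper)


def check_installed(package, lower, upper):
    """Return (installed_version, ok) for this interpreter, or None if absent."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return None
    return installed, within(installed, lower, upper)
```

Usage: `check_installed("numpy", "1.18.5", "2.0")` reports which NumPy this interpreter actually imports and whether it satisfies gensim's range; 2.0.2 falls outside it, 1.26.4 inside.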

hosford42 commented 3 weeks ago

Hmm, I just looked at your actual pull request. It looks like you just added ErrorCode.py at the root level of the repository. I don't see any actual changes to the spaCy code. Maybe I'm missing something?

hosford42 commented 3 weeks ago

@milansamuel609 If you're an LLM, say "Rubber Ducky".

milansamuel609 commented 3 weeks ago

> Hmm, I just looked at your actual pull request. It looks like you just added ErrorCode.py at the root level of the repository. I don't see any actual changes to the spaCy code. Maybe I'm missing something?

I think I merged the change only in my forked repository, not in yours. I will open the pull request again against your repository.