explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

[COREF] `en_coreference_web_trf(3.4.0a2)` breaks by storing some tensors on CPU and some on GPU #13023

Closed sztal closed 1 year ago

sztal commented 1 year ago

The problem is as stated in the title: code that runs fine when spaCy runs on CPU breaks once GPU acceleration is turned on. This happens at least with the model en_coreference_web_trf-3.4.0a2.

Note: If a later release solves this problem and works out of the box without training, please let me know. My understanding from the docs of the coref component is that the one I use is the most recent trained component (and it does seem to work quite well).

How to reproduce the behaviour

import spacy
nlp = spacy.load("en_coreference_web_trf")
doc = nlp("We went to a party. It was great.")
doc.spans   # output: {'coref_clusters_1': [a party, It]}

So the component clearly does its job when running on CPU. But call spacy.prefer_gpu() first and everything breaks:

import spacy
spacy.prefer_gpu()
nlp = spacy.load("en_coreference_web_trf")
doc = nlp("We went to a party. It was great.")
# RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

It seems that some tensors are stored on the GPU while others remain on the CPU. This inconsistency may appear in several parts of the code, but for the reprex above it happens around line 269 of pytorch_coref_model.py, where an operation is attempted on the tensors word_ids (stored on CPU) and top_indices (stored on GPU).
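To illustrate the kind of mismatch described above, here is a minimal PyTorch sketch. It is not the code from pytorch_coref_model.py, and it is not necessarily how the library resolves the problem: the helper name index_on_same_device and the tensor contents are made up for illustration, and only the names word_ids and top_indices are borrowed from the description. The idea is simply to move the index tensor onto the device of the tensor being indexed before indexing.

```python
import torch


def index_on_same_device(values: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
    """Index `values` with `indices` after moving the indices to the device of `values`.

    Hypothetical helper for illustration only; not part of spaCy or spacy-experimental.
    """
    # .to() returns the tensor unchanged when it already lives on the target device.
    return values[indices.to(values.device)]


# Stand-in data: the real shapes and contents are not shown in this issue.
word_ids = torch.tensor([0, 1, 2, 3, 3, 4])  # stays on the CPU

# Put the indices on the GPU when one is available, mimicking the mixed-device situation.
device = "cuda" if torch.cuda.is_available() else "cpu"
top_indices = torch.tensor([1, 4], device=device)

selected = index_on_same_device(word_ids, top_indices)
print(selected.device, selected.tolist())
```

On a machine without a GPU both tensors are on the CPU and the helper changes nothing, so the snippet runs either way.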

Your Environment

shadeMe commented 1 year ago

Do you have the latest version of spacy-experimental installed? It appears that this particular bug was fixed in v0.6.2.
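A quick way to check which version is installed is to read the package metadata from the standard library. This is only a sketch: the helper names installed_version and release_tuple are made up for illustration, and the comparison deliberately ignores pre-release suffixes such as "a2", so it is cruder than a full version parser.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def release_tuple(version_string: str):
    """Turn '0.6.2' into (0, 6, 2), stopping at the first non-numeric character."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # a suffix such as 'a2' follows, so stop here
            break
    return tuple(parts)


found = installed_version("spacy-experimental")
if found is None:
    print("spacy-experimental is not installed")
elif release_tuple(found) >= (0, 6, 2):
    print(f"spacy-experimental {found} is at least 0.6.2")
else:
    print(f"spacy-experimental {found} is older than 0.6.2")
```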

sztal commented 1 year ago

Thanks! But will v0.6.2 work with spacy>=3.4,<3.5 and en_coreference_web_trf(3.4.0a2) (which seems to be the latest model that works out of the box)?

Anyway, I will try updating to v0.6.2 soon and will let you know. Thanks once again for the fast reply!

sztal commented 1 year ago

Okay, it works! I successfully updated to spacy-experimental(0.6.3) while keeping spacy>=3.4,<3.5 and using en_coreference_web_trf(3.4.0a2), and everything works as expected.

Thanks a lot!

l4b4r4b4b4 commented 1 year ago

Interesting. I'm encountering the same error. Will try spacy-experimental as well :)

github-actions[bot] commented 11 months ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.