explosion / sense2vec

🦆 Contextually-keyed word vectors
https://explosion.ai/blog/sense2vec-reloaded
MIT License

s2v standalone breaks if require_gpu() is called from spacy (cupy) #155

Closed Louis-Paul-Bowman closed 1 year ago

Louis-Paul-Bowman commented 1 year ago

I am using spaCy for NER, and later the sense2vec (S2V) standalone on a smaller portion of the NER hits. The class that implements the NER model calls require_gpu() to ensure transformer inference is fast. Using the S2V standalone in the same process afterwards raises an exception from cupy complaining about implicit conversion from a cupy array to a numpy array.

Code snippet:

import spacy
spacy.require_gpu()
import sense2vec

s2v = sense2vec.Sense2Vec().from_disk("s2v_reddit_2019_lg")

s2v.most_similar("Bart_Simpson|PERSON") #<-- exception raised here

Traceback:

TypeError                                 Traceback (most recent call last)
Cell In[2], line 7
      3 import sense2vec
      5 s2v = sense2vec.Sense2Vec().from_disk("s2v_reddit_2019_lg")
----> 7 s2v.most_similar("Bart_Simpson|PERSON")

File c:\Users\LPB\anaconda3\envs\sherlock\lib\site-packages\sense2vec\sense2vec.py:226, in Sense2Vec.most_similar(self, keys, n, batch_size)
    224 # Always ask for more because we'll always get the keys themselves
    225 n = min(len(self.vectors), n + len(keys))
--> 226 rows = numpy.asarray(self.vectors.find(keys=keys))
    227 vecs = self.vectors.data[rows]
    228 average = vecs.mean(axis=0, keepdims=True)

File cupy\_core\core.pyx:1397, in cupy._core.core.ndarray.__array__()

TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.

Versions:

cupy-cuda112 10.6.0
sense2vec 2.0.1
spacy 3.4.3
spacy-alignments 0.8.6
spacy-transformers 1.1.8
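
The error message asks for an explicit `.get()` call, so a minimal sketch of that conversion may help illustrate the failure. This is not the maintainers' fix: `to_numpy` is a hypothetical helper, and `FakeDeviceArray` is a stand-in written only so the example runs without a GPU. Neither name exists in sense2vec or cupy. The only real API assumed is that cupy arrays expose `.get()` to copy data to the host, as the error message itself states.

```python
import numpy


def to_numpy(array):
    """Hypothetical helper: return a NumPy array, copying from the device explicitly if needed."""
    # cupy.ndarray exposes .get() for an explicit device-to-host copy;
    # numpy.ndarray has no such method, so it falls through to asarray.
    get = getattr(array, "get", None)
    if callable(get) and not isinstance(array, numpy.ndarray):
        return numpy.asarray(get())
    return numpy.asarray(array)


class FakeDeviceArray:
    """Stand-in for cupy.ndarray so the sketch runs without a GPU (not a real cupy class)."""

    def __init__(self, data):
        self._data = numpy.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Mimics the refusal reported in the traceback above.
        raise TypeError("Implicit conversion to a NumPy array is not allowed.")

    def get(self):
        # Explicit copy to host memory, like cupy.ndarray.get().
        return self._data.copy()


# Works for both device-like and ordinary inputs.
rows = to_numpy(FakeDeviceArray([3, 1, 2]))
plain = to_numpy([4, 5])
```

In terms of the traceback, this would correspond to passing the result of `self.vectors.find(keys=keys)` through such a helper instead of calling `numpy.asarray` on it directly. Later lines in `most_similar` may need the same treatment if `self.vectors.data` also lives on the GPU.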

Louis-Paul-Bowman commented 1 year ago

I'll also note that the problem doesn't occur if a sufficiently large cache is present, since the results then come from the cache and no KNN calculation is performed.

adrianeboyd commented 1 year ago

Thanks for the report, we'll take a look!