Open sleepandpancakes opened 9 months ago
You can obtain the vectors like this (see example in the readme):
import spacy
nlp = spacy.load("en_core_web_sm")
s2v = nlp.add_pipe("sense2vec")
s2v.from_disk("/path/to/s2v_reddit_2015_md")
doc = nlp("A sentence about natural language processing.")
vector = doc[3:6]._.s2v_vec
You can then use e.g. numpy to do whatever vector arithmetic you need on the embeddings you obtained.
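As a rough sketch of what that arithmetic could look like: the snippet below adds an arbitrary offset to a vector and linearly interpolates between two vectors. The three-dimensional arrays are made-up stand-ins so the example runs without a model download, and `lerp` is a hypothetical helper, not part of sense2vec. In practice the inputs would be the arrays returned by `._.s2v_vec`.

```python
import numpy as np

# Stand-in vectors (assumption): real ones would come from e.g. doc[3:6]._.s2v_vec
vec_a = np.array([1.0, 0.0, 2.0], dtype=np.float32)
vec_b = np.array([0.0, 4.0, 2.0], dtype=np.float32)
offset = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def lerp(a, b, t):
    """Linear interpolation between a and b: t=0 returns a, t=1 returns b."""
    return (1.0 - t) * a + t * b


# Add an arbitrary vector to a word/phrase vector
shifted = vec_a + offset

# Point halfway between the two vectors
midpoint = lerp(vec_a, vec_b, 0.5)

print(shifted)
print(midpoint)
```

The results are plain numpy arrays that no longer correspond to any vocabulary entry, so mapping them back to a word or phrase needs a separate lookup step.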
thank you. is there a way to take an arbitrary vector and find the closest corresponding word in the vocab? i'm still having a bit of trouble understanding how i would do this
What you're looking for is a nearest neighbor search. sense2vec doesn't expose this in the public API, but there are a lot of tools for this, sorted by complexity/overhead/capabilities from low to high:
- scikit-learn's KNN implementation
- annoy
- FAISS

thank you again
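To make the nearest neighbor idea from the reply above concrete, here is a minimal brute-force sketch in plain numpy. It is a swapped-in illustration, not any of the listed libraries: those do the same job with faster indexes. The keys and three-dimensional vectors are invented stand-ins, and `closest_keys` is a hypothetical helper. With real data, `keys` and `vectors` would be the vocabulary entries and their embeddings collected into a list and a 2D array.

```python
import numpy as np

# Toy vocabulary (assumption): stand-ins for real sense2vec keys and vectors
keys = ["cat|NOUN", "dog|NOUN", "car|NOUN", "truck|NOUN"]
vectors = np.array(
    [
        [1.0, 0.1, 0.0],
        [0.9, 0.2, 0.0],
        [0.0, 0.1, 1.0],
        [0.1, 0.0, 0.9],
    ],
    dtype=np.float32,
)


def closest_keys(query, keys, vectors, n=2):
    """Return the n keys whose vectors have the highest cosine similarity to query."""
    # Normalize rows and the query so the dot product equals cosine similarity
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    scores = unit_vectors @ unit_query
    # Indices of the n highest scores, best first
    best = np.argsort(-scores)[:n]
    return [(keys[i], float(scores[i])) for i in best]


# An arbitrary vector: halfway between the "car" and "truck" stand-ins
query = 0.5 * vectors[2] + 0.5 * vectors[3]
results = closest_keys(query, keys, vectors, n=2)
print(results)
```

This scans every row on each query, which is fine for small vocabularies. For a large vocabulary, an index from one of the tools listed above is likely the better fit.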
how do i use the API to do manual vector arithmetic on vectorized words/phrases? for example, adding an arbitrary vector to vector corresponding to a word and returning the result? or linear interpolation between two vectorized words and converting to corresponding word?