how to do vector arithmetic?

explosion / sense2vec

🦆 Contextually-keyed word vectors

https://explosion.ai/blog/sense2vec-reloaded

MIT License


how to do vector arithmetic? #160

Open sleepandpancakes opened 9 months ago

sleepandpancakes commented 9 months ago

How do I use the API to do manual vector arithmetic on vectorized words/phrases? For example, adding an arbitrary vector to the vector corresponding to a word and returning the result? Or linearly interpolating between two vectorized words and converting the result back to the corresponding word?

rmitsch commented 9 months ago

You can obtain the vectors like this (see the example in the README):

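```python
import spacy
from sense2vec import Sense2VecComponent  # imported as in the README; makes sure the "sense2vec" factory is registered

nlp = spacy.load("en_core_web_sm")
s2v = nlp.add_pipe("sense2vec")
s2v.from_disk("/path/to/s2v_reddit_2015_md")

doc = nlp("A sentence about natural language processing.")
# doc[3:6] is the span "natural language processing"
vector = doc[3:6]._.s2v_vec
```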

You can then use, e.g., NumPy to do any vector arithmetic you need on the embeddings you obtained.
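A minimal NumPy sketch of the two operations from the question: adding an arbitrary offset vector, and linear interpolation, where a parameter t gives (1 - t)·a + t·b. The 4-dimensional vectors are hypothetical toy values standing in for `._.s2v_vec` results, and the helper names `add_offset` and `lerp` are made up for illustration; real vectors are longer, but the arithmetic is the same.

```python
import numpy as np


def add_offset(vector: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Element-wise sum of a word vector and an arbitrary offset vector."""
    return vector + offset


def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return (1.0 - t) * a + t * b


# Hypothetical toy vectors standing in for two `._.s2v_vec` results.
vec_a = np.array([1.0, 0.0, 2.0, 0.0], dtype=np.float32)
vec_b = np.array([0.0, 1.0, 0.0, 2.0], dtype=np.float32)
offset = np.array([0.5, 0.5, 0.5, 0.5], dtype=np.float32)

shifted = add_offset(vec_a, offset)   # word vector plus arbitrary vector
midpoint = lerp(vec_a, vec_b, 0.5)    # halfway between the two words
print(shifted)
print(midpoint)
```

Turning either result back into a word needs a nearest-neighbor lookup, since an arbitrary vector will rarely match a vocabulary entry exactly.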

sleepandpancakes commented 9 months ago

Thank you. Is there a way to take an arbitrary vector and find the closest corresponding word in the vocab? I'm still having a bit of trouble understanding how I would do this.

rmitsch commented 9 months ago

What you're looking for is a nearest-neighbor search. sense2vec doesn't expose this in its public API, but there are many tools for it, sorted by complexity/overhead/capabilities from low to high:

- In-memory solutions like scikit-learn's k-nearest-neighbors implementation (`NearestNeighbors`)
- File-based solutions like annoy or FAISS
- Vector DBs like Weaviate, Pinecone, etc.
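A minimal sketch of the simplest in-memory option: a brute-force cosine search in plain NumPy over every vector in a vocabulary, used here in place of scikit-learn's `NearestNeighbors` to avoid the extra dependency. The vocabulary, its `phrase|SENSE`-style keys, and the helper `nearest_keys` are hypothetical. With a real table, you could presumably fill `keys` and `matrix` by iterating over a standalone `Sense2Vec` object, which the sense2vec README documents as yielding `(key, vector)` pairs; treat that as an assumption to check against your installed version.

```python
import numpy as np


def nearest_keys(query, keys, matrix, n=1):
    """Hypothetical helper: brute-force cosine nearest-neighbor search.

    keys: list of strings, one per row of `matrix`.
    Returns the n (key, cosine similarity) pairs closest to `query`.
    """
    # Normalize rows and the query so a dot product equals cosine similarity.
    row_norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit_rows = matrix / np.maximum(row_norms, 1e-12)
    unit_query = query / max(np.linalg.norm(query), 1e-12)
    scores = unit_rows @ unit_query
    # Sort by descending similarity and keep the top n rows.
    best = np.argsort(-scores)[:n]
    return [(keys[i], float(scores[i])) for i in best]


# Hypothetical toy vocabulary (one row per key).
keys = ["cat|NOUN", "dog|NOUN", "car|NOUN"]
matrix = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
], dtype=np.float32)

# Interpolate a quarter of the way from "cat" to "dog", then map back to words.
t = 0.25
query = (1.0 - t) * matrix[0] + t * matrix[1]
print(nearest_keys(query, keys, matrix, n=2))
```

Brute force costs one dot product per vocabulary entry for each query, which is fine for moderately sized tables; libraries such as annoy or FAISS offer approximate indexes that trade a little accuracy for much faster lookups on large ones.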

sleepandpancakes commented 9 months ago

Thank you again.