priamai opened 11 months ago
No, I wouldn't go this way.
Training is okay, but for testing you do not need Node2Vec. The algorithm outputs embeddings in a known format; once you're done creating them, you don't need the algorithm again.
So just use:
from gensim.models import KeyedVectors
space = KeyedVectors.load_word2vec_format(EMBEDDING_FILENAME)
Then, to look up vectors, see the gensim docs.
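To make the "known format" point concrete, here is a stdlib-only sketch of the word2vec text layout that save_word2vec_format writes: a header line with the vector count and dimension, then one line per key followed by its values. It only illustrates the layout. The helpers write_word2vec_text and read_word2vec_text are hypothetical. In practice KeyedVectors.load_word2vec_format does the reading for you.

```python
import io

# Illustrative only: gensim's KeyedVectors.load_word2vec_format is what you
# would actually use. This shows that the saved file is self-describing.

def write_word2vec_text(vectors, fh):
    """Write {key: [floats]} as a 'count dim' header, then 'key v1 ... vd' lines."""
    dim = len(next(iter(vectors.values())))
    fh.write(f"{len(vectors)} {dim}\n")
    for key, vec in vectors.items():
        fh.write(key + " " + " ".join(repr(x) for x in vec) + "\n")

def read_word2vec_text(fh):
    """Read the text format back into {key: [floats]}; no Node2Vec needed."""
    count, dim = map(int, fh.readline().split())
    vectors = {}
    for _ in range(count):
        parts = fh.readline().rstrip().split(" ")
        key, weights = parts[0], [float(x) for x in parts[1:]]
        if len(weights) != dim:
            raise ValueError(f"expected {dim} values for {key!r}, got {len(weights)}")
        vectors[key] = weights
    return vectors

# Round trip through an in-memory "file" standing in for EMBEDDING_FILENAME.
buf = io.StringIO()
write_word2vec_text({"1": [0.5, -1.0], "2": [0.25, 2.0]}, buf)
buf.seek(0)
space = read_word2vec_text(buf)
print(space["2"])  # [0.25, 2.0]
```

The point is that everything needed to rebuild the lookup table is in the file itself, which is why the trained Node2Vec object is not needed at test time.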
Thanks for the reference. Following your suggestion, is this a valid approach? Does it make sense to save both the word2vec (.emb) file and the model file, or should I keep only the model file? And why do the edge embeddings fail to load (see the last line) with an error?
from gensim.models import KeyedVectors, Word2Vec
from node2vec import Node2Vec
from node2vec.edges import HadamardEmbedder

# G (a networkx graph) and args (parsed CLI arguments) are defined elsewhere
NODE_WORD_FILENAME = "word2vec.emb"
NODE_MODEL_FILENAME = "word2vec.model"
EDGES_WORD_FILENAME = "edges2vec.emb"

if args.method == "train":
    # Precompute probabilities and generate walks
    node2vec = Node2Vec(G, dimensions=64, walk_length=30, num_walks=200, workers=4)  # Use temp_folder for big graphs
    # Embed nodes
    model = node2vec.fit(window=10, min_count=1, batch_words=4)  # Any keywords acceptable by gensim.Word2Vec can be passed; `dimensions` and `workers` are passed automatically (from the Node2Vec constructor)
    # Save embeddings for later use
    model.wv.save_word2vec_format(NODE_WORD_FILENAME)
    # Save model for later use
    model.save(NODE_MODEL_FILENAME)
    edges_embs = HadamardEmbedder(keyed_vectors=model.wv)
    # Get all edges in a separate KeyedVectors instance - use with caution, could be huge for big networks
    edges_kv = edges_embs.as_keyed_vectors()
    # Save embeddings for later use
    edges_kv.save_word2vec_format(EDGES_WORD_FILENAME)

if args.method == "test":
    model = Word2Vec.load(NODE_MODEL_FILENAME)
    # this generates an error: could not convert string to float
    edges_kv = KeyedVectors.load_word2vec_format(EDGES_WORD_FILENAME)
Last error:
File "/home/robomotic/DevOps/gitlab/ava-prod-ai/venv/lib/python3.11/site-packages/gensim/models/keyedvectors.py", line 1980, in <listcomp>
word, weights = parts[0], [datatype(x) for x in parts[1:]]
Which line fails: the edges_kv = one or the model = one?
Yes, it's the KeyedVectors one:
edges_kv = KeyedVectors.load_word2vec_format(EDGES_WORD_FILENAME)
I can see why this happens: these are edge embeddings.
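The traceback points at the line that splits each saved row on spaces. A plausible cause: the node2vec edge embedders appear to name each edge with str(tuple(sorted(edge))), so an edge key such as ('1', '2') itself contains a space, and the second half of the key lands where the first float should be. The stdlib sketch below mimics that parsing step. parse_word2vec_line is a hypothetical helper, not a gensim function, and the key format is an assumption based on reading the library source.

```python
def parse_word2vec_line(line):
    """Split one word2vec text-format row the way the traceback shows:
    parts[0] is the key and everything after it must parse as a float."""
    parts = line.rstrip().split(" ")
    return parts[0], [float(x) for x in parts[1:]]

# Node keys have no spaces, so node rows parse fine.
print(parse_word2vec_line("1 0.5 -0.25"))  # ('1', [0.5, -0.25])

# Assumed edge-key style: the string form of a sorted tuple, with a space inside.
edge_key = str(tuple(sorted(("2", "1"))))
print(edge_key)  # ('1', '2')

try:
    parse_word2vec_line(edge_key + " 0.5 -0.25")
except ValueError as err:
    print(err)  # could not convert string to float: "'2')"
```

If you really need to persist the edge vectors, gensim's native KeyedVectors.save / KeyedVectors.load (pickle-based) stores keys as-is instead of splitting rows on whitespace, so it may be a workable alternative.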
If you want to use edge embeddings, why not do it this way:
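from gensim.models import KeyedVectors
from node2vec.edges import HadamardEmbedder

# Reload only the node embeddings, then rebuild edge embeddings from them
node_embeddings = KeyedVectors.load_word2vec_format(NODE_WORD_FILENAME)
edges_embs = HadamardEmbedder(keyed_vectors=node_embeddings)
# Get all edges in a separate KeyedVectors instance - use with caution, could be huge for big networks
edges_kv = edges_embs.as_keyed_vectors()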
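This approach avoids saving edges altogether. A Hadamard edge embedding is the element-wise product of the two node vectors, so any single edge can be recomputed from the node file when needed. The node2vec README also shows per-edge lookup in the form edges_embs[('1', '2')]. The stdlib sketch below uses hypothetical helper names and toy vectors. It also shows why as_keyed_vectors() can be huge: it appears to materialise every unordered node pair, self-pairs included, which grows as n(n+1)/2.

```python
from itertools import combinations_with_replacement

def hadamard(u_vec, v_vec):
    """Element-wise product of two node vectors (the Hadamard edge operator)."""
    return [a * b for a, b in zip(u_vec, v_vec)]

def edge_embedding(node_vectors, u, v):
    """Compute one edge vector on demand from the saved node vectors."""
    return hadamard(node_vectors[u], node_vectors[v])

# Hypothetical node vectors standing in for the loaded NODE_WORD_FILENAME file.
node_vectors = {"1": [0.5, -1.0], "2": [2.0, 0.25], "3": [1.0, 1.0]}

print(edge_embedding(node_vectors, "1", "2"))  # [1.0, -0.25]

# Materialising every pair (self-pairs included) grows as n * (n + 1) / 2.
all_pairs = list(combinations_with_replacement(sorted(node_vectors), 2))
print(len(all_pairs))  # 6
```

Computing edges lazily keeps memory proportional to the number of nodes, not the number of possible pairs.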
Hello there, what is the correct way to separate training from inference?
Is this correct? I run the training first and save the embeddings. Then I load a new graph and call most_similar?
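A minimal sketch of the split described in the question, under one assumption worth checking. Node2vec learns a vector only for nodes that appeared in the training graph. The saved embeddings can therefore answer queries about those nodes but not about nodes that exist only in a new graph, which would typically require retraining. At inference time you would load the vectors with KeyedVectors.load_word2vec_format and call most_similar. The stdlib version below uses hypothetical names and toy vectors. It shows the same cosine ranking and makes the unseen-node case explicit.

```python
import math

def cosine(a, b):
    """Cosine similarity; zero vectors are treated as unrelated."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm

def most_similar(vectors, key, topn=3):
    """Rank the other keys by cosine similarity to `key` (query excluded)."""
    if key not in vectors:
        # Nodes that were not in the training graph have no embedding.
        raise KeyError(f"node {key!r} was not seen during training")
    query = vectors[key]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != key]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Toy vectors standing in for embeddings saved at training time.
trained = {"1": [1.0, 0.0], "2": [0.9, 0.1], "3": [0.0, 1.0]}
print([k for k, _ in most_similar(trained, "1", topn=2)])  # ['2', '3']
```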