benedekrozemberczki / karateclub

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

https://karateclub.readthedocs.io

GNU General Public License v3.0

g2vec_model.infer([test_graph]) returns different embeddings each time I call it #130

mohamedelmesawy closed 1 year ago

mohamedelmesawy commented 1 year ago

graph2vec_model.infer([test_graph]) returns a different embedding each time I call it.

Even though I use the same model, which was fitted only once, each call to .infer() returns a different embedding.

I also passed seed=42 when creating the model and seeded Python's and NumPy's random generators:
np.random.seed(42)
random.seed(42)
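
For reference, here is a minimal standard-library sketch of what seeding does and does not guarantee. A seed fixes where the random stream starts, but consecutive draws from that stream still differ unless the seed is reset. Whether this is what happens inside .infer() is an assumption and is not confirmed in this thread.

```python
import random

# Seed once, as in the snippet above.
random.seed(42)
first_draw = random.random()
second_draw = random.random()  # the stream has advanced, so this value differs

# Resetting the seed rewinds the stream to its starting point.
random.seed(42)
replayed_draw = random.random()

print(first_draw == second_draw)    # False
print(first_draw == replayed_draw)  # True
```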

benedekrozemberczki commented 1 year ago

Is it on the same graph?

LucaCappelletti94 commented 1 year ago

How many threads are you using? The embedding step uses Doc2Vec from gensim, which trains with multi-threaded stochastic gradient descent that tolerates data races between threads, so it is inherently non-reproducible. These collisions do not notably affect the resulting embedding, but they introduce extra randomness that is hard to control.

Try running with a single thread and see whether the difference disappears.
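
One way to run that check is sketched below. The helper names max_abs_difference and embeddings_match are hypothetical and not part of karateclub or gensim. They call an inference function several times and report whether the results agree. The commented usage assumes that Graph2Vec accepts a workers argument controlling the gensim thread count; please verify that against the karateclub documentation.

```python
from typing import Callable, Sequence

Embedding = Sequence[Sequence[float]]


def max_abs_difference(a: Embedding, b: Embedding) -> float:
    """Largest element-wise absolute difference between two embedding matrices.

    Assumes both matrices have the same shape; extra rows or columns are ignored.
    """
    return max(
        (abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)),
        default=0.0,
    )


def embeddings_match(
    infer: Callable[[], Embedding], repeats: int = 3, tol: float = 0.0
) -> bool:
    """Call `infer` `repeats` times and check each result against the first."""
    reference = infer()
    return all(
        max_abs_difference(reference, infer()) <= tol for _ in range(repeats - 1)
    )


# Hypothetical usage with karateclub (not executed here; `train_graphs` and
# `test_graph` stand in for your own data, and `workers=1` is an assumption):
#
#   from karateclub import Graph2Vec
#   model = Graph2Vec(workers=1, seed=42)
#   model.fit(train_graphs)
#   print(embeddings_match(lambda: model.infer([test_graph])))
```

If the check passes with a single thread but fails with several, that would support the threading explanation. If it fails in both cases, the randomness comes from somewhere else.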