VHRanger / nodevectors

Fastest network node embeddings in the west
MIT License

Issue with gensim 4.0.0+ #37

Open cthoyt opened 3 years ago

cthoyt commented 3 years ago

It appears one of the argument names has changed in the newly released gensim 4.0.0: Word2Vec no longer accepts 'size', which was renamed to 'vector_size'. This has also caused some pain in other libraries that use gensim for their node2vec implementations (e.g., https://github.com/krishnanlab/PecanPy/issues/16)

Traceback (most recent call last):
  File "embed_nodevectors.py", line 150, in <module>
    main()
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "embed_nodevectors.py", line 137, in main
    model.fit(graph)
  File "/Users/cthoyt/.virtualenvs/indra/lib/python3.8/site-packages/nodevectors/node2vec.py", line 130, in fit
    self.model = gensim.models.Word2Vec(
TypeError: __init__() got an unexpected keyword argument 'size'

VHRanger commented 3 years ago

Thanks.

For now, I can make a patch that checks the gensim version and passes the embedding dimension as 'size' or 'vector_size' depending on the version.
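A minimal sketch of what such a version check could look like. The helper name word2vec_kwargs is hypothetical (not part of nodevectors), and it assumes only the renamed arguments matter: gensim 4.0 renamed 'size' to 'vector_size' and also 'iter' to 'epochs', so both are routed here.

```python
def word2vec_kwargs(gensim_version: str, dimensions: int, epochs: int) -> dict:
    """Build version-appropriate keyword arguments for gensim's Word2Vec.

    Hypothetical helper: gensim 4.0 renamed `size` -> `vector_size`
    and `iter` -> `epochs`; 3.x releases still expect the old names.
    """
    # Only the major version decides which names to use ("4.0.0" -> 4).
    major = int(gensim_version.split(".")[0])
    if major >= 4:
        return {"vector_size": dimensions, "epochs": epochs}
    return {"size": dimensions, "iter": epochs}


# Usage (sketch, values are illustrative):
#   import gensim
#   kwargs = word2vec_kwargs(gensim.__version__, dimensions=32, epochs=10)
#   model = gensim.models.Word2Vec(walks, **kwargs)
```

Routing on the major version keeps the patch small, but it has to be revisited whenever gensim renames something else, which is part of why dropping the dependency is attractive.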

Long term, the idea would be to remove the gensim dependency entirely: it's a heavy dependency, it's a moving target, and it's only used for this one part of Node2Vec.

It also adds a lot of overhead to Node2Vec. For one, we need to map the node IDs in the random walks back to node names in a format gensim accepts.

We could instead train a word2vec model directly on the node IDs (integers, so it would be faster) and re-map the embedding dictionary keys from node ID to node name only once, after everything is trained.
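A toy sketch in plain Python of the difference between the two approaches. The function names walks_to_names and remap_keys are hypothetical, and the trained model is replaced by a hand-written dictionary of placeholder vectors; only the amount of name-lookup work is meant to be representative.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def walks_to_names(walks: List[List[int]], names: Sequence[str]) -> List[List[str]]:
    """Current approach: convert every step of every walk to a node name
    before training, so the cost grows with the total walk length."""
    return [[str(names[node_id]) for node_id in walk] for walk in walks]


def remap_keys(id_vectors: Dict[int, Vector], names: Sequence[str]) -> Dict[str, Vector]:
    """Proposed approach: train on integer node IDs, then translate
    node ID -> node name once, with one lookup per node."""
    return {str(names[node_id]): vec for node_id, vec in id_vectors.items()}


# Toy data: 3 nodes, 2 walks of length 4 (IDs index into `names`).
names = ["alice", "bob", "carol"]
walks = [[0, 1, 2, 1], [2, 0, 0, 1]]

# Stand-in for a word2vec model trained directly on the integer walks;
# these vectors are placeholders, not real embeddings.
id_vectors = {0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6]}

token_walks = walks_to_names(walks, names)  # 8 lookups (one per walk step)
embeddings = remap_keys(id_vectors, names)  # 3 lookups (one per node)
print(embeddings["bob"])  # [0.3, 0.4]
```

With real graphs the walks have many steps per node, so the per-step conversion dominates, while the final remap stays proportional to the number of nodes.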

This could be achieved either by stripping the node2vec C code and integrating it in CSRGraphs or by using another C/C++ implementation, like this one:

https://github.com/xgfs/node2vec-c

(which already works on a CSR representation, not too far from CSRGraphs) or this one:

https://github.com/snap-stanford/snap/tree/master/examples/node2vec

and integrating it into CSRGraphs.
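For context on why a CSR-based implementation fits CSRGraphs: in compressed sparse row form, node i's neighbours are indices[indptr[i]:indptr[i+1]], so a walk only ever handles integer IDs that could go straight into a word2vec trainer. The sketch below is a uniform (DeepWalk-style) walk, not node2vec's biased return/in-out walk, and indptr/indices follow the usual CSR naming convention rather than the API of CSRGraphs or either linked project.

```python
import random
from typing import List, Sequence


def csr_random_walk(indptr: Sequence[int], indices: Sequence[int],
                    start: int, length: int, rng: random.Random) -> List[int]:
    """Uniform random walk over a graph stored in CSR form.

    Neighbours of node i are indices[indptr[i]:indptr[i + 1]].
    Returns integer node IDs only; node names are never touched.
    """
    walk = [start]
    for _ in range(length - 1):
        node = walk[-1]
        lo, hi = indptr[node], indptr[node + 1]
        if lo == hi:  # dead end: node has no neighbours
            break
        walk.append(indices[rng.randrange(lo, hi)])
    return walk


# Undirected triangle 0-1-2 plus an isolated node 3.
indptr = [0, 2, 4, 6, 6]
indices = [1, 2, 0, 2, 0, 1]
rng = random.Random(42)
print(csr_random_walk(indptr, indices, start=0, length=5, rng=rng))
print(csr_random_walk(indptr, indices, start=3, length=5, rng=rng))  # [3]
```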

hhu1 commented 3 years ago

Following bash command worked for me:

pip3 install -I gensim==3.8.0

Wapiti08 commented 2 years ago

Downgrading to gensim 3.8.0 did not solve my problem; the error is still there.