fname missing error when building word2vec

Barcavin commented 1 year ago

Hi,

Thanks for such an amazing repo. It must have taken a lot of effort to collect and preprocess all the raw text data.

When I tried to build the word2vec features from the raw data, I got the following error:

Traceback (most recent call last):
  File "repo/Graph-LLM/generate_pyg_data.py", line 237, in <module>
    main()
  File "repo/Graph-LLM/generate_pyg_data.py", line 104, in main
    data_obj.x = get_word2vec(data_obj.raw_texts)
  File "repo/Graph-LLM/data.py", line 411, in get_word2vec
    word2vec = KeyedVectors.load_word2vec_format(datapath(), binary = True)
TypeError: datapath() missing 1 required positional argument: 'fname'

However, I couldn't find the corresponding word2vec file in the repo or on Google Drive. Could you tell me where to find it?

Thanks,

CurryTang commented 1 year ago

Thanks for pointing this out. datapath() should be replaced with w2v_path, which points to the original word2vec file. You can get the file through gensim or download it manually from https://code.google.com/archive/p/word2vec/
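
A minimal sketch of that fix, assuming the file has already been downloaded. The helper name `load_word2vec` and the example file name are illustrative and not taken from the repo; only `w2v_path` and the `KeyedVectors.load_word2vec_format(..., binary=True)` call come from the traceback above.

```python
import os


def load_word2vec(w2v_path):
    """Load pre-trained vectors from a word2vec binary file (hypothetical helper)."""
    # Fail early with a readable message instead of the TypeError from datapath().
    if not os.path.isfile(w2v_path):
        raise FileNotFoundError(f"word2vec file not found: {w2v_path}")
    # Imported lazily so the path check above works even without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(w2v_path, binary=True)


# Example call; the file name is an assumption about what the archive provides:
# word2vec = load_word2vec("GoogleNews-vectors-negative300.bin")
#
# Alternatively, gensim's downloader can fetch pre-trained vectors (large download):
# import gensim.downloader as api
# word2vec = api.load("word2vec-google-news-300")
```

Passing an explicit path keeps the loader independent of gensim's `datapath()` helper, which requires a file name argument.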

Barcavin commented 1 year ago

Would you recommend training the word2vec embeddings from scratch on the target graph dataset, or loading a pre-trained model?

CurryTang commented 1 year ago

In our experiments, word2vec performs poorly as node features for GNNs (it is even outperformed by TF-IDF).
If you want to pre-train word2vec from scratch, you need a large corpus (for example, the word2vec model for arXiv is pre-trained on MAG). So it may not be a good idea to use word2vec as features.
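
For context, one common way to turn word vectors into per-node features is to average the vectors of the words in each node's text. The sketch below illustrates that idea only; it is an assumption, not necessarily what get_word2vec does in this repo, and the function name, whitespace tokenizer, and toy vectors are all hypothetical.

```python
def mean_pool_features(raw_texts, vectors, dim):
    """Average the vectors of known tokens in each text; all zeros if none are known."""
    features = []
    for text in raw_texts:
        # Naive lowercase whitespace tokenization (an assumption, not the repo's tokenizer).
        known = [vectors[tok] for tok in text.lower().split() if tok in vectors]
        if known:
            # Average each dimension across the known token vectors.
            pooled = [sum(col) / len(known) for col in zip(*known)]
        else:
            pooled = [0.0] * dim
        features.append(pooled)
    return features


# Toy 2-dimensional vectors stand in for a real pre-trained model.
toy_vectors = {"graph": [1.0, 0.0], "neural": [0.0, 1.0]}
feats = mean_pool_features(["Graph neural network", "unrelated words"], toy_vectors, dim=2)
print(feats)  # [[0.5, 0.5], [0.0, 0.0]]
```

Texts whose words are all out of vocabulary collapse to a zero vector here, which is one reason a small or mismatched training corpus can hurt feature quality.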

Barcavin commented 1 year ago

Thank you so much for the discussion.

CurryTang / Graph-LLM

fname missing error when building word2vec #2