BUPT-GAMMA / OpenHGNN

This is an open-source toolkit for Heterogeneous Graph Neural Networks (OpenHGNN) based on DGL.
Apache License 2.0

Trainer defined without scripts and readme in output #92

Closed: buaalyx closed this issue 8 months ago

buaalyx commented 2 years ago

For example, the DeepWalk and HeGAN trainers are defined, but there seems to be no corresponding command in scripts/run_experiments.py and no README doc in outputs/. I don't know which datasets are supported or how to run these models.

Theheavens commented 2 years ago

Hello. First, DeepWalk is a homogeneous graph algorithm, so it will not be included in OpenHGNN as a standalone model. DeepWalk appears in the trainer code only because metapath2vec reuses the training procedure used in DeepWalk. If you are interested in DeepWalk itself, please refer to the implementation in DGL. Second, HeGAN is uploaded now. However, our developer cannot reproduce the reported performance even when using the author's source code; our implementation can only reach the performance obtained with the author's source code.

buaalyx commented 2 years ago

In sample_graph_for_dis in HeGAN_trainer.py, the comment states that the function returns 3 graphs (pos_hg, neg_hg1 and neg_hg2), but the actual results are pos_hg, pos_hg1 and pos_hg2, and pos_hg2 looks the same as pos_hg.

buaalyx commented 2 years ago

Besides, I have a question about args.meta_path_key. For herec and mp2vec, do these two models only use one meta-path? Many datasets contain multiple meta-paths in their meta_paths_dict (e.g. dblp4MAGNN has 'APVPA' and 'APA'), but in config.ini or config.py the argument for these two models has only one value, namely 'APVPA'. So, will the other meta-paths be used during training? How can I use multiple meta-paths?

Theheavens commented 2 years ago

Good question!

  1. It seems that the experiments of mp2vec only use one meta-path.
  2. Section 4.2.2 (Setting the Fusion Function) of the HERec paper offers three functions for fusing different meta-paths, which turns the model into end-to-end training. For generality, we only provide the embedding training, not the fusion training.
  3. For now, we recommend setting one meta-path during training; the other meta-paths will not be used.
  4. How to use multiple meta-paths? The most direct way is to concatenate the embeddings of different meta-paths; see the sketch below. There are also more advanced approaches, such as the fusion function in HERec and HEAD.
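
For concreteness, here is a minimal sketch (not OpenHGNN code) of the concatenation and of a simple HERec-style weighted fusion. emb_by_mp, the node count and the embedding size are hypothetical placeholders standing in for embeddings trained separately per meta-path:

import torch

# Hypothetical: per-meta-path node embeddings with a shared node ordering,
# e.g. produced by running the metapath2vec / HERec embedding step once per meta-path.
num_nodes, emb_dim = 4057, 128   # placeholder sizes
emb_by_mp = {
    'APVPA': torch.randn(num_nodes, emb_dim),
    'APA':   torch.randn(num_nodes, emb_dim),
}

# 1) Simplest fusion: concatenate along the feature dimension.
fused_cat = torch.cat([emb_by_mp[mp] for mp in sorted(emb_by_mp)], dim=1)   # (num_nodes, 2 * emb_dim)

# 2) A simple learnable fusion in the spirit of HERec's linear fusion function:
#    a softmax-weighted sum whose weights can be trained with a downstream task.
weights = torch.nn.Parameter(torch.zeros(len(emb_by_mp)))
stacked = torch.stack([emb_by_mp[mp] for mp in sorted(emb_by_mp)], dim=0)   # (n_mp, num_nodes, emb_dim)
fused_sum = (torch.softmax(weights, dim=0)[:, None, None] * stacked).sum(dim=0)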
clearhanhui commented 2 years ago

> In sample_graph_for_dis in HeGAN_trainer.py, the comment states that the function returns 3 graphs (pos_hg, neg_hg1 and neg_hg2), but the actual results are pos_hg, pos_hg1 and pos_hg2, and pos_hg2 looks the same as pos_hg.

neg_hg2 is a negatively sampled graph whose node embeddings are fake (generated by the generator), but its adjacency matrix is real. So its adjacency matrix can be the same as that of pos_hg, and the node embeddings are assigned in HERE.
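
To illustrate the idea, here is a minimal DGL sketch (not the actual HeGAN_trainer code): the "negative" graph keeps the real adjacency of pos_hg, and only its node features are replaced by generator output. The toy graph, the feature name 'h' and fake_generator are made up for the example:

import dgl
import torch

# A toy positive heterograph with real node embeddings.
pos_hg = dgl.heterograph({('author', 'writes', 'paper'): ([0, 1, 2], [0, 0, 1])})
emb_dim = 16
for ntype in pos_hg.ntypes:
    pos_hg.nodes[ntype].data['h'] = torch.randn(pos_hg.num_nodes(ntype), emb_dim)

# Stand-in for the HeGAN generator: anything that maps node ids to fake embeddings.
fake_generator = {ntype: torch.nn.Embedding(pos_hg.num_nodes(ntype), emb_dim)
                  for ntype in pos_hg.ntypes}

# Same nodes and edges as pos_hg, i.e. the real adjacency ...
neg_hg2 = pos_hg.clone()
for ntype in neg_hg2.ntypes:
    ids = neg_hg2.nodes(ntype)
    # ... but the node embeddings are generator output, not the real ones.
    neg_hg2.nodes[ntype].data['h'] = fake_generator[ntype](ids).detach()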

buaalyx commented 2 years ago

Thanks for the reply. But when I try to run HeGAN on my own dataset, at this part of sample_graph_for_dis

for nt in self.hg_dict.keys():
    for src in self.hg_dict[nt].keys():
        for i in range(self.k):

I ran into another error:

  File "/root/Downloads/lyx/heter/hgt/base_methods/OpenHGNN-main/openhgnn/trainerflow/HeGAN_trainer.py", line 57, in sample_graph_for_dis
    dst = random.choice(self.hg_dict[nt][src][et])
  File "/usr/local/anaconda3/envs/d80/lib/python3.7/random.py", line 261, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence

I found in the end that self.hg_dict[nt][src][et] is an empty tensor ([]). Since nt and src are the keys of hg_dict, I guess maybe the parameter k is not compatible with my own dataset? I don't know the meaning of k; how should I set that value for my own dataset?

clearhanhui commented 2 years ago

Q1: The meaning of k
A1: The parameter self.k here means the number of samples, and a similar implementation can be found in the author's code.

Q2: IndexError
A2: I guess your dataset may contain some nodes with no connected edges. I suggest simply skipping the empty entries by adding one if statement: if len(self.hg_dict[nt][src][et]) == 0: continue (see the sketch below).
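
As a minimal sketch of where that guard fits (assuming hg_dict has the structure {node_type: {src_id: {edge_type: candidate dst ids}}} and k is the number of samples drawn per source node, as in the loop quoted above; the function name and return value are made up):

import random

def sample_pairs(hg_dict, k):
    sampled = []
    for nt in hg_dict.keys():
        for src in hg_dict[nt].keys():
            for _ in range(k):
                for et in hg_dict[nt][src].keys():
                    candidates = hg_dict[nt][src][et]
                    if len(candidates) == 0:
                        # Skip empty entries instead of letting random.choice
                        # raise "Cannot choose from an empty sequence".
                        continue
                    dst = random.choice(candidates)
                    sampled.append((nt, src, et, dst))
    return sampled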