Closed buaalyx closed 8 months ago
Hello,
First, DeepWalk is a homogeneous graph embedding algorithm, so it will not be included in OpenHGNN. DeepWalk appears in the trainer only because metapath2vec uses the same training procedure as DeepWalk. If you are interested in DeepWalk, refer to the implementation in DGL.
Second, HeGAN is uploaded now. However, our developers cannot reproduce the reported performance, even when running the author's source code. The performance of our implementation is at best on par with what the author's source code achieves.
In sample_graph_for_dis in HeGAN_trainer.py, the comment states that the function returns three graphs (pos_hg, neg_hg1, and neg_hg2), but the actual return values are pos_hg, pos_hg1, and pos_hg2, and pos_hg2 looks the same as pos_hg.
Besides, I have a question about args.meta_path_key for herec and mp2vec: do these two models use only one metapath? Many datasets contain multiple metapaths in their meta_paths_dict (e.g., dblp4MAGNN has 'APVPA' and 'APA'), but in config.ini or config.py the argument for these two models has only one value, namely 'APVPA'. So, will the other metapaths be used during training? How can I use multiple metapaths?
Good question!
mp2vec uses only one meta-path. As for herec, its "Setting the Fusion Function" part offers three functions for fusing different meta-paths, which would make the training end-to-end. For generality, we only provide the embedding training, not the fusion training.

> In sample_graph_for_dis in HeGAN_trainer.py, the comment states that the function returns three graphs (pos_hg, neg_hg1, and neg_hg2), but the actual return values are pos_hg, pos_hg1, and pos_hg2, and pos_hg2 looks the same as pos_hg.

neg_hg2 is a negatively sampled graph: its node embeddings are wrong (they are generated by the Generator), but its adjacency matrix is real. So its adjacency matrix can be the same as that of pos_hg, and the node embeddings are assigned HERE.
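
To make the point about neg_hg2 concrete, here is a minimal sketch in plain Python. It is not the OpenHGNN code: the toy edges, the embedding values, and the helper fake_embeddings (a stand-in for the Generator) are all hypothetical. It only shows how a positive and a negative sample can share the same structure while differing in node embeddings.

```python
import random

# Hypothetical toy data: three edges and 2-dimensional "real" embeddings.
edges = [(0, 1), (1, 2), (2, 0)]
real_emb = {0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6]}


def fake_embeddings(nodes, dim, seed=0):
    """Stand-in for a generator: one random vector per node (assumption)."""
    rng = random.Random(seed)
    return {n: [rng.gauss(0.0, 1.0) for _ in range(dim)] for n in nodes}


# Positive sample: real structure, real embeddings.
pos_graph = {"edges": edges, "emb": real_emb}
# Negative sample: the SAME structure, but generated embeddings.
neg_graph = {"edges": edges, "emb": fake_embeddings(real_emb, dim=2)}

same_structure = pos_graph["edges"] == neg_graph["edges"]
same_embeddings = pos_graph["emb"] == neg_graph["emb"]
```

So comparing only the adjacency of the two graphs makes them look identical; the difference shows up in the embeddings attached to the nodes.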
Thanks for the reply, but when I run HeGAN on my own dataset, I run into another problem at this part of sample_graph_for_dis:
    for nt in self.hg_dict.keys():
        for src in self.hg_dict[nt].keys():
            for i in range(self.k):
The error is:
  File "/root/Downloads/lyx/heter/hgt/base_methods/OpenHGNN-main/openhgnn/trainerflow/HeGAN_trainer.py", line 57, in sample_graph_for_dis
    dst = random.choice(self.hg_dict[nt][src][et])
  File "/usr/local/anaconda3/envs/d80/lib/python3.7/random.py", line 261, in choice
    raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
In the end I found that self.hg_dict[nt][src][et] is an empty tensor([]).
Since nt and src are keys of hg_dict, I guess the parameter k may not be compatible with my dataset? I don't know what k means; how should I set it for my own dataset?
Q1: The meaning of k
A1: The parameter self.k here is the number of samples (drawn for each source node in the loop above); a similar implementation can be found in the author's code.
Q2: IndexError
A2: I guess that some nodes in your dataset have no neighbors for some edge types. I suggest simply skipping the empty tensor by adding an if statement: if len(self.hg_dict[nt][src][et]) == 0: continue
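
The suggested guard can be sketched as follows. This is a standalone illustration, not the actual trainer code: the function name sample_neighbors, the loop over edge types, and the toy hg_dict built from plain lists are assumptions; only the nested nt / src / et indexing and the empty check come from the thread.

```python
import random


def sample_neighbors(hg_dict, k, seed=None):
    """Draw k destination nodes per (node type, source, edge type),
    skipping entries whose neighbor list is empty."""
    rng = random.Random(seed)
    samples = []
    for nt in hg_dict:
        for src in hg_dict[nt]:
            for et, neighbors in hg_dict[nt][src].items():
                # Guard from the reply above: random.choice raises
                # IndexError on an empty sequence, so skip it.
                if len(neighbors) == 0:
                    continue
                for _ in range(k):
                    samples.append((nt, src, et, rng.choice(neighbors)))
    return samples


# Hypothetical toy dictionary: author 0 has no "cites" neighbors.
toy = {"author": {0: {"writes": [10, 11], "cites": []}, 1: {"writes": [12]}}}
samples = sample_neighbors(toy, k=3, seed=0)
```

With the guard in place, the empty "cites" entry contributes no samples instead of raising an IndexError. Note that skipping means such nodes produce fewer samples than the others, which may or may not matter for your dataset.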
For example, the DeepWalk and HeGAN trainers are defined, but there seems to be no command for them in scripts/run_experiments.py and no README in outputs/. I don't know which datasets are supported or how to run the models.