BUPT-GAMMA / OpenHGNN

This is an open-source toolkit for Heterogeneous Graph Neural Networks (OpenHGNN) based on DGL.
Apache License 2.0
828 stars, 141 forks

"dblp4HAN" dataset bug #137

Closed: luoxc007 closed this 4 months ago

luoxc007 commented 1 year ago

🐛 Bug

When I ran "python -u /home/wj/dgl/OpenHGNN-main/main.py -m HAN -d dblp4HAN -t node_classification -g 6 --use_best_config --load_from_pretrained" with OpenHGNN, I got the error "UnboundLocalError: local variable '_dataset' referenced before assignment".

To Reproduce

Steps to reproduce the behavior:

1. Run the command shown above.

Expected behavior

  1. I traced the code and found that the dataset name is matched by a long chain of "elif" branches with no final "else". When an invalid dataset name is passed, the error is therefore about the unassigned variable rather than about the dataset name that was entered. Adding an "else" branch would make the error report clearer.
  2. The "dblp4HAN" dataset is listed in the README.md file in openhgnn/dataset, but no such dataset actually exists. Could that file be updated?
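
The first suggestion can be illustrated with a minimal sketch. This is not the OpenHGNN source: the function name `build_dataset`, the placeholder return values, and the error wording are all assumptions made for illustration. Only the dataset names come from this thread. It shows how a final "else" turns the confusing UnboundLocalError into a message that names the invalid input.

```python
def build_dataset(name: str) -> dict:
    """Hypothetical dispatcher that mimics an if/elif chain over dataset names."""
    if name == "dblp4GTN":
        _dataset = {"name": name}  # placeholder for the real dataset object
    elif name == "acm_han_raw":
        _dataset = {"name": name}  # placeholder for the real dataset object
    else:
        # Without this branch, `_dataset` is never assigned for an unknown
        # name and the `return` below raises UnboundLocalError instead.
        raise ValueError(f"Unsupported dataset name: {name!r}")
    return _dataset
```

With this branch in place, calling `build_dataset("dblp4HAN")` raises a ValueError that mentions "dblp4HAN" directly.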

Environment

Additional context

None.

Zhanghyi commented 1 year ago

dblp4HAN seems to be unavailable now, and your suggestions sound reasonable. Please use other datasets such as dblp4GTN, acm_han_raw, etc.

wwddd66 commented 1 year ago

I want to use dblp4GTN, but I ran into a problem. The dataset 'acm4GTN' has metapath embeddings such as 'pspap_m2v_emb', but I cannot find any metapath embeddings for dblp4GTN. As a result, I can use 'acm4GTN' in HGSL, but I cannot use 'dblp4GTN' there.

Zhanghyi commented 1 year ago

You are right: metapath2vec embeddings are not available for dblp4GTN. One solution is to run metapath2vec on the dataset to generate the embeddings and then assign them as node features before running HGSL.

Zhanghyi commented 1 year ago

For example, you can set meta_path_key to APCPA in the config.ini file and then run the following command:

    python main.py -m Metapath2vec -t node_classification -d dblp4GTN -g -1

This will write the embeddings to the output/metapath2vec directory.

wwddd66 commented 1 year ago

Yes, I got the 'APCPA' embeddings, but their shape is (18405, 128), which covers all nodes in dblp4GTN. Can Metapath2vec generate embeddings for the target node type only, or how can I extract the target-type embeddings from the full set? What is the node order in the 'APCPA' embeddings, e.g. is author range(0, 4057), conference range(4057, 4077), and paper range(4077, 18405)?

wwddd66 commented 1 year ago

@Zhanghyi

Zhanghyi commented 1 year ago

The embedding file contains all node types in the same order as g.ntypes. We will update the relevant documentation for clarity.
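
A minimal sketch of how that ordering could be used to recover per-type embeddings, under some assumptions: the rows are stored as contiguous blocks, one block per node type, in the order of g.ntypes, and each block holds one row per node of that type. The helper name `split_by_node_type` is hypothetical, and the node counts below are taken from the example ranges in the question, not from a confirmed inspection of dblp4GTN. The function only relies on `len` and slicing, so it should work with a NumPy array or a torch tensor as well as a plain list.

```python
def split_by_node_type(embeddings, ntypes, num_nodes):
    """Slice a stacked embedding matrix into one block per node type.

    embeddings: sequence of rows, assumed to be ordered type by type.
    ntypes:     node type names in storage order (for a DGL graph, g.ntypes).
    num_nodes:  mapping from node type to its node count
                (for a DGL graph, g.num_nodes(ntype)).
    """
    total = sum(num_nodes[ntype] for ntype in ntypes)
    if total != len(embeddings):
        raise ValueError(
            f"expected {total} rows for the given node counts, got {len(embeddings)}"
        )

    blocks = {}
    start = 0
    for ntype in ntypes:
        stop = start + num_nodes[ntype]
        blocks[ntype] = embeddings[start:stop]  # rows belonging to this type
        start = stop
    return blocks


# Assumed counts, derived from the ranges proposed in the question above.
ntypes = ["author", "conference", "paper"]
num_nodes = {"author": 4057, "conference": 20, "paper": 14328}

# Stand-in for the real (18405, 128) embedding matrix: one dummy row per node.
rows = list(range(18405))
blocks = split_by_node_type(rows, ntypes, num_nodes)
author_rows = blocks["author"]  # the block that would be used as author features
```

The length check is there so that a mismatch between the assumed counts and the actual file fails loudly instead of silently producing misaligned features.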

wwddd66 commented 1 year ago

Thanks!