[BUG] Cannot read downloaded .npz files successfully for OpenKE datasets

Graph-Learning-Benchmarks / gli

🗂 Graph Learning Indexer: a contributor-friendly and metadata-rich platform for graph learning benchmarks. Dataloading, Benchmarking, Tagging, and more!

https://graph-learning-benchmarks.github.io/gli/

MIT License


[BUG] Cannot read downloaded .npz files successfully for OpenKE datasets #87

Closed: tingwl0122 closed this issue 2 years ago

tingwl0122 commented 2 years ago

Describe the bug: I cannot read the downloaded .npz files for some OpenKE datasets in /GLB-Repo/datasets, such as FB13 and WN18RR. I didn't test all of the OpenKE datasets, but the code runs successfully for cora, citeseer, and PubMed.

(PR #86 has been opened to update the dataset preparation in /GLB-Repo/glb/tags.py.)

To Reproduce: Run python3 tags.py --metadata FB13 --task task from /GLB-Repo/glb

Expected behavior

Comments: The error occurs in _dfs_read_file in graph.py. The call array = file_reader.get(path, d.get("key"), device) fails to combine the path, key, and device name.

tingwl0122 commented 2 years ago

I suspect this is mainly due to the data type. For OpenKE datasets, the code gets stuck reading the first key, node_name, which is supposed to contain character strings; for datasets like cora, the stored data are mainly tensors or csr_matrix objects.
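One way to check this hypothesis is to list the dtype of every array in the .npz file before handing it to the loader. The sketch below assumes only NumPy; describe_npz is a hypothetical helper (not part of GLI), and the in-memory file with node_name and edge arrays is made-up sample data, not the real FB13 file.

```python
import io

import numpy as np


def describe_npz(source):
    """Return {key: (dtype_kind, is_numeric)} for every array in an .npz file.

    `source` may be a path or a file-like object. allow_pickle is left at its
    default (False), so object-dtype arrays raise ValueError when accessed.
    """
    summary = {}
    with np.load(source) as npz:
        for key in npz.files:
            arr = npz[key]
            # Numeric (int/uint/float/complex) or boolean dtypes count as numeric;
            # unicode strings ('U') and byte strings ('S') do not.
            numeric = np.issubdtype(arr.dtype, np.number) or arr.dtype == np.bool_
            summary[key] = (arr.dtype.kind, bool(numeric))
    return summary


# Hypothetical stand-in for an OpenKE-style file: a string array of node
# names next to a numeric edge array (the values are made up).
buf = io.BytesIO()
np.savez(buf, node_name=np.array(["/m/0abc", "/m/0def"]), edge=np.array([[0, 1]]))
buf.seek(0)
print(describe_npz(buf))
```

A "U" kind on node_name would confirm that the key holds strings rather than numbers. Object-dtype arrays would additionally need allow_pickle=True just to load, which this sketch deliberately does not enable.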

xingjian-zhang commented 2 years ago

I will have a look at that! Thank you for reporting this issue.

xingjian-zhang commented 2 years ago

I have reproduced the error. The problem is twofold:

1. We did not define how to handle string arrays, and
2. DGL does not support string tensors as node attributes (neither does PyTorch).

xingjian-zhang commented 2 years ago

After discussion, we decided to skip non-numeric attributes. I am opening a new PR.
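A minimal sketch of what skipping non-numeric attributes could look like, assuming the attributes have already been read into NumPy arrays. keep_numeric_attributes is a hypothetical name; this is not the code from the new PR, and the numeric-or-boolean test is only a coarse approximation of which dtypes PyTorch/DGL actually accept.

```python
import warnings

import numpy as np


def keep_numeric_attributes(arrays):
    """Split a {name: array} mapping into numeric arrays and skipped names.

    Arrays whose dtype is neither numeric nor boolean (e.g. NumPy unicode
    strings or object arrays) are dropped, since PyTorch tensors (and hence
    DGL node attributes) cannot hold strings.
    """
    kept, skipped = {}, []
    for name, value in arrays.items():
        arr = np.asarray(value)
        if np.issubdtype(arr.dtype, np.number) or arr.dtype == np.bool_:
            kept[name] = arr
        else:
            skipped.append(name)
    if skipped:
        # Make the omission visible instead of silently losing keys.
        warnings.warn(f"Skipping non-numeric attributes: {skipped}")
    return kept, skipped


attrs = {
    "node_name": np.array(["/m/0abc", "/m/0def"]),  # made-up string data
    "feat": np.ones((2, 4), dtype=np.float32),
}
kept, skipped = keep_numeric_attributes(attrs)
```

Returning the skipped names (and emitting a warning) keeps keys such as node_name traceable, so a user can still map them to node IDs separately if needed.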

tingwl0122 commented 2 years ago

Got it! I'll test this after that. Thank you!

tingwl0122 commented 2 years ago

Hi, I am not sure whether this issue has been resolved. I ran tests on these OpenKE datasets today, but the same problem still occurred.

xingjian-zhang commented 2 years ago

Hi @tingwl0122 ! Could you please try this new PR on your machine?

tingwl0122 commented 2 years ago

Sure. I'll try this in a second.