Graph-Learning-Benchmarks / gli

🗂 Graph Learning Indexer: a contributor-friendly and metadata-rich platform for graph learning benchmarks. Dataloading, Benchmarking, Tagging, and more!
https://graph-learning-benchmarks.github.io/gli/
MIT License

Deprecate outdated doc, templates, and auxiliary files. #465

Closed xingjian-zhang closed 1 year ago

xingjian-zhang commented 1 year ago

Description

Related Issue

This PR attempts to fix #462, #425, and #398.

Motivation and Context

How Has This Been Tested?

This change does not involve any source code changes.

github-actions[bot] commented 1 year ago

This is an automatic reminder for pasting the local test results of wiki as a comment in this PR, in case you haven't done so. The aforementioned datasets are too large for them to be tested with GitHub Action workflow here. The local test result for each dataset can be obtained by running make pytest DATASET=<dataset name>. For more details, please refer to the dataset submission guide.

xingjian-zhang commented 1 year ago

> This is an automatic reminder for pasting the local test results of wiki as a comment in this PR, in case you haven't done so. The aforementioned datasets are too large for them to be tested with GitHub Action workflow here. The local test result for each dataset can be obtained by running make pytest DATASET=<dataset name>. For more details, please refer to the dataset submission guide.

This is expected as we are modifying all datasets by removing the urls.json.

xingjian-zhang commented 1 year ago

Pytest failed: log. The failures fall into two parts:

  1. KeyError: 'predict_tail' for all KGEntityPrediction tasks.
  2. Cannot fetch URLs for arxiv-year, snap-patents, and twitch-gamers.

I think these test failures are triggered by pre-existing code.

jiaqima commented 1 year ago

I tried to run the following code and successfully got the url for snap-patents.npz:

from gli.utils import _get_url_from_server
print(_get_url_from_server('snap_patents.npz'))

The result is 'https://www.dropbox.com/s/yplq00csa3vyogp/snap_patents.npz?dl=0'.

Maybe the HTTPS server is unstable?

xingjian-zhang commented 1 year ago

> I tried to run the following code and successfully got the url for snap-patents.npz:
>
> from gli.utils import _get_url_from_server
> print(_get_url_from_server('snap_patents.npz'))
>
> The result is 'https://www.dropbox.com/s/yplq00csa3vyogp/snap_patents.npz?dl=0'.
>
> Maybe the HTTPS server is unstable?

In [1]: from gli.utils import _get_url_from_server

In [2]: from gli import get_gli_graph

In [3]: for i in range(5):
   ...:     print(_get_url_from_server('snap_patents.npz'))
   ...: 
https://www.dropbox.com/s/yplq00csa3vyogp/snap_patents.npz?dl=0
https://www.dropbox.com/s/yplq00csa3vyogp/snap_patents.npz?dl=0
https://www.dropbox.com/s/yplq00csa3vyogp/snap_patents.npz?dl=0
https://www.dropbox.com/s/yplq00csa3vyogp/snap_patents.npz?dl=0
https://www.dropbox.com/s/yplq00csa3vyogp/snap_patents.npz?dl=0

In [4]: get_gli_graph('snap-patents')
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In [4], line 1
----> 1 get_gli_graph('snap-patents')

File ~/Projects/Private/gli/gli/dataloading.py:139, in get_gli_graph(dataset, device, verbose)
    137 if not os.path.exists(metadata_path):
    138     raise FileNotFoundError(f"{metadata_path} not found.")
--> 139 download_data(dataset, verbose=verbose)
    141 return read_gli_graph(metadata_path, device=device, verbose=verbose)

File ~/Projects/Private/gli/gli/utils.py:367, in download_data(dataset, verbose)
    365             data_file_url_dict[data_file] = url_dict[data_file]
    366         else:
--> 367             raise FileNotFoundError(f"cannot find url for {data_file}.")
    369 for data_file_name, url in data_file_url_dict.items():
    370     data_file_path = os.path.join(data_dir, data_file_name)

FileNotFoundError: cannot find url for snap-patents.npz.

I can fetch the URL directly by calling _get_url_from_server, but the fetch fails when the same lookup happens inside get_gli_graph(). This is unexpected. Let me take a closer look at this issue.
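One detail visible above may be relevant, though it is not confirmed in this thread: the successful direct call uses the underscore spelling 'snap_patents.npz', while the traceback reports the hyphenated 'snap-patents.npz'. A minimal sketch of how such a spelling mismatch between the URL table and the requested file name would produce exactly this error (url_dict and resolve_url here are hypothetical stand-ins for the lookup in gli.utils.download_data, not the actual implementation):

```python
# Hypothetical reproduction of the failure mode, assuming the URL table
# is keyed on the underscore spelling of the file name.
url_dict = {
    "snap_patents.npz":
        "https://www.dropbox.com/s/yplq00csa3vyogp/snap_patents.npz?dl=0",
}


def resolve_url(data_file):
    """Simplified stand-in for the lookup in download_data."""
    if data_file in url_dict:
        return url_dict[data_file]
    # Mirrors the error raised at gli/utils.py:367 in the traceback.
    raise FileNotFoundError(f"cannot find url for {data_file}.")


print(resolve_url("snap_patents.npz"))   # underscore spelling: found
try:
    resolve_url("snap-patents.npz")      # hyphen spelling: not found
except FileNotFoundError as err:
    print(err)
```

If this is the cause, the dict lookup misses simply because the two spellings are different keys, and the error message would then echo the hyphenated name, as seen in the traceback.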

jiaqima commented 1 year ago

Fixed predict_tail error via #468.

xingjian-zhang commented 1 year ago

Found the bug:

twitch-gamers and arxiv-year share the same issue. I have temporarily fixed them by manually modifying the corresponding metadata.json files. This vulnerability will be resolved in the future, once we enforce a function-based interface for contributing datasets.
