Several issues about TUDataset

jermainewang commented 3 years ago

I encountered several usability issues when developing the benchmark code on TUDataset.

There are two dataset classes, LegacyTUDataset and TUDataset, and they behave quite differently: TUDataset does not have node features while LegacyTUDataset does. We should merge them and provide an option to include node features when the user requests them.
The dataset does not support slicing or list indexing, which is very useful for train/val/test splits. Ideally, dataset[index_list] should return a sub-dataset. I imagine this might be a common issue for all graph classification datasets.
The dataset does not support shuffling. We should have something like dataset.shuffle() that returns a new shuffled dataset, which is useful for creating stratified train/val/test splits.
dataset.num_labels returns a numpy.ndarray instead of an integer.
There is no API for getting the number of input features, e.g., dataset.num_features.
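
To make the requests above more concrete, here is a rough sketch of the behavior I have in mind. It is not DGL code: IndexableGraphDataset, its feat_dim argument, and the toy (graph, label) samples are all hypothetical names made up for illustration, and the wrapper assumes the underlying dataset is a plain sequence of (graph, label) pairs.

```python
import operator
import random
from typing import Any, Optional, Sequence, Tuple


class IndexableGraphDataset:
    """Hypothetical wrapper (not part of DGL) around a sequence of (graph, label) pairs."""

    def __init__(self, samples: Sequence[Tuple[Any, int]],
                 indices: Optional[Sequence[int]] = None,
                 feat_dim: int = 0):
        self._samples = samples
        # A sub-dataset is a list of positions into the shared sample storage.
        self._indices = list(range(len(samples))) if indices is None else list(indices)
        self._feat_dim = feat_dim  # assumed to be known when the dataset is built

    def _view(self, indices):
        return IndexableGraphDataset(self._samples, indices, self._feat_dim)

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, key):
        # Slices and index lists return a sub-dataset; a single integer returns one sample.
        if isinstance(key, slice):
            return self._view(self._indices[key])
        try:
            position = operator.index(key)
        except TypeError:
            return self._view([self._indices[operator.index(i)] for i in key])
        return self._samples[self._indices[position]]

    def shuffle(self, seed: Optional[int] = None) -> "IndexableGraphDataset":
        # Returns a new dataset and leaves the original order untouched.
        permuted = self._indices[:]
        random.Random(seed).shuffle(permuted)
        return self._view(permuted)

    @property
    def num_labels(self) -> int:
        # A plain int, counted over the full underlying dataset rather than the current view.
        return len({int(label) for _, label in self._samples})

    @property
    def num_features(self) -> int:
        return self._feat_dim


# Toy data: strings stand in for graphs, labels alternate between 0 and 1.
toy = [("g%d" % i, i % 2) for i in range(10)]
dataset = IndexableGraphDataset(toy, feat_dim=3)
shuffled = dataset.shuffle(seed=0)
train, val, test = shuffled[:8], shuffled[8:9], shuffled[9:]
```

Note that shuffle() followed by slicing only gives a random split. A stratified split would still need to group indices by label first, but that becomes easy once dataset[index_list] works.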

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 2 years ago

This issue is closed due to lack of activity. Feel free to reopen it if you still have questions.

dmlc / dgl