dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

[GraphBolt] TorchData Pytorch support #7362

Open mfbalin opened 2 months ago

mfbalin commented 2 months ago

🔨Work Item

IMPORTANT:

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

https://github.com/pytorch/pytorch/issues/124907#issuecomment-2077135173

In the comment linked above, PyTorch developers say that future versions of PyTorch may not support torchdata properly. That could make it a problem for us to support later PyTorch versions.

Rhett-Ying commented 2 months ago

Previously we were trying to deprecate torchdata in favor of torch.utils.data for datapipe-related operations, as active development and releases of torchdata have been paused (mentioned here).

So now both the PyTorch and torchdata teams are deprecating torchdata?

mfbalin commented 2 months ago

I don't know the exact details. We need to look into it as it is a crucial dependency.

mfbalin commented 2 months ago

https://discuss.dgl.ai/t/importerror-cannot-import-name-dill-available-from-torch-utils-data-datapipes-utils-common/4363/2 might be a related problem. I saw a PR in the torch repo that fixes this issue.

frozenbugs commented 2 months ago

The way we implement DataLoader (https://github.com/dmlc/dgl/blob/658b2086b09bbd76c3d3f488af2b155a1c921052/python/dgl/graphbolt/dataloader.py#L79C7-L79C17) right now isn't perfect. It makes a lot of assumptions that might cause problems later. Once those problems hit, we should redesign it. We have held off because torch.data already does a good job, but if we have to, we'll tackle it then.

mfbalin commented 5 days ago

@frozenbugs @Rhett-Ying See https://github.com/pytorch/data/pull/1277. We need to pin torchdata<0.8.0 to avoid the deprecation message.
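
For illustration only, here is a minimal sketch of how the `torchdata<0.8.0` pin could be checked at runtime. The helper names (`parse_release`, `satisfies_pin`, `installed_torchdata_version`) are hypothetical and are not part of DGL or torchdata. The sketch uses only the standard library and compares the leading numeric part of the version string, so it is a simplification of full PEP 440 handling.

```python
import re
from importlib import metadata


def parse_release(version):
    """Return the leading numeric release segment as a tuple of ints.

    Suffixes such as "+cpu" or "a0" are ignored (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies_pin(version, upper_bound=(0, 8, 0)):
    """Return True if `version` is strictly below `upper_bound`."""
    release = parse_release(version)
    # Pad both tuples to the same length so "0.9" compares as (0, 9, 0).
    width = max(len(release), len(upper_bound))
    release = release + (0,) * (width - len(release))
    bound = tuple(upper_bound) + (0,) * (width - len(upper_bound))
    return release < bound


def installed_torchdata_version():
    """Return the installed torchdata version string, or None if absent."""
    try:
        return metadata.version("torchdata")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torchdata_version()
    if found is None:
        print("torchdata is not installed")
    elif satisfies_pin(found):
        print(f"torchdata {found} satisfies torchdata<0.8.0")
    else:
        print(f"torchdata {found} violates the torchdata<0.8.0 pin")
```

In a requirements file the same constraint is the single line `torchdata<0.8.0`. A runtime check like the one above would only be useful for producing a clearer error message than the deprecation warning.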