Closed reesehyde closed 2 months ago
Since you are using the CPU as the device, you can pass overlap_feature_fetch=False to the DataLoader as a workaround.
I think the main issue is that you probably installed the CPU version of DGL instead of the CUDA version. Can you tell us which DGL version you have installed? You can check it with pip.
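The installed build can also be inspected from Python. The sketch below assumes, based on the wheel names that appear in this thread, that CUDA builds carry a local version suffix such as +cu118 while CPU builds report a bare version like 2.1.0. The helper names are illustrative and not part of DGL.

```python
from importlib import metadata
from typing import Optional


def cuda_tag(version: str) -> Optional[str]:
    """Return the local version label if it looks like a CUDA tag, else None.

    Assumption: CUDA wheels are versioned like "2.3.0+cu118".
    """
    _, sep, local = version.partition("+")
    if sep and local.startswith("cu"):
        return local
    return None


def describe_dgl_install() -> str:
    """Report the installed dgl version and whether it looks like a CUDA build."""
    try:
        version = metadata.version("dgl")
    except metadata.PackageNotFoundError:
        return "dgl is not installed in this environment"
    tag = cuda_tag(version)
    if tag is None:
        return f"dgl {version}: no CUDA tag, probably a CPU build"
    return f"dgl {version}: CUDA build ({tag})"


if __name__ == "__main__":
    print(describe_dgl_install())
```

A bare version with no suffix would be consistent with the CPU wheel from PyPI.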
@reesehyde please refer to this page for DGL installation. It is the only official page you should refer to. As for pip packages, we host them ourselves on AWS S3. We only ever uploaded CPU versions to PyPI, and we stopped uploading there since DGL 2.2.0. So please always fetch pip packages from AWS S3.
Ah apologies, the problem was indeed using the CPU version! I just had plain old 2.1.0. Thank you @mfbalin and @Rhett-Ying for the help!
I managed to install this by downloading the correct wheel manually, but I have to fetch packages through a PyPI proxy. Would the team consider setting up the S3 bucket to be indexable by pip? I don't know exactly what that entails, but looking through torch's bucket setup and testing some index URLs, just hosting the repo.html file as a file called dgl might be sufficient. Then a fetch for dgl version 2.3.0 with index URL https://data.dgl.ai/wheels/torch-2.3/cu118 would look for a version list file (repo.html) at https://data.dgl.ai/wheels/torch-2.3/cu118/dgl/.
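The lookup described above follows the simple repository layout (PEP 503), in which an installer given an index URL requests a per-project page at the normalized project name. A minimal sketch of that URL construction, with illustrative function names; whether the S3 bucket would actually serve a page at this path is not verified here:

```python
import re


def normalize_project_name(name: str) -> str:
    """Normalize a project name as PEP 503 specifies."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_page_url(index_url: str, project: str) -> str:
    """Build the per-project page URL that an index lookup would request."""
    return f"{index_url.rstrip('/')}/{normalize_project_name(project)}/"


if __name__ == "__main__":
    print(project_page_url("https://data.dgl.ai/wheels/torch-2.3/cu118", "dgl"))
```

The page served at that URL would need to be an HTML document whose anchors point at the wheel files, which is what repo.html already appears to be.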
Maybe you can update the issue title now that we know what is going wrong.
Thanks @mfbalin, updated the title to reflect the new request. I read up a bit more on hosting a simple PyPI repository and it does look like simply hosting an index file at the /dgl path should do the trick!
I'd be happy to create a PR for the update if someone could point me towards the S3-publish logic. I searched around in the repo for "repo.html" and "s3" but only found the CI/CD report and log uploads.
@Rhett-Ying What do you think? I don't know much about PyPI or pip.
@reesehyde could you show me the use case you want and the blocker? Why does the current install command pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/repo.html not work for you? How would you like to install DGL? By specifying it in a YAML file?
Thanks @Rhett-Ying, I hadn't tried that command but you're right that it does the trick in pip. I wasn't aware of the -f HTML page option as opposed to the -i Python Package Index option in pip! The case I had in mind was essentially using -i rather than -f, which requires a proper PyPI index. This could be established by hosting the /repo.html file at /dgl, and we could then use pip -i https://data.dgl.ai/wheels/torch-2.3/cu118 instead of pip -f https://data.dgl.ai/wheels/torch-2.3/cu118/repo.html.
But I'm using Poetry rather than pip, and it seems my issue is simply due to a bug in Poetry. When specifying the /repo.html page as a source URL, the result is e.g.:
403 Client Error: Forbidden for url: https://data.dgl.ai/wheels/torch-2.3/cu118/repo.html/dgl-2.3.0%2Bcu118-cp310-cp310-manylinux1_x86_64.whl
Poetry's Single Page Link Source forces /repo.html to a folder and then tries to build the relative link from it as /repo.html/file.whl. It is supposed to support the single HTML index page, so I'll just fix the bug there. Thank you both for getting me pointed in the right direction!
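The difference between the two URL resolutions can be reproduced with the standard library. This is only an illustration of the behaviour described above, not Poetry's actual code, and it assumes repo.html links the wheel through a relative href equal to the file name:

```python
from urllib.parse import urljoin

PAGE_URL = "https://data.dgl.ai/wheels/torch-2.3/cu118/repo.html"
# Assumption: the page links the wheel by its bare file name.
WHEEL_HREF = "dgl-2.3.0%2Bcu118-cp310-cp310-manylinux1_x86_64.whl"


def resolve_as_page(page_url: str, href: str) -> str:
    """Resolve href relative to the page itself, so it lands next to repo.html."""
    return urljoin(page_url, href)


def resolve_as_folder(page_url: str, href: str) -> str:
    """Resolve href as if the page URL were a directory (the failing case)."""
    return urljoin(page_url.rstrip("/") + "/", href)


if __name__ == "__main__":
    print(resolve_as_page(PAGE_URL, WHEEL_HREF))
    print(resolve_as_folder(PAGE_URL, WHEEL_HREF))
```

The second form reproduces the URL from the 403 error, while the first points at a sibling of repo.html.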
🐛 Bug
When trying to construct a dgl.graphbolt.DataLoader in an environment supporting CUDA, the call to torch.ops.graphbolt.set_max_uva_threads() fails with an AttributeError.
To Reproduce
From the environment described below, attempt to create a Graphbolt datapipe per the Node Classification with Minibatch Sampling tutorial. Note that while the environment supports CUDA, the error is produced even when the CPU is used:
This results in:
Expected behavior
DataLoader to be created successfully
Environment
Additional context
I can confirm the graphbolt shared library is present for my PyTorch version:
I'm not sure how to check whether PyTorch is loading it correctly or at all.
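One possible check is to ask PyTorch whether the operator is registered after importing Graphbolt. The sketch below assumes that importing dgl.graphbolt is what loads the shared library and that an unregistered operator surfaces as an AttributeError on torch.ops.graphbolt, as in the error above. The helper names are hypothetical.

```python
import importlib


def op_is_registered(namespace: object, op_name: str) -> bool:
    """Return True if the namespace exposes op_name (hasattr swallows AttributeError)."""
    return hasattr(namespace, op_name)


def check_graphbolt_op(op_name: str = "set_max_uva_threads") -> str:
    """Report whether torch.ops.graphbolt exposes the given operator."""
    try:
        torch = importlib.import_module("torch")
        # Assumption: this import triggers loading of the Graphbolt library.
        importlib.import_module("dgl.graphbolt")
    except Exception as exc:  # ImportError, or a loader error raised by DGL
        return f"could not import torch and dgl.graphbolt: {exc!r}"
    if op_is_registered(torch.ops.graphbolt, op_name):
        return f"torch.ops.graphbolt.{op_name} is registered"
    return f"torch.ops.graphbolt.{op_name} is missing from the loaded library"


if __name__ == "__main__":
    print(check_graphbolt_op())
```

A "missing" result with the library file present would suggest the loaded build does not define that operator, for example a CPU-only build.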
Other Versions
Relatedly, my first reaction was to try a different version of DGL and/or PyTorch. But I found that, installing from PyPI on an x86-64 Linux machine, I'm restricted to version 2.1.0 for v2. On PyPI the 2.0.0 wheel is only available for Linux aarch64, and no Linux wheels are available for 2.2.0 or 2.2.1. Could the CI/CD be updated to build more Linux wheels? I'd love to contribute there if someone could point me in the right direction!