dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

Missing dependencies when doing a PyPI-based install, torch 2.2.2 not supported #7247

Closed. Andrew-S-Rosen closed this 2 months ago

Andrew-S-Rosen commented 6 months ago

🐛 Bug

The pandas, pyyaml, and pydantic dependencies are not automatically installed, and an error is raised even after installing them manually. There is also an incompatibility with torch==2.2.2.

To Reproduce

conda create --name test python=3.10
conda activate test
pip install dgl==2.1.0
import dgl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/__init__.py", line 16, in <module>
    from . import (
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/dataloading/__init__.py", line 13, in <module>
    from .dataloader import *
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 27, in <module>
    from ..distributed import DistGraph
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/distributed/__init__.py", line 5, in <module>
    from .dist_graph import DistGraph, DistGraphServer, edge_split, node_split
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/distributed/dist_graph.py", line 11, in <module>
    from .. import backend as F, graphbolt as gb, heterograph_index
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/__init__.py", line 9, in <module>
    from .minibatch import *
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/minibatch.py", line 12, in <module>
    from .internal import get_attributes
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/internal/__init__.py", line 2, in <module>
    from .utils import *
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/internal/utils.py", line 10, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'
pip install pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/__init__.py", line 16, in <module>
    from . import (
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/dataloading/__init__.py", line 13, in <module>
    from .dataloader import *
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 27, in <module>
    from ..distributed import DistGraph
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/distributed/__init__.py", line 5, in <module>
    from .dist_graph import DistGraph, DistGraphServer, edge_split, node_split
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/distributed/dist_graph.py", line 11, in <module>
    from .. import backend as F, graphbolt as gb, heterograph_index
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/__init__.py", line 10, in <module>
    from .dataloader import *
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/dataloader.py", line 12, in <module>
    from .impl.neighbor_sampler import SamplePerLayer
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/impl/__init__.py", line 7, in <module>
    from .legacy_dataset import *
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/impl/legacy_dataset.py", line 11, in <module>
    from .ondisk_dataset import OnDiskTask
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/impl/ondisk_dataset.py", line 14, in <module>
    import yaml
ModuleNotFoundError: No module named 'yaml'
pip install pyyaml
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/__init__.py", line 16, in <module>
    from . import (
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/dataloading/__init__.py", line 13, in <module>
    from .dataloader import *
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 27, in <module>
    from ..distributed import DistGraph
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/distributed/__init__.py", line 5, in <module>
    from .dist_graph import DistGraph, DistGraphServer, edge_split, node_split
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/distributed/dist_graph.py", line 11, in <module>
    from .. import backend as F, graphbolt as gb, heterograph_index
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/__init__.py", line 10, in <module>
    from .dataloader import *
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/dataloader.py", line 12, in <module>
    from .impl.neighbor_sampler import SamplePerLayer
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/impl/__init__.py", line 7, in <module>
    from .legacy_dataset import *
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/impl/legacy_dataset.py", line 11, in <module>
    from .ondisk_dataset import OnDiskTask
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/impl/ondisk_dataset.py", line 34, in <module>
    from .ondisk_metadata import (
  File "/home/rosen/software/miniconda/envs/test/lib/python3.10/site-packages/dgl/graphbolt/impl/ondisk_metadata.py", line 6, in <module>
    import pydantic
ModuleNotFoundError: No module named 'pydantic'
pip install pydantic
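
Rather than hitting the three missing modules one traceback at a time, they can be checked in a single pass. This is a minimal sketch: the import names come from the tracebacks above, `missing_packages` is a hypothetical helper, and the mapping is not DGL's actual dependency list.

```python
import importlib.util

# Import name -> pip package name, taken from the tracebacks above.
# Note that the module `yaml` is provided by the pip package `pyyaml`.
REQUIRED = {"pandas": "pandas", "yaml": "pyyaml", "pydantic": "pydantic"}


def missing_packages(required=REQUIRED):
    """Return the pip names of all modules that cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("all listed dependencies are importable")
```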

Expected behavior

No import errors and no crash (see below).

Environment

Additional context

Even after installing those dependencies, there is still an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rosen/software/miniconda/envs/test2/lib/python3.10/site-packages/dgl/__init__.py", line 16, in <module>
    from . import (
  File "/home/rosen/software/miniconda/envs/test2/lib/python3.10/site-packages/dgl/dataloading/__init__.py", line 13, in <module>
    from .dataloader import *
  File "/home/rosen/software/miniconda/envs/test2/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 27, in <module>
    from ..distributed import DistGraph
  File "/home/rosen/software/miniconda/envs/test2/lib/python3.10/site-packages/dgl/distributed/__init__.py", line 5, in <module>
    from .dist_graph import DistGraph, DistGraphServer, edge_split, node_split
  File "/home/rosen/software/miniconda/envs/test2/lib/python3.10/site-packages/dgl/distributed/dist_graph.py", line 11, in <module>
    from .. import backend as F, graphbolt as gb, heterograph_index
  File "/home/rosen/software/miniconda/envs/test2/lib/python3.10/site-packages/dgl/graphbolt/__init__.py", line 55, in <module>
    load_graphbolt()
  File "/home/rosen/software/miniconda/envs/test2/lib/python3.10/site-packages/dgl/graphbolt/__init__.py", line 45, in load_graphbolt
    raise FileNotFoundError(
FileNotFoundError: Cannot find DGL C++ graphbolt library at /home/rosen/software/miniconda/envs/test2/lib/python3.10/site-packages/dgl/graphbolt/libgraphbolt_pytorch_2.2.2.so

I'm leaning towards this being some incompatibility with the newly released torch==2.2.2. Doing the full install process with torch==2.2.1 seems to be okay.
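
The file name in the error suggests that the graphbolt library is looked up per torch version. The sketch below reconstructs the expected path and lists which versions a given install actually ships. The naming pattern is inferred from the error message only, stripping a local suffix such as `+cu121` is an assumption, and both helper names are hypothetical rather than part of DGL.

```python
from pathlib import Path

PREFIX, SUFFIX = "libgraphbolt_pytorch_", ".so"


def expected_graphbolt_lib(torch_version: str, package_dir: str) -> Path:
    """Build the library path that the error message above refers to."""
    # Assumption: a local build suffix such as "+cu121" is not part of the name.
    base_version = torch_version.split("+", 1)[0]
    return Path(package_dir) / "graphbolt" / f"{PREFIX}{base_version}{SUFFIX}"


def available_graphbolt_versions(package_dir: str) -> list[str]:
    """List the torch versions for which a graphbolt library file exists."""
    graphbolt_dir = Path(package_dir) / "graphbolt"
    if not graphbolt_dir.is_dir():
        return []
    return sorted(
        path.name[len(PREFIX):-len(SUFFIX)]
        for path in graphbolt_dir.glob(f"{PREFIX}*{SUFFIX}")
    )
```

Calling `available_graphbolt_versions` on the installed `dgl` package directory and comparing the result with `torch.__version__` would show whether a mismatch like the one above is the cause.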

mfbalin commented 6 months ago

Hi @Andrew-S-Rosen, it looks like we currently support torch up to 2.2.1. DGL's next release should include out-of-the-box support for later torch versions.

The list of currently supported torch versions:

2.0.0
2.0.1
2.1.0
2.1.1
2.1.2
2.2.0
2.2.1

mfbalin commented 6 months ago

You can build DGL from source if you want earlier support for the latest torch versions. We definitely need to fix the additional dependency issues though. @Rhett-Ying @frozenbugs

Andrew-S-Rosen commented 6 months ago

Good to know, thanks! In that case, should pip install dgl==2.1.0 also ensure that torch<=2.2.1 is installed so such issues don't happen to end users?

mfbalin commented 6 months ago

Unfortunately, we cannot do that. Then you wouldn't be able to build from source with torch 2.2.2. I, for example, work with torch==2.3.0a0+ebedce2. If we added that line, you wouldn't be able to build from source with the latest torch versions, as it would attempt to install torch==2.2.1.

What we could do, however, is have graphbolt and the other components that must be compiled for specific torch versions do a version check and not load when the version is unsuitable. Instead of giving an error, it can give a warning.
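
A minimal sketch of that idea, assuming the supported versions are the ones listed earlier in this thread. `should_load_graphbolt` is a hypothetical helper, not DGL's actual loader, and the version string would come from `torch.__version__` in practice.

```python
import warnings

# Versions listed earlier in this thread as supported by DGL 2.1.0.
SUPPORTED_TORCH_VERSIONS = (
    "2.0.0", "2.0.1", "2.1.0", "2.1.1", "2.1.2", "2.2.0", "2.2.1",
)


def should_load_graphbolt(torch_version, supported=SUPPORTED_TORCH_VERSIONS):
    """Return True if graphbolt should be loaded; warn and return False otherwise."""
    # Assumption: a local build suffix such as "+cu121" is ignored.
    base_version = torch_version.split("+", 1)[0]
    if base_version in supported:
        return True
    warnings.warn(
        f"graphbolt was not built for torch {base_version} and will not be "
        f"loaded. Supported versions: {', '.join(supported)}.",
        RuntimeWarning,
    )
    return False
```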

Andrew-S-Rosen commented 6 months ago

> Unfortunately, we cannot do that. Then you wouldn't be able to build from source with torch 2.2.2. I, for example, work with torch==2.3.0a0+ebedce2. If we added that line, you wouldn't be able to build from source with the latest torch versions, as it would attempt to install torch==2.2.1.

Ah, this certainly makes sense.

> What we could do, however, is have graphbolt and the other components that must be compiled for specific torch versions do a version check and not load when the version is unsuitable. Instead of giving an error, it can give a warning.

Actually, I feel the error may be more helpful than a warning. It is easier to catch an error in CI for downstream code that relies on dgl as a dependency 👍
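
As an illustration of why an error is easy to catch downstream, a CI smoke test only needs to attempt the import and fail on any exception. This is a sketch with a hypothetical helper name, `import_or_report`; a real test suite would call it with `"dgl"`.

```python
import importlib


def import_or_report(module_name: str):
    """Import a module; return None on success or a short error description."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # any import-time failure should fail CI
        return f"{type(exc).__name__}: {exc}"
    return None


def check_importable(module_name: str) -> None:
    """Raise AssertionError with the underlying error if the import fails."""
    problem = import_or_report(module_name)
    assert problem is None, f"importing {module_name} failed: {problem}"
```

A warning, by contrast, would let the import succeed, so a check like this would pass even when graphbolt is unusable.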

mfbalin commented 6 months ago

@Andrew-S-Rosen Then the error message could be improved as it is currently not very enlightening.
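
One possible shape for a more informative message is sketched below. The wording and the function name `describe_missing_graphbolt` are illustrative assumptions, not what DGL emits.

```python
def describe_missing_graphbolt(lib_path, torch_version, supported):
    """Build an error message that names the cause and the possible fixes."""
    return (
        f"Cannot find DGL C++ graphbolt library at {lib_path}. "
        f"The installed torch version is {torch_version}, but this DGL build "
        f"supports only torch {', '.join(supported)}. "
        "Install a supported torch version or build DGL from source."
    )


if __name__ == "__main__":
    print(
        describe_missing_graphbolt(
            "dgl/graphbolt/libgraphbolt_pytorch_2.2.2.so",
            "2.2.2",
            ["2.2.0", "2.2.1"],
        )
    )
```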

Andrew-S-Rosen commented 6 months ago

Definitely agree about that!

Rhett-Ying commented 6 months ago

The latest DGL 2.1.0 supports up to torch 2.2.1 for now. torch 2.2.2 is not supported yet. Based on the discussion above, we have 2 work items:

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

BrunoLiegiBastonLiegi commented 4 months ago

I have the same problem of failing to load the graphbolt library. I tried several different versions of pytorch (2.1.0, 2.2.0, 2.2.1, 2.3.0) with cuda 12.1 and python 3.10 and 3.12. In most cases the Cannot load Graphbolt C++ library error is triggered by an OSError: libnvrtc.so.12: cannot open shared object file: No such file or directory error, even though cuda-nvrtc is installed. However, with torch 2.1.0 I get an OSError: libcusparse.so.12: cannot open shared object file: No such file or directory.
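
To check whether the dynamic loader can find these CUDA libraries independently of DGL, they can be opened directly with `ctypes`. This is only a diagnostic sketch: the library names are copied from the errors above, and `try_load` is a hypothetical helper.

```python
import ctypes


def try_load(library_name: str):
    """Return None if the dynamic loader can open the library, else the error text."""
    try:
        ctypes.CDLL(library_name)
    except OSError as exc:
        return str(exc)
    return None


# Library names taken from the errors quoted above.
for name in ("libnvrtc.so.12", "libcusparse.so.12"):
    error = try_load(name)
    print(name, "->", "ok" if error is None else error)
```

If a library is installed but still fails to load here, its directory is probably not on the loader's search path (for example `LD_LIBRARY_PATH` on Linux).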

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you