dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

Updating DGL to CUDA 12.4 with PyTorch 2.4.x raises "FileNotFoundError: Cannot find DGL C++ sparse library at /opt/conda/envs/torch124/lib/python3.11/site-packages/dgl/dgl_sparse/libdgl_sparse_pytorch_2.4.0.post301.so" #7791

Open · NicksonCheng opened this issue 1 week ago

NicksonCheng commented 1 week ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:


Expected behavior

Environment

Additional context

After updating DGL to CUDA 12.4 and PyTorch 2.4.x, I get the following error: "FileNotFoundError: Cannot find DGL C++ sparse library at /opt/conda/envs/torch124/lib/python3.11/site-packages/dgl/dgl_sparse/libdgl_sparse_pytorch_2.4.0.post301.so"

jansole commented 5 days ago

Hey! I have the same problem. I'm working on a Windows machine in Colab with Python 3.10.

Installing it goes through without any real issues:

    Installing collected packages: torchdata, dgl
    Successfully installed dgl-2.1.0+cu121 torchdata-0.8.0

But when I run the code, I get this:

    FileNotFoundError                         Traceback (most recent call last)
    in <cell line: 1>()
    ----> 1 import dgl
          2 import networkx as nx
          3 import matplotlib.pyplot as plt
          4
          5 dgl.backend = 'pytorch'

    6 frames
    /usr/local/lib/python3.10/dist-packages/dgl/graphbolt/__init__.py in load_graphbolt()
         43     path = os.path.join(dirname, "graphbolt", basename)
         44     if not os.path.exists(path):
    ---> 45         raise FileNotFoundError(
         46             f"Cannot find DGL C++ graphbolt library at {path}"
         47         )

    FileNotFoundError: Cannot find DGL C++ graphbolt library at /usr/local/lib/python3.10/dist-packages/dgl/graphbolt/libgraphbolt_pytorch_2.4.0.so
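Both errors in this thread come from a check of the same shape: the loader builds a file name from the PyTorch version and raises if that file is not inside the installed package. The sketch below mimics that check so the expected file name can be computed without importing dgl. It assumes, based only on the paths quoted here, that the local build tag after `+` (for example `+cu121`) is dropped while the rest of the version string is kept verbatim; under that assumption the version `2.4.0.post301` reproduces the file name in the original report. `expected_lib_name` and `check_lib` are hypothetical helpers, not DGL API, and only the Linux `.so` naming seen in this thread is covered.

```python
import os
import tempfile


def expected_lib_name(prefix: str, torch_version: str) -> str:
    # Assumption inferred from the paths in this thread: the local build tag
    # after "+" (e.g. "+cu121") is dropped, the rest of the version is kept.
    base_version = torch_version.split("+", maxsplit=1)[0]
    return f"lib{prefix}_pytorch_{base_version}.so"


def check_lib(package_dir: str, subdir: str, prefix: str, torch_version: str) -> str:
    # Mirrors the shape of the check in the traceback: build a path, raise if absent.
    path = os.path.join(package_dir, subdir, expected_lib_name(prefix, torch_version))
    if not os.path.exists(path):
        raise FileNotFoundError(f"Cannot find C++ library at {path}")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pkg:
        # Hypothetical package that only ships a graphbolt build for PyTorch 2.2.1.
        os.makedirs(os.path.join(pkg, "graphbolt"))
        open(os.path.join(pkg, "graphbolt", "libgraphbolt_pytorch_2.2.1.so"), "w").close()

        print(check_lib(pkg, "graphbolt", "graphbolt", "2.2.1+cu121"))  # prints the path
        try:
            check_lib(pkg, "graphbolt", "graphbolt", "2.4.0+cu121")
        except FileNotFoundError as err:
            print(err)  # same failure mode as the traceback above
```

With PyTorch installed, passing `torch.__version__` as `torch_version` shows which file name to look for in the package's `graphbolt` or `dgl_sparse` directory.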

kjczarne commented 3 days ago

I think something is wrong with the pip wheels provided for this project. I get the same error: libgraphbolt_pytorch_2.4.1.so is not found, and indeed, when I list that directory, the library is missing. The documentation is misleading because it suggests that this can be installed with pip without any issues, which is not the case.

Only the Conda installation worked for me, but not being able to install this via vanilla pip is quite frustrating when you're working in a Docker container.

kjczarne commented 3 days ago

In the Jenkinsfile, the libraries that appear to be copied at build time are defined as:

dgl_linux_libs = 'build/libdgl.so, build/runUnitTests, python/dgl/_ffi/_cy3/core.cpython-*-x86_64-linux-gnu.so, build/tensoradapter/pytorch/*.so, build/dgl_sparse/*.so, build/graphbolt/*.so'
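The patterns above contain no PyTorch version, so the copy step should pick up any graphbolt or sparse library the build actually produced. One way to see this is to match a few hypothetical build outputs against them. This sketch uses Python's `fnmatch` as an approximation of the Jenkins pattern syntax (its `*` also matches `/`, which makes no difference for these paths); the candidate file names and the helpers `parse_patterns` and `is_copied` are illustrative, not taken from an actual build log or from DGL's CI.

```python
from fnmatch import fnmatchcase
from typing import List

# Pattern list copied from the Jenkinsfile excerpt above.
DGL_LINUX_LIBS = (
    "build/libdgl.so, build/runUnitTests, "
    "python/dgl/_ffi/_cy3/core.cpython-*-x86_64-linux-gnu.so, "
    "build/tensoradapter/pytorch/*.so, build/dgl_sparse/*.so, build/graphbolt/*.so"
)


def parse_patterns(spec: str) -> List[str]:
    """Split the comma-separated spec into individual glob patterns."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def is_copied(path: str, patterns: List[str]) -> bool:
    """True if a built file would be matched by any of the glob patterns."""
    return any(fnmatchcase(path, pat) for pat in patterns)


if __name__ == "__main__":
    patterns = parse_patterns(DGL_LINUX_LIBS)
    # Hypothetical build outputs: the patterns carry no version, so a 2.4.x
    # library would be matched just like a 2.2.x one, if the build produced it.
    for candidate in [
        "build/graphbolt/libgraphbolt_pytorch_2.2.1.so",
        "build/graphbolt/libgraphbolt_pytorch_2.4.0.so",
        "build/dgl_sparse/libdgl_sparse_pytorch_2.4.0.so",
    ]:
        print(candidate, is_copied(candidate, patterns))
```

If the matching is this permissive, a missing 2.4.x library points at the build step rather than the copy step, which is the suspicion raised next.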

So I suspect something is broken with the graphbolt libs: perhaps only the ones for older PyTorch versions are built when the workflow is triggered. These are the ones I found in the installed dgl package:

libgraphbolt_pytorch_2.0.0.so
libgraphbolt_pytorch_2.0.1.so
libgraphbolt_pytorch_2.1.0.so
libgraphbolt_pytorch_2.1.1.so
libgraphbolt_pytorch_2.1.2.so
libgraphbolt_pytorch_2.2.0.so
libgraphbolt_pytorch_2.2.1.so
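Given a listing like the one above, a short check can confirm whether the installed PyTorch version has a matching graphbolt library and which versions the wheel does ship. This is a sketch: the file list is copied from this comment, the naming rule (local `+...` tag dropped) is inferred from the paths in this thread, and `bundled_versions` and `is_supported` are hypothetical helpers. In practice the list could come from `os.listdir` on the package's `graphbolt` directory.

```python
import re
from typing import List, Tuple

# File names listed in the comment above (graphbolt directory of the installed wheel).
FOUND = [
    "libgraphbolt_pytorch_2.0.0.so",
    "libgraphbolt_pytorch_2.0.1.so",
    "libgraphbolt_pytorch_2.1.0.so",
    "libgraphbolt_pytorch_2.1.1.so",
    "libgraphbolt_pytorch_2.1.2.so",
    "libgraphbolt_pytorch_2.2.0.so",
    "libgraphbolt_pytorch_2.2.1.so",
]

_NAME_RE = re.compile(r"^libgraphbolt_pytorch_(\d+)\.(\d+)\.(\d+)\.so$")


def bundled_versions(names: List[str]) -> List[Tuple[int, ...]]:
    """Return the sorted (major, minor, patch) versions found in the file names."""
    versions = []
    for name in names:
        match = _NAME_RE.match(name)
        if match:
            versions.append(tuple(int(part) for part in match.groups()))
    return sorted(versions)


def is_supported(torch_version: str, names: List[str]) -> bool:
    """True if a library exists for this PyTorch version (local '+...' tag ignored)."""
    base = torch_version.split("+", maxsplit=1)[0]
    return f"libgraphbolt_pytorch_{base}.so" in names


if __name__ == "__main__":
    newest = bundled_versions(FOUND)[-1]
    print("newest bundled PyTorch version:", ".".join(map(str, newest)))
    for version in ["2.2.1+cu121", "2.4.0+cu121", "2.4.1"]:
        print(version, "supported" if is_supported(version, FOUND) else "missing")
```

Under the naming assumption above, the newest entry is the highest PyTorch release this particular wheel can load, which is consistent with the 2.4.0 and 2.4.1 lookups failing in this thread.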