Closed Rohanjames1997 closed 2 years ago
Hi,
DGL support on Windows is a bit tricky right now. Please try the following steps:
1. Install tf-nightly instead of any other TensorFlow version, because the function we need is only available in the latest nightly build. (It will be available in the official TensorFlow 2.2 release.)
2. Set the environment variable USE_OFFICIAL_TFDLPACK to true.
import os
os.environ['USE_OFFICIAL_TFDLPACK'] = "true"
# then import dgl or other codes
import dgl
Hello, Thanks for your reply @VoVAllen .
Unfortunately, the error persists after installing tf-nightly too. Setting the environment variable USE_OFFICIAL_TFDLPACK to true did not make any difference. This is because the error is still due to the line 12 in tensor.py: from ... import ndarray as nd
Would waiting for the official tf 2.2 release be helpful in this case?
Thank you.
Could you post the detailed error? I tested it and it works on my side.
@VoVAllen I am facing a similar error, and my tf version is 2.3.1. It was working earlier, but after I installed the CUDA version of dgl using conda install -c dglteam dgl-cuda11.0 I am getting the following error.
from dgllife.model.model_zoo import GCNPredictor
File "D:\Anaconda\envs\myenv\lib\site-packages\dgllife\__init__.py", line 9, in <module>
from . import model
File "D:\Anaconda\envs\myenv\lib\site-packages\dgllife\model\__init__.py", line 6, in <module>
from .gnn import *
File "D:\Anaconda\envs\myenv\lib\site-packages\dgllife\model\gnn\__init__.py", line 8, in <module>
from .attentivefp import *
File "D:\Anaconda\envs\myenv\lib\site-packages\dgllife\model\gnn\attentivefp.py", line 9, in <module>
import dgl.function as fn
File "D:\Anaconda\envs\myenv\lib\site-packages\dgl\__init__.py", line 14, in <module>
from .backend import load_backend, backend_name
File "D:\Anaconda\envs\myenv\lib\site-packages\dgl\backend\__init__.py", line 73, in <module>
load_backend(get_preferred_backend())
File "D:\Anaconda\envs\myenv\lib\site-packages\dgl\backend\__init__.py", line 23, in load_backend
mod = importlib.import_module('.%s' % mod_name, __name__)
File "D:\Anaconda\envs\myenv\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "D:\Anaconda\envs\myenv\lib\site-packages\dgl\backend\pytorch\__init__.py", line 1, in <module>
from .tensor import *
File "D:\Anaconda\envs\myenv\lib\site-packages\dgl\backend\pytorch\tensor.py", line 11, in <module>
from ... import ndarray as nd
File "D:\Anaconda\envs\myenv\lib\site-packages\dgl\ndarray.py", line 14, in <module>
from ._ffi.object import register_object, ObjectBase
File "D:\Anaconda\envs\myenv\lib\site-packages\dgl\_ffi\object.py", line 8, in <module>
from .object_generic import ObjectGeneric, convert_to_object
File "D:\Anaconda\envs\myenv\lib\site-packages\dgl\_ffi\object_generic.py", line 7, in <module>
from .base import string_types
File "D:\Anaconda\envs\myenv\lib\site-packages\dgl\_ffi\base.py", line 42, in <module>
_LIB, _LIB_NAME = _load_lib()
File "D:\Anaconda\envs\myenv\lib\site-packages\dgl\_ffi\base.py", line 34, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
File "D:\Anaconda\envs\myenv\lib\ctypes\__init__.py", line 381, in __init__
self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'D:\Anaconda\envs\myenv\lib\site-packages\dgl\dgl.dll' (or one of its dependencies). Try using the full path with constructor syntax.
Hi,
The reason may be that the file itself or one of the dependencies (e.g. CUDA 11.0 library) is missing. A likely case is that you did not install CUDA 11.0 through NVIDIA installer so the system cannot find it.
Could you check if the file D:\Anaconda\envs\myenv\lib\site-packages\dgl\dgl.dll
exists? If so, could you check if the dependencies are fulfilled? You can drag the DLL file into Dependencies.exe and see if there is any question mark.
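The two checks above can also be approximated from Python itself. This is only a minimal sketch: `check_dll` is a hypothetical helper, not part of DGL, and the dependency name in the usage example is an illustration, since the actual list depends on the DGL build. A failed load here does not say which transitive dependency is missing, so Dependencies.exe remains the more precise tool.

```python
import ctypes
import os


def check_dll(dll_path, dependency_names=()):
    """Report whether dll_path exists and whether each named library loads.

    dependency_names is supplied by the caller; this helper does not know
    which libraries dgl.dll really depends on.
    """
    report = {"exists": os.path.isfile(dll_path), "dependencies": {}}
    for name in dependency_names:
        try:
            # Uses the same loader that DGL's _load_lib() relies on.
            ctypes.CDLL(name)
            report["dependencies"][name] = "ok"
        except OSError as exc:
            report["dependencies"][name] = "failed: %s" % exc
    return report


if __name__ == "__main__":
    # Path taken from the traceback above; the dependency name is an assumption.
    path = r"D:\Anaconda\envs\myenv\lib\site-packages\dgl\dgl.dll"
    print(check_dll(path, ["cudart64_110.dll"]))
```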
Hi, I had previously installed CUDA 11.2, and after installing 11.0 the issue persists. The file D:\Anaconda\envs\myenv\lib\site-packages\dgl\dgl.dll exists. Using Dependencies.exe, I get: We could not find api-ms-win-core-wow64-l1-1-0.dll file on the disk anymore.
@BarclayII
Could you try installing Visual C++ 2017 redistributable? Also were you running Windows 10? @Chokerino
@BarclayII Yes, I'm on Windows 10. I switched to CUDA 10.2 to run version 0.4.3 of dgl, and Dependencies now shows 3 files missing: cublas64_10.dll, cusparse64_10.dll, and api-ms-win-core-wow64-l1-1-0.dll as before. This is after installing the Visual C++ 2017 redistributable.
For CUDA 11.0 you might need Visual C++ 2019 redistributable.
@BarclayII I have installed both, as the redistributable files are the same. I have also tried to add the files manually. cublas64_10.dll and cusparse64_10.dll are now found, but api-ms-win-core-wow64-l1-1-0.dll still shows an error in Dependencies even after I put the file in the System32 folder.
If someone encounters this problem, it may be that the version of cuda and the version of dgl do not match.
Unfortunately, I have run into the same problem. Perhaps these system settings can help pin down the problem:
Windows 10
CUDA 11.1
cuDNN 8
Python 3.9.5
TensorFlow 2.5.0
PyTorch 1.8.1
DGL 0.6.1 (cu111)
I have installed everything into a venv environment with pip (I'm not using conda). Importing and running PyTorch and TensorFlow by themselves works without a hitch, including GPU capability (my PATH is set up with both CUDA and cuDNN directories as required). Running DGL with DGLBACKEND=pytorch also appears to run smoothly (which I assume is using the GPU libraries that come bundled with PyTorch). However, when I set DGLBACKEND=tensorflow, the same error as in the original issue occurs. I checked the dependencies of dgl.dll with the Dependencies application, and verified that each of them could be loaded manually in Python (using ctypes.CDLL(...)). I've pasted the error I get below for completeness. As you can see, DGL is able to find at least cudart64_110.dll, which is part of the CUDA 11.1 distribution.
(.venv) C:\Users\s092292\Desktop\rcpsp>set DGLBACKEND=tensorflow
(.venv) C:\Users\s092292\Desktop\rcpsp>python
Python 3.9.5 (tags/v3.9.5:0a7dcbd, May 3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dgl
2021-06-03 20:13:17.729320: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\s092292\Desktop\rcpsp\.venv\lib\site-packages\dgl\__init__.py", line 13, in <module>
from .backend import load_backend, backend_name
File "C:\Users\s092292\Desktop\rcpsp\.venv\lib\site-packages\dgl\backend\__init__.py", line 95, in <module>
load_backend(get_preferred_backend())
File "C:\Users\s092292\Desktop\rcpsp\.venv\lib\site-packages\dgl\backend\__init__.py", line 41, in load_backend
from .._ffi.base import load_tensor_adapter # imports DGL C library
File "C:\Users\s092292\Desktop\rcpsp\.venv\lib\site-packages\dgl\_ffi\base.py", line 44, in <module>
_LIB, _LIB_NAME, _DIR_NAME = _load_lib()
File "C:\Users\s092292\Desktop\rcpsp\.venv\lib\site-packages\dgl\_ffi\base.py", line 34, in _load_lib
lib = ctypes.CDLL(lib_path[0])
File "C:\Users\s092292\AppData\Local\Programs\Python\Python39\lib\ctypes\__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\s092292\Desktop\rcpsp\.venv\lib\site-packages\dgl\dgl.dll' (or one of its dependencies). Try using the full path with constructor syntax.
@marijnvk We are not sure about the root cause. This error usually occurs when a dependent library cannot be found. Some possible causes:
PyTorch and TensorFlow work fine when run by themselves, so I doubt it is a CUDA version incompatibility issue in those libraries. It's a bit of a hassle, but I'll try downgrading TensorFlow to 2.3 (which will require downgrading CUDA to 10 and cuDNN to 7) and see if that works.
Do you perhaps have the list of DLLs that dgl-cu111 is supposed to load directly (i.e. not through PyTorch/TensorFlow)? I have no idea if the Dependencies application picks up everything.
@marijnvk TensorFlow has its own dynamic library loading system, which may prevent dgl from finding the related libraries on Windows. However, I'm not sure about this. Could you try importing tensorflow before importing dgl?
That's much quicker to check. The result is unfortunately the same: importing TensorFlow first and then DGL still produces the same error.
Okay, so I've figured out what is going wrong here. It actually doesn't have anything to do with TensorFlow, CUDA, cuDNN, or version mismatches at all. This is caused by changes introduced in Python 3.8 (I'm on 3.9.5). The core of the issue is a change in the directories that Python considers by default when looking for DLLs. Notably, starting from this version of Python, the PATH environment variable is no longer included by default (the same goes for the current working directory, by the way). A new function, os.add_dll_directory, is provided to securely add directories to the list that is searched for DLLs. Hacking the following into the start of the _load_lib function of _ffi\base.py fixed it for me:
os.add_dll_directory("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.1\\bin")
os.add_dll_directory("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.1\\libnvvp")
os.add_dll_directory("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.1\\extras\\CUPTI\\lib64")
os.add_dll_directory("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.1\\include")
os.add_dll_directory("C:\\tools\\cuda\\bin") # cuDNN
It probably doesn't need all of these, but I just slapped in all related directories that were in my PATH. I'm assuming that TensorFlow accounts for this change in Python functionality, hence the message before the error saying that cudart64_110.dll is loaded. But when it's DGL's turn, it doesn't consider the directories in PATH, meaning it cannot find CUDA and cuDNN. Note that this is also why this was not picked up by the Dependencies application, since that one does appear to check the PATH. This also explains the originally reported issue, which was on Python 3.8.
Probably the clean way to do this is to loop over the directories in PATH when DGL is first loaded, and add relevant directories with that new function one by one. Or introduce a new environment variable to set these directories (or one that informs DGL whether or not to use PATH).
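The first option could look roughly like the sketch below. It is not DGL's actual code: `existing_path_dirs` and `add_path_to_dll_search` are hypothetical names, and the sketch registers every existing PATH entry rather than filtering for "relevant" ones. `os.add_dll_directory` exists only on Windows with Python 3.8 or later, so the sketch does nothing elsewhere.

```python
import os


def existing_path_dirs(path_value=None):
    """Return the directories listed in a PATH-style string that exist."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')
        if entry and os.path.isdir(entry):
            full = os.path.abspath(entry)  # add_dll_directory needs absolute paths
            if full not in found:
                found.append(full)
    return found


def add_path_to_dll_search(path_value=None):
    """Register PATH directories for DLL lookup; returns the handles created."""
    if not hasattr(os, "add_dll_directory"):
        return []  # not Windows, or Python older than 3.8
    handles = []
    for directory in existing_path_dirs(path_value):
        try:
            handles.append(os.add_dll_directory(directory))
        except OSError:
            pass  # skip entries the OS refuses to register
    return handles


if __name__ == "__main__":
    # Would have to run before the ctypes.CDLL(...) call in _load_lib().
    print(len(add_path_to_dll_search()), "directories registered")
```

Adding all of PATH undoes part of the tightening that Python 3.8 introduced, which is one reason the environment-variable option may be preferable.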
@marijnvk Thanks for your detailed investigation! We'll check how other frameworks handle this to find a better solution
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
This issue is closed due to lack of activity. Feel free to reopen it if you still have questions.
🐛 Bug
FileNotFoundError: dgl.dll, even though it exists in the said directory.
To Reproduce
Steps to reproduce the behavior:
>>> import dgl
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\__init__.py", line 8, in <module>
from .backend import load_backend, backend_name
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\__init__.py", line 74, in <module>
load_backend(get_preferred_backend())
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\__init__.py", line 23, in load_backend
mod = importlib.import_module('.%s' % mod_name, __name__)
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\tensorflow\__init__.py", line 4, in <module>
from .tensor import *
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\backend\tensorflow\tensor.py", line 12, in <module>
from ... import ndarray as nd
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\ndarray.py", line 14, in <module>
from ._ffi.object import register_object, ObjectBase
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\object.py", line 8, in <module>
from .object_generic import ObjectGeneric, convert_to_object
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\object_generic.py", line 7, in <module>
from .base import string_types
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\base.py", line 42, in <module>
_LIB, _LIB_NAME = _load_lib()
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\_ffi\base.py", line 34, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
File "C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\ctypes\__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\Rohan\anaconda3\envs\tf_dgl\lib\site-packages\dgl\dgl.dll' (or one of its dependencies). Try using the full path with constructor syntax.
Environment
Installed with pip, inside a conda env.
Additional context
After checking the directory for the missing file, it was indeed there! But the error persisted. Conda and lower versions of Python do not support TF 2.2.