csyben / PYRO-NN

Python Reconstruction Operators in Neural Networks. High level python API for PYRO-NN-Layers
Apache License 2.0

Install issue Ubuntu 18 #19

Open maxrohleder opened 3 years ago

maxrohleder commented 3 years ago

When installing pyronn like so:

conda create -n pyronn python=3.6
pip install pyronn

I tested the installation on my system:

python -c "import tensorflow as tf;tf.config.list_physical_devices('GPU');import pyronn"

I get the following log/error:

2021-05-14 10:38:18.564014: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-14 10:38:19.246305: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-05-14 10:38:19.291068: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 10:38:19.292126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2021-05-14 10:38:19.292156: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-14 10:38:19.299451: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-05-14 10:38:19.299496: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-05-14 10:38:19.309172: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-05-14 10:38:19.311381: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-05-14 10:38:19.318912: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-05-14 10:38:19.321349: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-05-14 10:38:19.322283: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-05-14 10:38:19.322369: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 10:38:19.323285: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-14 10:38:19.324363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/dl/miniconda3/envs/pyronn/lib/python3.6/site-packages/pyronn/__init__.py", line 16, in <module>
    import pyronn_layers
  File "/home/dl/miniconda3/envs/pyronn/lib/python3.6/site-packages/pyronn_layers/__init__.py", line 21, in <module>
    from pyronn_layers.python.ops.pyronn_layers_ops import pyronn_layers
  File "/home/dl/miniconda3/envs/pyronn/lib/python3.6/site-packages/pyronn_layers/python/ops/pyronn_layers_ops.py", line 27, in <module>
    pyronn_layers_ops = load_library.load_op_library(resource_loader.get_path_to_datafile('_pyronn_layers_ops.so'))
  File "/home/dl/miniconda3/envs/pyronn/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/dl/miniconda3/envs/pyronn/lib/python3.6/site-packages/pyronn_layers/python/ops/_pyronn_layers_ops.so: undefined symbol: _ZN10tensorflow8OpKernel11TraceStringEPNS_15OpKernelContextEb

At first I had CUDA 11.1 installed, but then I got this error:

tensorflow.python.framework.errors_impl.NotFoundError: libcudart.so.10.1: cannot open shared object file: No such file or directory

This made me assume that CUDA 10.1 is needed by pyronn-layers.
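To see which CUDA runtime the dynamic loader can actually find, a quick probe along these lines may help. It is only a sketch: `can_load` is a hypothetical helper, not part of pyronn or TensorFlow, and the two library names are taken from the log lines in this report.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is not on the loader's search path.
        return False
    return True


# Library names as they appear in the log and error message of this report.
for name in ("libcudart.so.10.1", "libcudart.so.11.0"):
    print(name, "found" if can_load(name) else "missing")
```

If `libcudart.so.10.1` is reported as missing while `libcudart.so.11.0` is found, that would match the error message quoted here.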

I suspect that the layers were built against a specific TF version. This thread suggests that the error code implies this. Which version were the layers built against? (Maybe I am missing something.) Could this be taken care of by the setup wheel?
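The undefined symbol in the traceback can be read without extra tools. The sketch below splits the leading nested name of an Itanium-ABI mangled symbol into its parts. It is not a full demangler (a tool such as `c++filt` does that properly), and `nested_name_parts` is a hypothetical helper written for illustration.

```python
def nested_name_parts(mangled):
    """Split the leading nested name of a mangled C++ symbol into its parts.

    Handles only the simple '_ZN<len><name>...E' form; argument types that
    follow the closing 'E' are ignored.
    """
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        raise ValueError("expected a symbol starting with '_ZN'")
    parts = []
    pos = len(prefix)
    while pos < len(mangled) and mangled[pos].isdigit():
        # Read the decimal length prefix of the next name component.
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return parts


symbol = "_ZN10tensorflow8OpKernel11TraceStringEPNS_15OpKernelContextEb"
print("::".join(nested_name_parts(symbol)))  # tensorflow::OpKernel::TraceString
```

The missing symbol is a member of TensorFlow's own `OpKernel` class, which is consistent with the compiled layers expecting a different TensorFlow version than the one installed.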

maxrohleder commented 3 years ago

I managed to get it to work by manually downgrading to tensorflow 2.3:

pip install tensorflow==2.3

Maybe this could be integrated into the dependency management of pip?

csyben commented 3 years ago

The pyronn package needs to be built against a specific TensorFlow version. The current packages in the pip repository are built against TF 2.3 with CUDA 10. Currently, there are some problems with the Bazel configuration preventing a version for TensorFlow 2.4 with CUDA 11.

I agree that this should be integrated into the dependency management, or at least the supported and working version combinations should be stated in this repository for clarification.
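Until the version requirement is encoded in the package itself, a guard like the following could turn the undefined-symbol error into a readable message. This is a minimal sketch, not part of pyronn: the helper names are hypothetical, and the `(2, 3)` value is taken from the statement above that the pip wheels were built against TF 2.3.

```python
import re

# Assumption from this thread: the pip wheels were built against TF 2.3.
BUILT_AGAINST = (2, 3)


def major_minor(version):
    """Return (major, minor) parsed from a version string such as '2.3.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))


def matches_build(installed, built_against=BUILT_AGAINST):
    """Return True if the installed version has the expected major.minor."""
    return major_minor(installed) == tuple(built_against)


def import_pyronn_checked():
    """Import pyronn only if the installed TensorFlow matches the build."""
    import tensorflow as tf  # imported lazily so the helpers work without TF

    if not matches_build(tf.__version__):
        raise RuntimeError(
            "pyronn wheels were built against TF {}.{}, but TF {} is installed".format(
                BUILT_AGAINST[0], BUILT_AGAINST[1], tf.__version__
            )
        )
    import pyronn

    return pyronn
```

Calling `import_pyronn_checked()` instead of a bare `import pyronn` would fail early with the expected and installed versions named, rather than with a linker error.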

maxrohleder commented 3 years ago

@mareikethies While we are at it: Maybe the windows-binary can be made available via pip as well?

I made my build available here: https://github.com/maxrohleder/win-pyronn

cocoakang commented 2 years ago

Hi, I encountered the same problem... The GPU on my server is an RTX 3090, which strictly requires CUDA 11. Is there any way to fix this CUDA version problem on Linux? @csyben @maxrohleder Many thanks!

maxrohleder commented 2 years ago

@cocoakang You would have to recompile the layers with your desired cuda version. In theory that's not a problem if you feel comfortable with the tf build pipeline.

iamNCJ commented 2 years ago

The pyronn package needs to be built against a specific TensorFlow version. The current packages in the pip repository are built against TF 2.3 with CUDA 10. Currently, there are some problems with the Bazel configuration preventing a version for TensorFlow 2.4 with CUDA 11.

I agree that this should be integrated into the dependency management, or at least the supported and working version combinations should be stated in this repository for clarification.

I've built a CUDA 11 version against TF 2.5. It currently supports only Python 3.6 (I'm still working on newer versions), but the patches needed for CUDA 11 should be almost the same.

https://github.com/iamNCJ/pyronn-layers