ROCm / tensorflow-upstream

TensorFlow ROCm port
https://tensorflow.org
Apache License 2.0

import tensorflow fails as missing rocsolver library #1474

Closed (gggh000 closed this issue 2 years ago)

gggh000 commented 3 years ago


System information
OS: Ubuntu 18.04
TensorFlow installed from binary: pip3 install tensorflow-rocm --upgrade
Python: 3.6.9
GPU: MI25

Describe the current behavior

root@sriov-guest:~/dev-learn/gpu/tflow/tensorflow/tflow-2nded# python3 -c 'import tensorflow as tf'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: librocsolver.so.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 40, in <module>
    from tensorflow.python.eager import context
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 35, in <module>
    from tensorflow.python import pywrap_tfe
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tfe.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 83, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: librocsolver.so.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace above this error message when asking for help.

root@sriov-guest:~/dev-learn/gpu/tflow/tensorflow/tflow-2nded# dpkg -l | grep rocsolver
ii  rocsolver  3.13.0.40300-52  amd64  AMD ROCm SOLVER library
root@sriov-guest:~/dev-learn/gpu/tflow/tensorflow/tflow-2nded# find /opt/rocm-4.3.0/ -name librocsolver.so.0
/opt/rocm-4.3.0/rocsolver/lib/librocsolver.so.0
/opt/rocm-4.3.0/lib/librocsolver.so.0

Describe the expected behavior
The import should succeed.
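Since the library is present on disk but the import still fails, one quick check is whether the dynamic loader can resolve it by name. The following is a minimal sketch using only the Python standard library. The helper name `check_shared_libs` is hypothetical, and the sonames are the ones that appear in the error messages in this thread. It only exercises the system loader search path, so it does not account for any RPATH baked into TensorFlow's own shared objects.

```python
import ctypes


def check_shared_libs(sonames):
    """Try to load each shared library by soname.

    Returns a dict mapping each name to None (loaded) or the loader's error text.
    """
    results = {}
    for name in sonames:
        try:
            # dlopen() through the system loader: LD_LIBRARY_PATH, ld.so cache, etc.
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Sonames taken from the ImportError messages reported in this issue.
    report = check_shared_libs(["librocsolver.so.0", "librocblas.so.0"])
    for lib, err in report.items():
        status = "found" if err is None else "NOT loadable: " + err
        print(f"{lib}: {status}")
```

If a library is reported as not loadable even though `find` locates it, its directory is probably not on the loader's search path, or the installed ROCm release may not match the one the wheel was built against.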


jayfurmanek commented 3 years ago

Hi @gggh000!

I think there may be two things going on here.

1) For TensorFlow 2.6, we started publishing manylinux2014-compliant .whl files. Make sure pip is fully up to date so that it recognizes manylinux2014 wheels:

pip3 install --upgrade pip

2) ROCm 4.3.x wasn't qualified for AI workloads until 4.3.1, which is the version we build the TensorFlow 2.6 .whls against.

So if you update pip and update your ROCm install to 4.3.1, everything should work as expected. Please try this and report back if that helps.
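Both conditions above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the helper names `parse_version` and `meets_minimum` are hypothetical, the pip threshold of 19.3 is the release that, to my knowledge, added manylinux2014 support, and the path `/opt/rocm/.info/version` is an assumed location for the ROCm version file that may differ on a given install.

```python
import re


def parse_version(text):
    """Extract the leading dotted numeric version, e.g. '4.3.0-52' -> (4, 3, 0)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        import pip
        # Assumption: manylinux2014 wheels are recognized from pip 19.3 onward.
        print("pip", pip.__version__, "ok:", meets_minimum(pip.__version__, "19.3"))
    except ImportError:
        print("pip is not importable from this interpreter")
    try:
        # Assumed location of the ROCm version file; adjust for your install.
        with open("/opt/rocm/.info/version") as fh:
            rocm = fh.read().strip()
        print("ROCm", rocm, "ok:", meets_minimum(rocm, "4.3.1"))
    except (OSError, ValueError):
        print("could not read a ROCm version from /opt/rocm/.info/version")
```

The comparison is numeric per field rather than lexical, so a version such as 21.3.1 is correctly treated as newer than 19.3.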

Thanks!

gggh000 commented 2 years ago

Installed pip3 21.3.1, tensorflow-rocm 2.6.2, and rocblas, but I still get the following:

root@nonroot-Standard-PC-i440FX-PIIX-1996: # dpkg -l | grep -i rocblas
ii  rocblas  2.39.0.40300-52  amd64  rocBLAS is AMD's library for BLAS on ROCm. It is implemented in HIP and optimized for AMD GPUs.
root@nonroot-Standard-PC-i440FX-PIIX-1996:# find /opt -name librocblas.so.0
/opt/rocm-4.3.0/lib/librocblas.so.0
/opt/rocm-4.3.0/rocblas/lib/librocblas.so.0

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: librocblas.so.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "p297.py", line 3, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 40, in <module>
    from tensorflow.python.eager import context
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 35, in <module>
    from tensorflow.python import pywrap_tfe
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tfe.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 83, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 64, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: librocblas.so.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.
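The dpkg output in this thread hints at which ROCm release the installed packages belong to. The sketch below assumes that the last dotted field before the hyphen encodes the release as major*10000 + minor*100 + patch. That encoding is an assumption inferred from the matching /opt/rocm-4.3.0 paths, not something stated in this thread, and the function name `rocm_release_from_package` is hypothetical.

```python
def rocm_release_from_package(version):
    """Decode the assumed ROCm release field of a dpkg version string.

    Assumption: the last dotted field before '-' is major*10000 + minor*100 + patch.
    """
    field = version.split("-")[0].split(".")[-1]
    if not field.isdigit() or len(field) < 5:
        raise ValueError(f"no ROCm release field found in {version!r}")
    code = int(field)
    major, minor, patch = code // 10000, (code // 100) % 100, code % 100
    return f"{major}.{minor}.{patch}"


if __name__ == "__main__":
    # Version strings copied from the dpkg output in this issue.
    for pkg_version in ("3.13.0.40300-52", "2.39.0.40300-52"):
        print(pkg_version, "->", rocm_release_from_package(pkg_version))
```

Under that assumption, both the rocsolver and rocblas packages shown here decode to 4.3.0, which would be older than the 4.3.1 release the TensorFlow 2.6 wheels are built against.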

gggh000 commented 2 years ago

After installing ROCm 4.3.1, it appears to work.