libdevice.10.bc not found for cuda 12.0 resulting in failed compilations

hmaarrfk commented 8 months ago

Solution to issue cannot be found in the documentation.

[X] I checked the documentation.

Issue

I have an RTX A6000, which might explain why others have had trouble reproducing this.

[x] RTX A6000 -- bug happens
[x] RTX 3060 -- bug happens
[x] RTX 3090 -- bug happens
[ ] RTX 4090 -- not confirmed -- I no longer have access to mine

TensorFlow seems to give me:

2024-01-27 17:49:22.549835: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-01-27 17:49:22.978682: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8800
2024-01-27 17:49:25.223414: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:504] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /home/conda/feedstock_root/build_artifacts/tensorflow-split_1705364024140/_build_env/targets/x86_64-linux
  /usr/local/cuda
  /home/mark/miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
  /home/mark/miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
  .

This ends up causing my model compilation to fail outright.
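To see which, if any, of the searched directories actually holds libdevice, a small check along these lines can help. This is only a sketch: the relative path follows the `${CUDA_DIR}/nvvm/libdevice` layout named in the warning, the helper name `find_libdevice` is made up for illustration, and adding `$CONDA_PREFIX` as a candidate is an assumption, not something the log reports.

```python
import os
from pathlib import Path

# Layout taken from the warning: ${CUDA_DIR}/nvvm/libdevice/libdevice.10.bc
LIBDEVICE_RELPATH = Path("nvvm") / "libdevice" / "libdevice.10.bc"


def find_libdevice(candidates):
    """Return the candidate directories that contain libdevice.10.bc."""
    hits = []
    for directory in candidates:
        if (Path(directory) / LIBDEVICE_RELPATH).is_file():
            hits.append(Path(directory))
    return hits


if __name__ == "__main__":
    # A few of the directories listed in the log above; extend as needed.
    searched = ["./cuda_sdk_lib", "/usr/local/cuda", "."]
    # Assumption: the active conda environment is also worth checking.
    if os.environ.get("CONDA_PREFIX"):
        searched.append(os.environ["CONDA_PREFIX"])
    hits = find_libdevice(searched)
    if not hits:
        print("libdevice.10.bc not found in any candidate directory")
    for hit in hits:
        print("libdevice.10.bc found under", hit)
```

An empty result for every directory in the log would be consistent with the "Can't find libdevice directory" warning.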

I'll try to reproduce with other GPUs too.

```
self = , fn = ._run_fn at 0x7f35b0181240>
args = ({: array([[[[0., 0., 0.], [0...], dtype=float32)}, [], [], None, None)
message = "Graph execution error:\n\nDetected at node 'Sigmoid' defined at (most recent call last):\nNode: 'Sigmoid'\nDetected a...\n (1) UNKNOWN: JIT compilation failed.\n\t [[{{node Sigmoid}}]]\n0 successful operations.\n0 derived errors ignored."
m =

    def _do_call(self, fn, *args):
      try:
>       return fn(*args)

../../miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/client/session.py:1402:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/client/session.py:1385: in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , options = None
feed_dict = {: array([[[[0., 0., 0.], [0.... [0., 0., 0.], ..., [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]]], dtype=float32)}
fetch_list = [], target_list = [], run_metadata = None

    def _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list,
                            run_metadata):
>     return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
                                              fetch_list, target_list,
                                              run_metadata)
E     tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
E       (0) UNKNOWN: JIT compilation failed.
E          [[{{node Sigmoid}}]]
E          [[ArgMax/_15]]
E       (1) UNKNOWN: JIT compilation failed.
E          [[{{node Sigmoid}}]]
E     0 successful operations.
E     0 derived errors ignored.

../../miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/client/session.py:1478: UnknownError

During handling of the above exception, another exception occurred:

tmp_path = PosixPath('/tmp/pytest-of-mark/pytest-17/test_tracking_data_no_dlc0')

    def test_tracking_data_no_dlc(tmp_path):
        pytest.importorskip('tensorflow')
        from owl.analysis.tracking import infer_dataset
        from ..mcam_data._registry import fetch_tar
        dataset_filename = fetch_tar(
            "20230102_PTZ_TL_Timepoint1_3DPF_video_20230102_165559_515_metadata_nc.tar.xz"
        )
        model_name = "zebrafish_96_well_plate_model_20230405.zip"
        model_path = fetch_model(model_name)
        frame_slice = slice(1400, 1600)
        dataset = mcam_data.load(dataset_filename, delayed=True).isel({
            'frame_number': frame_slice,
        })
>       dataset = infer_dataset(dataset, model_path)

tests/analysis/test_tracking_data_analysis.py:483:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
owl/analysis/tracking.py:187: in infer_dataset
    InferImageTracks(
owl/analysis/tracking.py:315: in InferImageTracks
    pose = sess.run(pose_tensor, feed_dict={inputs: frames})
../../miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/client/session.py:972: in run
    result = self._run(None, fetches, feed_dict, options_ptr,
../../miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/client/session.py:1215: in _run
    results = self._do_run(handle, final_targets, final_fetches,
../../miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/client/session.py:1395: in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , fn = ._run_fn at 0x7f35b0181240>
args = ({: array([[[[0., 0., 0.], [0...], dtype=float32)}, [], [], None, None)
message = "Graph execution error:\n\nDetected at node 'Sigmoid' defined at (most recent call last):\nNode: 'Sigmoid'\nDetected a...\n (1) UNKNOWN: JIT compilation failed.\n\t [[{{node Sigmoid}}]]\n0 successful operations.\n0 derived errors ignored."
m =

    def _do_call(self, fn, *args):
      try:
        return fn(*args)
      except errors.OpError as e:
        message = compat.as_text(e.message)
        m = BaseSession._NODEDEF_NAME_RE.search(message)
        node_def = None
        op = None
        if m is not None:
          node_name = m.group(3)
          try:
            op = self._graph.get_operation_by_name(node_name)
            node_def = op.node_def
          except KeyError:
            pass
        message = error_interpolation.interpolate_graph(message, self._graph)
        if 'only supports NHWC tensor format' in message:
          message += ('\nA possible workaround: Try disabling Grappler optimizer'
                      '\nby modifying the config for creating the session eg.'
                      '\nsession_config.graph_options.rewrite_options.'
                      'disable_meta_optimizer = True')
>       raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
E       tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:
E
E       Detected at node 'Sigmoid' defined at (most recent call last):
E       Node: 'Sigmoid'
E       Detected at node 'Sigmoid' defined at (most recent call last):
E       Node: 'Sigmoid'
E       2 root error(s) found.
E         (0) UNKNOWN: JIT compilation failed.
E            [[{{node Sigmoid}}]]
E            [[ArgMax/_15]]
E         (1) UNKNOWN: JIT compilation failed.
E            [[{{node Sigmoid}}]]
E       0 successful operations.
E       0 derived errors ignored.
E
E       Original stack trace for 'Sigmoid':

../../miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/client/session.py:1421: UnknownError
----------------------------- Captured stderr call -----------------------------
2024-01-27 17:49:18.609862: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-27 17:49:18.609928: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-27 17:49:18.610964: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-27 17:49:18.617564: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-27 17:49:21.690570: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-27 17:49:21.719628: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-27 17:49:21.719942: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-27 17:49:21.808625: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-27 17:49:21.808944: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-27 17:49:21.809171: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-27 17:49:21.898936: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-27 17:49:21.899251: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-27 17:49:21.899483: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-01-27 17:49:21.899640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4096 MB memory: -> device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:42:00.0, compute capability: 8.6
2024-01-27 17:49:22.549835: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-01-27 17:49:22.978682: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8800
2024-01-27 17:49:25.223414: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:504] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
  ./cuda_sdk_lib
  /home/conda/feedstock_root/build_artifacts/tensorflow-split_1705364024140/_build_env/targets/x86_64-linux
  /usr/local/cuda
  /home/mark/miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
  /home/mark/miniforge3/envs/dev/lib/python3.10/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2024-01-27 17:49:25.223467: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:542] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
error: libdevice not found at ./libdevice.10.bc
2024-01-27 17:49:25.223760: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:207] INTERNAL: Generating device code failed.
2024-01-27 17:49:25.224489: W tensorflow/core/framework/op_kernel.cc:1827] UNKNOWN: JIT compilation failed.
2024-01-27 17:49:25.224531: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 5539625702534005013
2024-01-27 17:49:25.224544: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 15289050881627197742
2024-01-27 17:49:25.224554: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 3266534962090248600
2024-01-27 17:49:25.224682: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 8488635238871138060
```
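The log itself suggests setting `XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda`. A minimal sketch of doing that from Python, before TensorFlow is imported, might look like the following. The helper name `with_cuda_data_dir` is made up for illustration, and `/path/to/cuda` is the placeholder from the log, not a known-good location. Whether the flag is honored on the code path that fails here is not established in this thread.

```python
import os


def with_cuda_data_dir(existing_flags, cuda_dir):
    """Return an XLA_FLAGS string that sets --xla_gpu_cuda_data_dir to cuda_dir."""
    flag = "--xla_gpu_cuda_data_dir"
    # Drop any earlier setting of the same flag and keep every other flag.
    kept = [f for f in existing_flags.split() if not f.startswith(flag + "=")]
    kept.append(f"{flag}={cuda_dir}")
    return " ".join(kept)


if __name__ == "__main__":
    # Placeholder path from the log message; replace it with a directory
    # that really contains nvvm/libdevice/libdevice.10.bc.
    cuda_dir = "/path/to/cuda"
    os.environ["XLA_FLAGS"] = with_cuda_data_dir(
        os.environ.get("XLA_FLAGS", ""), cuda_dir
    )
    # Import TensorFlow only after this point so the variable is already set.
    print(os.environ["XLA_FLAGS"])
```

The helper preserves any unrelated flags already present in `XLA_FLAGS` instead of overwriting the whole variable.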

Installed packages

$ mamba list | grep -E "(tensorflow|cuda)"
cuda-cudart               12.0.107             hd3aeb46_8    conda-forge
cuda-cudart_linux-64      12.0.107             h59595ed_8    conda-forge
cuda-nvcc-tools           12.0.76              h59595ed_1    conda-forge
cuda-nvrtc                12.0.76              hd3aeb46_2    conda-forge
cuda-nvtx                 12.0.76              h59595ed_1    conda-forge
cuda-version              12.0                 hffde075_2    conda-forge
libopenvino-tensorflow-frontend 2023.2.0             h76f315d_4    conda-forge
libopenvino-tensorflow-lite-frontend 2023.2.0             h59595ed_4    conda-forge
tensorflow                2.15.0          cuda120py310h9360858_2    conda-forge
tensorflow-base           2.15.0          cuda120py310heceb7ac_2    conda-forge
tensorflow-estimator      2.15.0          cuda120py310h549c77d_2    conda-forge

Environment info

mamba version : 1.5.6
     active environment : dev
    active env location : /home/mark/miniforge3/envs/dev
            shell level : 1
       user config file : /home/mark/.condarc
 populated config files : /home/mark/miniforge3/.condarc
                          /home/mark/.condarc
          conda version : 23.11.0
    conda-build version : 3.28.4
         python version : 3.10.13.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=zen
                          __conda=23.11.0=0
                          __cuda=12.2=0
                          __glibc=2.35=0
                          __linux=6.5.0=0
                          __unix=0=0
       base environment : /home/mark/miniforge3  (writable)
      conda av data dir : /home/mark/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/mark/miniforge3/pkgs
                          /home/mark/.conda/pkgs
       envs directories : /home/mark/miniforge3/envs
                          /home/mark/.conda/envs
               platform : linux-64
             user-agent : conda/23.11.0 requests/2.31.0 CPython/3.10.13 Linux/6.5.0-15-generic ubuntu/22.04.3 glibc/2.35 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.6
                UID:GID : 1003:1003
             netrc file : None
           offline mode : False

njzjz commented 8 months ago

It searched for libdevice under $BUILD_PREFIX, which is what we set $CUDA_HOME to:

https://github.com/conda-forge/tensorflow-feedstock/blob/f152ce011ff68869ab1999419fb51963cd213200/recipe/build.sh#L108-L109

detect_binary_files_with_prefix is set to False in #157 (why?), so $BUILD_PREFIX cannot be replaced...

https://github.com/conda-forge/tensorflow-feedstock/blob/f152ce011ff68869ab1999419fb51963cd213200/recipe/meta.yaml#L463
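One way to check whether a build-time path is still baked into an installed binary is to scan the file for it. The sketch below is an illustration only: the marker is the build path prefix that appears in the search list of the original report, the function name `embedded_build_paths` is hypothetical, and which shared library to scan is left to the reader. It reads the whole file into memory, which may be slow for very large libraries.

```python
import sys
from pathlib import Path

# Build-time prefix seen in the search list of the original report.
BUILD_MARKER = b"/home/conda/feedstock_root/build_artifacts"


def embedded_build_paths(path, marker=BUILD_MARKER, max_len=4096):
    """Return the distinct NUL-terminated strings in a file that start with marker."""
    data = Path(path).read_bytes()
    found = set()
    start = data.find(marker)
    while start != -1:
        # C strings end at the first NUL byte; cap the length as a safeguard.
        end = data.find(b"\0", start, start + max_len)
        if end == -1:
            end = min(len(data), start + max_len)
        found.add(data[start:end].decode("utf-8", errors="replace"))
        start = data.find(marker, start + 1)
    return sorted(found)


if __name__ == "__main__":
    # Usage: python scan.py /path/to/some/shared/library.so
    for embedded in embedded_build_paths(sys.argv[1]):
        print(embedded)
```

If the scan prints a `_build_env` path, that would be consistent with the build prefix not having been rewritten in the packaged binary.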

hmaarrfk commented 8 months ago

I don't think BUILD_PREFIX gets replaced in general.
