RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW

version: "3.8"
services:

  whisperfusion:
    image: ghcr.io/collabora/whisperfusion:latest
    shm_size: 64G
    expose:
      - 6006/tcp
      - 8888/tcp
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: [gpu]

$ docker compose run --entrypoint "nvidia-smi -L" --rm whisperfusion
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-e16d1e1c-7902-1bcb-fd46-e437e472b976)

$ docker compose up whisperfusion
[+] Running 1/0
 ✔ Container whisperfusion-whisperfusion-1  Created                                                                                           0.0s
Attaching to whisperfusion-whisperfusion-1
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | ==========
whisperfusion-whisperfusion-1  | == CUDA ==
whisperfusion-whisperfusion-1  | ==========
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | CUDA Version 12.2.2
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
whisperfusion-whisperfusion-1  | By pulling and using the container, you accept the terms and conditions of this license:
whisperfusion-whisperfusion-1  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | done loading
whisperfusion-whisperfusion-1  | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
whisperfusion-whisperfusion-1  | Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
whisperfusion-whisperfusion-1  | Process Process-3:
whisperfusion-whisperfusion-1  | Traceback (most recent call last):
whisperfusion-whisperfusion-1  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
whisperfusion-whisperfusion-1  |     self.run()
whisperfusion-whisperfusion-1  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
whisperfusion-whisperfusion-1  |     self._target(*self._args, **self._kwargs)
whisperfusion-whisperfusion-1  |   File "/root/WhisperFusion/llm_service.py", line 195, in run
whisperfusion-whisperfusion-1  |     self.initialize_model(
whisperfusion-whisperfusion-1  |   File "/root/WhisperFusion/llm_service.py", line 109, in initialize_model
whisperfusion-whisperfusion-1  |     self.runner = self.runner_cls.from_dir(**self.runner_kwargs)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner.py", line 416, in from_dir
whisperfusion-whisperfusion-1  |     torch.cuda.set_device(rank % runtime_mapping.gpus_per_node)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 404, in set_device
whisperfusion-whisperfusion-1  |     torch._C._cuda_setDevice(device)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
whisperfusion-whisperfusion-1  |     torch._C._cuda_init()
whisperfusion-whisperfusion-1  | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
whisperfusion-whisperfusion-1  | /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
whisperfusion-whisperfusion-1  |   return torch._C._cuda_getDeviceCount() > 0
whisperfusion-whisperfusion-1  | /usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
whisperfusion-whisperfusion-1  |   warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
whisperfusion-whisperfusion-1  | Process Process-4:
whisperfusion-whisperfusion-1  | Traceback (most recent call last):
whisperfusion-whisperfusion-1  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
whisperfusion-whisperfusion-1  |     self.run()
whisperfusion-whisperfusion-1  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
whisperfusion-whisperfusion-1  |     self._target(*self._args, **self._kwargs)
whisperfusion-whisperfusion-1  |   File "/root/WhisperFusion/tts_service.py", line 19, in run
whisperfusion-whisperfusion-1  |     self.initialize_model()
whisperfusion-whisperfusion-1  |   File "/root/WhisperFusion/tts_service.py", line 14, in initialize_model
whisperfusion-whisperfusion-1  |     self.pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-tiny-en+pl.model')
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/whisperspeech/pipeline.py", line 61, in __init__
whisperfusion-whisperfusion-1  |     self.vocoder = Vocoder()
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/whisperspeech/a2wav.py", line 14, in __init__
whisperfusion-whisperfusion-1  |     self.vocos = Vocos.from_pretrained(repo_id).cuda()
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 918, in cuda
whisperfusion-whisperfusion-1  |     return self._apply(lambda t: t.cuda(device))
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
whisperfusion-whisperfusion-1  |     module._apply(fn)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
whisperfusion-whisperfusion-1  |     module._apply(fn)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
whisperfusion-whisperfusion-1  |     module._apply(fn)
whisperfusion-whisperfusion-1  |   [Previous line repeated 4 more times]
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 833, in _apply
whisperfusion-whisperfusion-1  |     param_applied = fn(param)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 918, in <lambda>
whisperfusion-whisperfusion-1  |     return self._apply(lambda t: t.cuda(device))
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
whisperfusion-whisperfusion-1  |     torch._C._cuda_init()
whisperfusion-whisperfusion-1  | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
whisperfusion-whisperfusion-1  | Failed to load the T2S model:
whisperfusion-whisperfusion-1  | Traceback (most recent call last):
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/whisperspeech/pipeline.py", line 48, in __init__
whisperfusion-whisperfusion-1  |     self.t2s = TSARTransformer.load_model(**args).cuda()
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 918, in cuda
whisperfusion-whisperfusion-1  |     return self._apply(lambda t: t.cuda(device))
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 810, in _apply
whisperfusion-whisperfusion-1  |     module._apply(fn)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 833, in _apply
whisperfusion-whisperfusion-1  |     param_applied = fn(param)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 918, in <lambda>
whisperfusion-whisperfusion-1  |     return self._apply(lambda t: t.cuda(device))
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
whisperfusion-whisperfusion-1  |     torch._C._cuda_init()
whisperfusion-whisperfusion-1  | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
whisperfusion-whisperfusion-1  |
whisperfusion-whisperfusion-1  | Failed to load the S2A model:
whisperfusion-whisperfusion-1  | Traceback (most recent call last):
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/whisperspeech/pipeline.py", line 56, in __init__
whisperfusion-whisperfusion-1  |     self.s2a = SADelARTransformer.load_model(**args).cuda()
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/whisperspeech/s2a_delar_mup_wds_mlang.py", line 423, in load_model
whisperfusion-whisperfusion-1  |     spec = torch.load(local_filename)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1014, in load
whisperfusion-whisperfusion-1  |     return _load(opened_zipfile,
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1422, in _load
whisperfusion-whisperfusion-1  |     result = unpickler.load()
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1392, in persistent_load
whisperfusion-whisperfusion-1  |     typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1366, in load_tensor
whisperfusion-whisperfusion-1  |     wrap_storage=restore_location(storage, location),
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 381, in default_restore_location
whisperfusion-whisperfusion-1  |     result = fn(storage, location)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 274, in _cuda_deserialize
whisperfusion-whisperfusion-1  |     device = validate_cuda_device(location)
whisperfusion-whisperfusion-1  |   File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 258, in validate_cuda_device
whisperfusion-whisperfusion-1  |     raise RuntimeError('Attempting to deserialize object on a CUDA '
whisperfusion-whisperfusion-1  | RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
whisperfusion-whisperfusion-1  |
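
Error 804 is the CUDA runtime code cudaErrorCompatNotSupportedOnDevice. It typically appears when the image's CUDA runtime (12.2.2 per the banner above) is newer than the host driver and the forward-compatibility libraries are tried on a GPU that does not support them, which may be the case for the GeForce RTX 4090 listed above. `nvidia-smi -L` succeeding only shows that the driver is reachable; it does not exercise CUDA initialization. A minimal sketch for checking the host driver against a minimum version follows. The threshold `ASSUMED_MIN_DRIVER` and the helper names are assumptions for illustration, not something taken from WhisperFusion or this log, so verify the value against NVIDIA's CUDA release notes.

```python
import re
import shutil
import subprocess

# Assumption: minimum Linux driver for the CUDA 12.2.2 runtime shown in the
# container banner. Verify against NVIDIA's CUDA release notes before use.
ASSUMED_MIN_DRIVER = "535.104.05"


def parse_driver_version(text):
    """Turn a driver string such as '535.104.05' into a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no driver version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def driver_is_older(host_driver, required=ASSUMED_MIN_DRIVER):
    """Return True when the host driver predates the required version."""
    return parse_driver_version(host_driver) < parse_driver_version(required)


def query_host_driver():
    """Ask nvidia-smi for the driver version, or return None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if result.returncode == 0 and lines else None


if __name__ == "__main__":
    driver = query_host_driver()
    if driver is None:
        print("nvidia-smi not available; cannot read the driver version")
    elif driver_is_older(driver):
        print(f"driver {driver} is older than {ASSUMED_MIN_DRIVER}: "
              "the container may fall back to forward-compat libraries")
    else:
        print(f"driver {driver} meets the assumed minimum {ASSUMED_MIN_DRIVER}")
```

Running this on the host (not inside the container) reports whether the installed driver is below the assumed minimum. If it is, updating the host driver is the likely fix to try first.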

collabora/WhisperFusion, issue #18