ahmetoner / whisper-asr-webservice

OpenAI Whisper ASR Webservice API
https://ahmetoner.github.io/whisper-asr-webservice

CUDA failed with error out of memory #173

Closed · modem7 closed this issue 7 months ago

modem7 commented 7 months ago

Heya,

Just tried the container, but I'm getting the following issue in the log file:

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

[2023-11-29 20:49:44 +0000] [33] [INFO] Starting gunicorn 21.2.0
[2023-11-29 20:49:44 +0000] [33] [INFO] Listening at: http://0.0.0.0:9000 (33)
[2023-11-29 20:49:44 +0000] [33] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2023-11-29 20:49:44 +0000] [34] [INFO] Booting worker with pid: 34
[2023-11-29 20:49:57 +0000] [34] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 609, in spawn_worker
    worker.init_process()
  File "/app/.venv/lib/python3.10/site-packages/uvicorn/workers.py", line 66, in init_process
    super(UvicornWorker, self).init_process()
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/app/.venv/lib/python3.10/site-packages/gunicorn/util.py", line 371, in import_app
    mod = importlib.import_module(module)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/app/webservice.py", line 16, in <module>
    from .faster_whisper.core import transcribe, language_detection
  File "/app/app/faster_whisper/core.py", line 25, in <module>
    model = WhisperModel(
  File "/app/.venv/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 130, in __init__
    self.model = ctranslate2.models.Whisper(
RuntimeError: CUDA failed with error out of memory
[2023-11-29 20:49:57 +0000] [34] [INFO] Worker exiting (pid: 34)
[2023-11-29 20:49:57 +0000] [33] [ERROR] Worker (pid:34) exited with code 3
[2023-11-29 20:49:57 +0000] [33] [ERROR] Shutting down: Master
[2023-11-29 20:49:57 +0000] [33] [ERROR] Reason: Worker failed to boot.

Compose setup:

  whisperasr:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    container_name: Whisperrr
    environment:
      - ASR_MODEL=large
      - ASR_ENGINE=faster_whisper
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    volumes:
      - /mnt/docker_data/whisperrr:/root/.cache/whisper
    runtime: nvidia
    devices:
      - /dev/dri:/dev/dri
    ports:
      - 34303:9000
    restart: always
    depends_on:
      - bazarr

GPU: Nvidia P2000.

Is this due to the GPU not having enough memory to handle the model size?
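(For reference, the P2000 has 5 GB of VRAM, which is tight for the large model in float16. A quick way to see how much memory the container can actually use is the sketch below; it assumes PyTorch is importable inside the GPU image, which may not be the case, and if it isn't, nvidia-smi on the host reports the same figures.)

# Minimal sketch: report free/total VRAM as seen from inside the container.
# Assumption: PyTorch is available in the image; otherwise use nvidia-smi on the host.
import torch

if not torch.cuda.is_available():
    raise SystemExit("CUDA is not visible inside the container")

free_bytes, total_bytes = torch.cuda.mem_get_info(0)
gib = 1024 ** 3
print(f"GPU 0: {free_bytes / gib:.2f} GiB free of {total_bytes / gib:.2f} GiB total")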

modem7 commented 7 months ago

Answered my own question: "yes".

Verified the amount of GPU memory used in Netdata: [screenshot of GPU memory usage]

Moved to the medium model with no issues.
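(For anyone else hitting this on a small card: the traceback shows the model being loaded through faster-whisper's WhisperModel in app/faster_whisper/core.py, so besides dropping ASR_MODEL to medium, a quantized compute type is another lever when VRAM is marginal. A rough sketch of that kind of load is below; the exact arguments the webservice passes are in core.py, and the compute_type value here is an assumption, not necessarily something the image exposes.)

# Rough sketch of loading a smaller / quantized model with faster-whisper.
# Assumption: "int8_float16" is acceptable for your accuracy needs; the
# packaged webservice may not expose compute_type directly.
from faster_whisper import WhisperModel

model = WhisperModel("medium", device="cuda", compute_type="int8_float16")

segments, info = model.transcribe("audio.mp3")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")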