OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Use multiple GPUs to process queue #1816

Open theodufort opened 1 week ago

theodufort commented 1 week ago

I am trying to use both of my GPUs, which are passed through to my Docker container.

```yaml
services:
  faster-whisper-server-cuda:
    image: fedirz/faster-whisper-server:latest-cuda
    build:
      dockerfile: Dockerfile.cuda
      context: .
      platforms:
        - linux/amd64
        - linux/arm64
    restart: unless-stopped
    ports:
      - 8162:8000
    environment:
      - WHISPER__MODEL=deepdml/faster-whisper-large-v3-turbo-ct2
      - WHISPER__INFERENCE_DEVICE=cuda
      - WHISPER__COMPUTE_TYPE=int8
      - WHISPER__NUM_WORKERS=4
      - WHISPER__CPU_THREADS=4
      - WHISPER_DEVICE=cuda
      - DEFAULT_LANGUAGE=en
      - PRELOAD_MODELS=["deepdml/faster-whisper-large-v3-turbo-ct2"]
    volumes:
      - hugging_face_cache:/root/.cache/huggingface
    privileged: true
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  hugging_face_cache:
```

I have tried everything, but it will not use more than one GPU, as shown here:

[screenshot]

minhthuc2502 commented 19 hours ago

Consider passing `device_index=[0,1]` when you set up the model in your container.
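For reference, here is a minimal sketch of how `device_index` is passed at the library level, using faster-whisper (the library that faster-whisper-server wraps around CTranslate2). Whether the server image exposes this option through an environment variable is an assumption to verify against that project's documentation; the model name is taken from the compose file above, and the audio path is a placeholder.

```python
# Sketch, assuming direct use of faster-whisper rather than the server image.
# With a list of device indices, CTranslate2 loads one model replica per
# listed GPU and dispatches parallel requests across them; a single
# transcription still runs on one device.
from faster_whisper import WhisperModel

model = WhisperModel(
    "deepdml/faster-whisper-large-v3-turbo-ct2",
    device="cuda",
    device_index=[0, 1],  # place a model replica on GPU 0 and GPU 1
    compute_type="int8",
    num_workers=4,        # workers can then run concurrently on both GPUs
)

# "audio.wav" is a placeholder input file.
segments, info = model.transcribe("audio.wav", language="en")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

Note that the second GPU only helps under concurrent load: one request is never split across devices, so a single stream will still show activity on one GPU only.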