fedirz / faster-whisper-server

https://hub.docker.com/r/fedirz/faster-whisper-server
MIT License

PRELOAD_MODELS doesn't work on the latest Docker image tag (but works with a local build) #77

Open leoguillaume opened 2 months ago

leoguillaume commented 2 months ago

With a local deployment, the PRELOAD_MODELS config variable works perfectly:

PRELOAD_MODELS='["Systran/faster-whisper-medium.en", "Systran/faster-whisper-small.en"]' MAX_MODELS=2 uvicorn main:app --port 8080 --log-level debug --reload

But in Docker Compose it does not.

The compose file:

services:
  faster-whisper-server-cuda:
    image: fedirz/faster-whisper-server:latest-cuda
    volumes:
      - /data/models/test:/root/.cache/huggingface
    restart: unless-stopped
    ports:
      - 8000:8000
    environment:
      - LOG_LEVEL=debug
      - ENABLE_UI=False
      - MAX_MODELS=2
      - PRELOAD_MODELS='["Systran/faster-whisper-medium.en", "Systran/faster-whisper-small.en"]'
    develop:
      watch:
        - path: faster_whisper_server
          action: rebuild
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

I tried different types of quotes:

The models are not downloaded to my volume or anywhere else. Any ideas? Thanks in advance.
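
(A likely difference between the two setups: in the local command the shell strips the single quotes around the JSON list, but in compose's list-style environment the whole scalar after = is passed verbatim, so the container receives the value with the literal quotes included. A minimal sketch of the two forms, assuming the server JSON-decodes PRELOAD_MODELS:)

environment:
  # Compose passes everything after '=' verbatim, wrapping quotes included:
  - PRELOAD_MODELS='["Systran/faster-whisper-medium.en"]'   # server sees '["..."]' -> invalid JSON
  # Bare JSON form, nothing for the parser to choke on:
  - PRELOAD_MODELS=["Systran/faster-whisper-medium.en"]     # server sees ["..."] -> parses as a list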

thiswillbeyourgithub commented 2 months ago

You're mistaken, you're not inputting the model name; Systran is just the name of the repo of the person who makes faster-whisper.

I have this in my docker compose: - PRELOAD_MODELS=["large-v3"]

thiswillbeyourgithub commented 2 months ago

It's probably worth adding an example to the YAML file.
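
Something like this, for instance (a sketch only; the model IDs are just illustrative):

environment:
  # Values are Hugging Face repo IDs; keep at most two models resident.
  - MAX_MODELS=2
  - PRELOAD_MODELS=["Systran/faster-whisper-small.en", "Systran/faster-whisper-medium.en"]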

leoguillaume commented 2 months ago

I've just rebuilt the image directly from the repository and it works perfectly; there must be a difference between the main branch and the latest-cuda tag.

For example, with ["large-v3"] and the freshly built local image, the error is the classic Hugging Face one, since large-v3 is not a known model ID on HF.

With the same image but with ["Systran/faster-whisper-large-v3", "Systran/faster-distil-whisper-large-v3"], it works :)

Can you push an image with the latest code version, maybe?

thiswillbeyourgithub commented 2 months ago

I'm not the owner of this repo so I'll leave that up to them :)

willy-r commented 2 months ago

I'm experiencing the same issue. Have you been able to find a solution for it?

gsoul commented 2 months ago
    environment:
      - PRELOAD_MODELS=["Systran/faster-whisper-medium"]

works for me

leoguillaume commented 2 months ago

    environment:

      - PRELOAD_MODELS=["Systran/faster-whisper-medium"]

works for me

Which image tag do you use?

gsoul commented 2 months ago
services:
  faster-whisper-server-cuda:
    image: fedirz/faster-whisper-server:latest-cuda
    build:
      dockerfile: Dockerfile.cuda
      context: .
      platforms:
        - linux/amd64
        - linux/arm64
    restart: unless-stopped
    ports:
      - 8000:8000
    environment:
      - PRELOAD_MODELS=["Systran/faster-whisper-medium"]
    volumes:
      - hugging_face_cache:/root/.cache/huggingface
    develop:
      watch:
        - path: faster_whisper_server
          action: rebuild
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: ["gpu"]

volumes:
  hugging_face_cache:
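
(For anyone comparing setups, a quick way to check what the container actually received and whether anything landed in the cache; the service name is taken from the compose file above, and the hub path follows Hugging Face's standard cache layout:)

docker compose exec faster-whisper-server-cuda printenv PRELOAD_MODELS
docker compose exec faster-whisper-server-cuda ls /root/.cache/huggingface/hub
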
thiswillbeyourgithub commented 1 month ago

Well, I'm definitely encountering this issue now. It happened when I switched to large-v3, but it might have nothing to do with that, since reusing my previous config does not seem to preload either.

So it seems to have broken recently.

Here's my compose content, with comments added.

      faster-whisper-server-cuda:
        image: fedirz/faster-whisper-server:latest-cuda
        build:
          dockerfile: Dockerfile.cuda
          context: .
          platforms:
            - linux/amd64
        volumes:
          - /home/root/.cache/huggingface:/root/.cache/huggingface
        restart: unless-stopped
        ports:
          - 8001:8001
        environment:
          - UVICORN_PORT=8001
          - ENABLE_UI=false
          - MIN_DURATION=1
          # default TTL is 300 (5min), -1 to disable, 0 to unload directly, 43200=12h
          - WHISPER__TTL=43200
          - WHISPER__INFERENCE_DEVICE=cuda
          - WHISPER__COMPUTE_TYPE=int8

          - WHISPER__MODEL=deepdml/faster-whisper-large-v3-turbo-ct2  # works (finds the right model)
          - PRELOAD_MODELS=["deepdml/faster-whisper-large-v3-turbo-ct2"]  # doesn't work (no preloading)
          # - PRELOAD_MODELS=["faster-whisper-large-v3-turbo-ct2"]  # doesn't work either
          # Used to work but not anymore
          # - WHISPER__MODEL=large-v3
          # - PRELOAD_MODELS=["large-v3"]
        develop:
          watch:
            - path: faster_whisper_server
              action: rebuild
        deploy:
          resources:
            reservations:
              devices:
                - capabilities: ["gpu"]
        network_mode: host
        pull_policy: always
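
In the meantime, a possible workaround (a sketch only: since WHISPER__MODEL resolves fine on first use, warm the model once after startup through the OpenAI-compatible endpoint the server exposes; sample.wav is any short audio file, and port 8001 matches the config above):

curl -s http://localhost:8001/v1/audio/transcriptions \
  -F "file=@sample.wav" \
  -F "model=deepdml/faster-whisper-large-v3-turbo-ct2" > /dev/null
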
thiswillbeyourgithub commented 1 month ago

(Very sorry for bothering you @fedirz, but since this issue was closed in the past I'm afraid you might miss it when catching up, so I'm humbly notifying you and asking to reopen it just in case. Of course, do what you want and keep it closed if that's how you work :))

theodufort commented 5 days ago
          - WHISPER__INFERENCE_DEVICE=cuda

Same for me, preloading models doesn't work. It's not that big of a deal, but it would still make transcribing faster...