JSchmie / ScrAIbe-WebUI

WebUI for ScrAIbe
https://github.com/JSchmie/ScrAIbe-WebUI
GNU General Public License v3.0

Error when transcribing #27

Closed friki67 closed 1 month ago

friki67 commented 1 month ago

Hello! I've just discovered this project. I have it installed using docker compose and the precompiled docker image. When I upload an audio file (m4a), after a few seconds of processing, I'm getting an "Error" message in the Output box.

I'm sure my NVIDIA hardware and drivers are working, because nvidia-smi gives consistent output and I have another docker container using the GPU. I've tried changing the GPU part of the compose file from this:


    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu

to this

    environment:
      - NVIDIA_VISIBLE_DEVICES=ALL
    runtime: nvidia

just because the other container (Unmanic) works with this configuration. But I'm getting the same "Error" message.

Where can I get a log or some clue to diagnose this?

EDIT: I tried the other install method, using pip (installed the git version). The error I'm getting is about VRAM; I think it could be the main problem. I have 8GB VRAM. If I try CPU inference it works (and takes forever to produce a transcription). If I try the small model it works. When using medium it sometimes works, and sometimes it gives me an error about CUDA memory. I can't use the large_v3 model because I'm always getting the "not enough cuda memory" error.

I cannot use WhisperX because of a "your gpu is not able to use fp16" error or something alike (old GTX 1070 here).
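For context, the fp16 refusal on a GTX 1070 can be sketched as a compute-capability check. This is a hedged illustration, not WhisperX's actual code: the assumption is that the fp16 backend requires CUDA compute capability 7.0 or newer (Volta/Turing), while Pascal cards like the GTX 1070 are 6.1, so a fallback compute type such as "int8" or "float32" is needed.

```python
# Hedged sketch of the fp16 capability check (assumption: the fp16 compute
# type needs compute capability >= 7.0; this is NOT WhisperX's actual code).

PASCAL_GTX_1070 = (6, 1)       # compute capability of the GTX 1070 (Pascal)
FP16_MIN_CAPABILITY = (7, 0)   # Volta and newer

def supports_fp16(capability: tuple) -> bool:
    """Return True if the GPU generation can run the fp16 compute type."""
    return capability >= FP16_MIN_CAPABILITY

# On Pascal, fall back to a different compute type instead of fp16.
compute_type = "float16" if supports_fp16(PASCAL_GTX_1070) else "int8"
```

If the inference library exposes a `compute_type` option, passing `"int8"` or `"float32"` on such a card usually avoids the fp16 error, at some cost in speed or memory.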

EDIT2: I've tried another similar project, https://github.com/pluja/whishper, and the large models (v2 and v3) work on my card (they have no speaker diarization yet). JFYI

JSchmie commented 1 month ago

Hi,

Thank you for reporting this issue. Could you please provide more details about your setup? Are you using a reverse proxy? This could be the cause of the problem, as reverse proxies often have a default package size limit that may need to be manually increased.

Additionally, are there any error messages when you check the Docker logs? These logs should display the app's output.
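Checking the Docker logs looks something like this (the service name is an assumption; use `docker compose ps` to find the one from your compose file):

```shell
# Stream the app's output from the compose service
# ("scraibe-webui" is a placeholder for your actual service name).
docker compose logs -f scraibe-webui

# Or, without compose:
docker ps                      # find the container ID
docker logs -f <container_id>  # stream that container's output
```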

friki67 commented 1 month ago

Hello. I think our posts crossed; I edited my first post.

I'm using nginx as a proxy, but it works ok with my pip install.

JSchmie commented 1 month ago

So, the container works when you use smaller models? The Whisper large-v* models require around 12GB of VRAM, and you also need to account for the Pyannote model, which, while not large, is still not negligible.
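As a rough back-of-the-envelope illustration of why small fits in 8 GB but large-v3 does not (assumptions: fp16 weights at 2 bytes per parameter and a ~3x overhead factor for activations, beam search, and the decoder cache; parameter counts are the published Whisper model sizes; Pyannote's extra usage is not included):

```python
# Back-of-the-envelope VRAM estimate (a rough heuristic, not a measurement).
# Assumptions: fp16 weights at 2 bytes/parameter, ~3x overhead for
# activations and the decoder cache; excludes the Pyannote model.

WHISPER_PARAMS = {            # published Whisper model parameter counts
    "small": 244_000_000,
    "medium": 769_000_000,
    "large-v3": 1_550_000_000,
}

def estimated_vram_gb(model: str, bytes_per_param: int = 2,
                      overhead: float = 3.0) -> float:
    """Very rough peak-VRAM estimate in GB for one Whisper model."""
    return WHISPER_PARAMS[model] * bytes_per_param * overhead / 1e9

for name in WHISPER_PARAMS:
    verdict = "fits" if estimated_vram_gb(name) <= 8.0 else "does not fit"
    print(f"{name}: ~{estimated_vram_gb(name):.1f} GB -> {verdict} in 8 GB")
```

Under these assumptions small and medium land comfortably under 8 GB while large-v3 exceeds it, which is consistent with the behaviour reported above; the real peak depends on batch size, beam width, and the diarization model.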

The medium model version of Whisper and WhisperX should work well for you, offering a good balance between hardware usage and results.

If you'd like to use the large model but want to avoid long processing times, you could set up the asynchronous WebUI. To do this, you'll need to connect to an email client of your choice. Once the transcript is finished, you would receive an email with the completed transcript.

friki67 commented 1 month ago

> So, the container works when you use smaller models? The Whisper large-v* models require around 12GB of VRAM, and you also need to account for the Pyannote model, which, while not large, is still not negligible.
>
> The medium model version of Whisper and WhisperX should work well for you, offering a good balance between hardware usage and results.
>
> If you'd like to use the large model but want to avoid long processing times, you could set up the asynchronous WebUI. To do this, you'll need to connect to an email client of your choice. Once the transcript is finished, you would receive an email with the completed transcript.

Thank you very much. Great project!

JSchmie commented 1 month ago

Thanks! I will mark this issue as closed. If you experience other issues, please feel free to open another one. :)