fedirz / faster-whisper-server

https://hub.docker.com/r/fedirz/faster-whisper-server
MIT License

Automatic translation #60

Closed. jojasadventure closed this 2 months ago

jojasadventure commented 2 months ago

Hi, first of all, thank you so much for building this. I managed to build a small Python app that I trigger with a hotkey on my Mac to replace macOS dictation: https://github.com/jojasadventure/whisper-client

I'm having a hard time implementing a language switch; it would be great to be able to transcribe in different languages. Whatever I try, I get back a sort-of translation instead of a transcription.

I have noticed that passing a different language parameter does not convince the transcribe endpoint to transcribe in that language. I tried to troubleshoot this by uploading a file in the web UI, but it has no language selector.

I have attempted to fix this by also passing the parameter for task="transcribe", as that seems to be the suggested fix on many forums. The create() method does not support this parameter though (maybe I'm confused).

# doesn't work
transcript = self.client.audio.transcriptions.create(
    model="Systran/faster-distil-whisper-large-v3",
    file=audio_file,
    language=selected_language,
    task="transcribe"  # Specify the task as a separate parameter
)

I can even see that the server has some code in transcribe_file.py to deal with the task parameter, but I'm too much of a noob to figure out how to pass it, and I couldn't get Claude to figure it out either. Or does the code below mean the task is automatically set to transcribe? In an ideal world the transcription endpoint should never return translations, since there is a separate translation endpoint, so that would make sense ...

#transcribe_file.py  line 280 ... def transcribe_file()

segments, transcription_info = whisper.transcribe(
    file.file,
    task=Task.TRANSCRIBE,
    language=language,
    initial_prompt=prompt,
    word_timestamps="word" in timestamp_granularities,
    temperature=temperature,
    vad_filter=True,
    hotwords=hotwords,
)

Would you be willing to point me in the right direction, or at least confirm if that's even implemented / something that should work in principle? Thank you!

jojasadventure commented 2 months ago

Update: After further testing, I've discovered that the language selection issue appears to be model-dependent. I found that the Systran/faster-distil-whisper-large-v3 model consistently produces English transcriptions regardless of the language parameter. However using the Systran/faster-whisper-medium model with language=de, the API correctly transcribes German audio. Haven't tried other models yet.

tl;dr it seems the language selection functionality is implemented on the server side but may not be working as expected with all models.

jojasadventure commented 2 months ago

My experimental Whisper GUI client in Python, including a language toggle and a custom Whisper prompt (:

fedirz commented 2 months ago

> Update: After further testing, I've discovered that the language selection issue appears to be model-dependent. I found that the Systran/faster-distil-whisper-large-v3 model consistently produces English transcriptions regardless of the language parameter. However using the Systran/faster-whisper-medium model with language=de, the API correctly transcribes German audio. Haven't tried other models yet.
>
> tl;dr it seems the language selection functionality is implemented on the server side but may not be working as expected with all models.

Yeah, distil models only support English.

From the README.md

...
language:
  - en
...
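
For anyone landing here with the same problem, here is a rough client-side sketch of the workaround discussed in this thread: pick a multilingual model whenever the requested language is not English, and don't pass task at all, since the server snippet quoted above already sets task=Task.TRANSCRIBE for this endpoint. The two model names come from this thread. The helper names pick_model and transcribe are made up for illustration, and client is assumed to be an OpenAI-compatible client object pointed at the server.

```python
from typing import Any, BinaryIO, Optional

# Model names as mentioned in this thread.
ENGLISH_ONLY_MODEL = "Systran/faster-distil-whisper-large-v3"
MULTILINGUAL_MODEL = "Systran/faster-whisper-medium"


def pick_model(language: Optional[str]) -> str:
    """Hypothetical helper: use the distil model only for English audio.

    When no language is given (auto-detect), fall back to the
    multilingual model, since the audio may not be English.
    """
    if language is not None and language.lower() == "en":
        return ENGLISH_ONLY_MODEL
    return MULTILINGUAL_MODEL


def transcribe(client: Any, audio_file: BinaryIO, language: Optional[str]) -> Any:
    """Hypothetical wrapper around client.audio.transcriptions.create().

    No task argument is sent; the transcription endpoint is assumed to
    set it on the server side, as in the snippet quoted in this thread.
    """
    kwargs = {"model": pick_model(language), "file": audio_file}
    if language is not None:
        kwargs["language"] = language
    return client.audio.transcriptions.create(**kwargs)
```

With this in place, a call such as transcribe(self.client, audio_file, "de") would go to Systran/faster-whisper-medium, while "en" would keep using the distil model. Other multilingual models should work the same way, but only the medium model was tried in this thread.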