Transcription Issue: Chinese Audio Being Translated to English

Gan-Xing commented 1 week ago

Description

I am using fedirz/faster-whisper-server for transcribing Chinese audio files, but the output is incorrectly translated to English. I only need transcription, not translation.

Environment

Docker Image: fedirz/faster-whisper-server:latest-cuda
Host OS: Ubuntu 22.04

Docker Command:

docker run -d \
--name whisper \
--gpus all \
--publish 8000:8000 \
--volume ~/.cache/huggingface:/root/.cache/huggingface \
--env HTTP_PROXY=http://172.16.2.68:7890 \
--env HTTPS_PROXY=http://172.16.2.68:7890 \
--restart always \
fedirz/faster-whisper-server:latest-cuda

Logs

INFO:     Started server process [1]
INFO:     Waiting for application startup.
2024-06-25 04:06:54,626:INFO:faster_whisper_server.logger:load_model:Loaded Systran/faster-distil-whisper-large-v3 loaded in 118.86 seconds. auto(default) will be used for inference.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024-06-25 04:35:15,243:INFO:faster_whisper_server.logger:transcribe_file:Transcribed 9.664(8.4) seconds of audio in 0.88 seconds
INFO:     172.17.0.1:44736 - "POST /v1/audio/transcriptions HTTP/1.1" 200 OK

Current Behavior

The transcription output is translated into English even though the audio is in Chinese. Here are the curl command and the output:

curl -X POST "http://localhost:8000/v1/audio/transcriptions" \
     -F "file=@/mnt/raid1/backup/test.mp3" \
     -F "model=Systran/faster-distil-whisper-large-v3" \
     -F "language=zh" \
     -F "response_format=json" \
     -F "temperature=0"

{"text":"This is a videoing testes used to use using using a new-yin-trans-women-ssy to see."}

Expected Behavior

The transcription output should be in Chinese, as the input audio is in Chinese, and I have specified the language as zh.

Steps to Reproduce

Run the Docker container using the provided command.
Use the above curl command to send a Chinese audio file (test.mp3) for transcription.
Observe the incorrect output in English.

Additional Context

I am using a Chinese audio file that I recorded myself. I only need transcription, not translation.

Request

How can I ensure that the server only transcribes the audio and does not translate it? Is there any additional configuration or parameter that I need to set?

Thank you for your assistance!

Attachment: test.mp3.zip

Gan-Xing commented 1 week ago

I have also tested with French audio, and the issue persists. Here are the curl command and the result:

curl -X POST "http://localhost:8000/v1/audio/transcriptions" \
     -F "file=@/mnt/raid1/backup/boncourage.mp3" \
     -F "model=Systran/faster-distil-whisper-large-v3" \
     -F "language=fr"

The response:

{"text":"You've got the bantraille. All, my frere."}

The French audio is also translated into English instead of being transcribed.

Attachment: boncourage.mp3.zip

Gan-Xing commented 1 week ago

I found a solution to the problem. The model Systran/faster-distil-whisper-large-v3 does not support Chinese or French; it only supports English. Here is a successful transcription using a model that supports Chinese:

curl -X POST "http://172.16.2.68:8000/v1/audio/transcriptions" \
     -F "file=@test.mp3" \
     -F "model=Systran/faster-whisper-large-v2" \
     -F "language=zh" \
     -F "response_format=json" \
     -F "temperature=0"

The response (in English: "This is a recording test, used to test speech-to-text."):

{"text":"这是一段录音测试用来进行语音转文字的测试"}

You can find models that support other languages under the Systran organization on Hugging Face.
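The workaround above can be wrapped in a small bash helper. This is a minimal sketch, assuming English requests can stay on the distil model and every other language should go to Systran/faster-whisper-large-v2 (the model that worked above). The function names pick_model and transcribe are hypothetical, and the server URL defaults to http://localhost:8000 as in the original curl commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick a model that can handle the requested language.
# Assumption: distil models are English-only, so any other language is sent
# to the multilingual Systran/faster-whisper-large-v2 model.
pick_model() {
  if [ "$1" = "en" ]; then
    echo "Systran/faster-distil-whisper-large-v3"
  else
    echo "Systran/faster-whisper-large-v2"
  fi
}

# Hypothetical wrapper: transcribe FILE in LANGUAGE.
# SERVER defaults to http://localhost:8000 (an assumption; pass your own URL).
transcribe() {
  local file="$1" language="$2" server="${3:-http://localhost:8000}"
  curl -X POST "$server/v1/audio/transcriptions" \
       -F "file=@$file" \
       -F "model=$(pick_model "$language")" \
       -F "language=$language" \
       -F "response_format=json" \
       -F "temperature=0"
}
```

For example, running transcribe test.mp3 zh would send the same request as the successful curl command above, with Systran/faster-whisper-large-v2 selected automatically.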

fedirz commented 1 week ago

Thanks for such a detailed issue. As you already discovered, the distil models only support English.

It looks like the supported languages are listed in each model's README.md. What I'll do here is add a check to the transcription route that ensures the model supports the requested language; if it doesn't, a 4xx error will be returned. This will give users immediate feedback when what they are trying to do is not possible.
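The proposed check could look roughly like the bash sketch below. It is not the server's actual implementation: check_language and supported_languages are hypothetical names, and the language lists are assumptions (English only for distil models, an illustrative subset for the multilingual models); the authoritative lists are in each model's README.md. A non-zero return code stands in for the 4xx response.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "does this model support the requested language?" check.
# The language lists are assumptions for illustration; the real lists live in each
# model's README.md on Hugging Face.

# Print the space-separated language codes assumed to be supported by a model.
supported_languages() {
  case "$1" in
    # Distil models only support English.
    Systran/faster-distil-whisper-*) echo "en" ;;
    # Assumed, incomplete subset of the multilingual Whisper languages.
    Systran/faster-whisper-*) echo "en zh fr de es ja" ;;
    # Unknown models: empty list, so every language is rejected below.
    *) echo "" ;;
  esac
}

# Return 0 if MODEL supports LANGUAGE; otherwise print a 4xx-style error to stderr
# and return 1 (standing in for the HTTP error the server would send).
check_language() {
  local model="$1" language="$2" lang
  for lang in $(supported_languages "$model"); do
    if [ "$lang" = "$language" ]; then
      return 0
    fi
  done
  echo "400 Bad Request: model '$model' does not support language '$language'" >&2
  return 1
}
```

With this sketch, check_language Systran/faster-distil-whisper-large-v3 zh prints the 400 message and returns 1, which is the failure reported in this issue. Rejecting unknown models is a design choice made here for simplicity; a real server might instead skip the check when it cannot find a language list.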

Again, I greatly appreciate you taking the time to create and follow up on this issue. If you have any feature requests, please LMK.
