HenestrosaDev / audiotext

A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.

[Bug] ValueError: No default align-model for language: sv #57

Open dazWiLLiE opened 2 months ago

dazWiLLiE commented 2 months ago

Steps to reproduce

Windows.

Downloaded the latest release, already have ffmpeg installed.

Transcription language: Swedish
Audio source: file (file.mkv)
Transcription method: WhisperX
Output file type: srt

Clicked on "Generate transcription"

Took around an hour, then I got:

Traceback (most recent call last):
  File "handlers\whisperx_handler.py", line 53, in transcribe_file
  File "whisperx\alignment.py", line 71, in load_align_model
    raise ValueError(f"No default align-model for language: {language_code}")
ValueError: No default align-model for language: sv

An .srt file was created. Here are the first 11 lines of the result:

1
00:00:23,660 --> 00:00:52,381
–Trodde du att jag hade glömt bort dig? –Risto, vad är det du gör? –Varför betalar du inte för? –Jag har inte sett nåt! Jesper! Jag betalar för att du får dubbelt så mycket jag lovar! –Jag vill inte! –Risto, gör inte det! –Titta på mig! –Titta mig i ögonen!

2
00:00:59,838 --> 00:01:01,510
För en väckbara.

3
00:03:30,452 --> 00:03:59,684
–Vad är det som har hänt? –Jag kan tyvärr inte berätta. –Jag ska besöka en vän som bor här. –Vad heter den personen? –Jakob Fivel. –Jag ska kalla på nån. Vad sa du nyligen?

It seems to do a decent job, but it can't split the dialogue correctly.

Perhaps it's because there is no alignment model?
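
To put a number on the splitting problem, the cue lengths can be computed from the timing lines quoted above. This is only a minimal sketch: `cue_duration_ms` is a hypothetical helper written for this illustration, not part of audiotext or WhisperX, and it assumes the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` SRT timing format.

```python
import re

# Matches one SRT timing line, e.g. "00:00:23,660 --> 00:00:52,381".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def cue_duration_ms(timing_line: str) -> int:
    """Return the length of one SRT cue in milliseconds."""
    match = TIMING.fullmatch(timing_line.strip())
    if match is None:
        raise ValueError(f"Not an SRT timing line: {timing_line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
    end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
    return end - start


# Timing lines of the three cues quoted in this comment.
for line in (
    "00:00:23,660 --> 00:00:52,381",
    "00:00:59,838 --> 00:01:01,510",
    "00:03:30,452 --> 00:03:59,684",
):
    print(line, "lasts", cue_duration_ms(line), "ms")
```

Cues 1 and 3 each run for close to half a minute and contain several speaker turns, while cue 2 lasts under two seconds.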

HenestrosaDev commented 2 months ago
  1. Does the program transcribe the entire file?
  2. As for the splitting part, it's indeed due to the lack of an alignment model for the language.
dazWiLLiE commented 2 months ago
  1. Yes, the saved file includes timestamps up to 1h26m, so that should be correct.
  2. Is there an alignment model for Swedish?
HenestrosaDev commented 2 months ago
  1. WhisperX doesn't have a built-in alignment model for Swedish. I'd have to look into adding alignment models for the languages that WhisperX doesn't support, which may take a while.
HenestrosaDev commented 2 months ago

As a temporary fix, try to do the following:

  1. Open the audiotext-v2.3.0 folder.
  2. Open this file: _internal > whisperx > alignment.py
  3. Add the following line below the line "ro": "anton-l/wav2vec2-large-xlsr-53-romanian":
        "sv": "KBLab/wav2vec2-large-voxrex-swedish"

Don't forget to add a comma at the end of the "ro"... , line.
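
The idea behind this workaround can be sketched as a language-code lookup with a user-supplied table on top of the defaults. This is a simplified illustration, not WhisperX's actual code: `DEFAULT_ALIGN_MODELS`, `EXTRA_ALIGN_MODELS` and `resolve_align_model` are hypothetical names, and the default table holds only the Romanian entry quoted above.

```python
# Hypothetical stand-in for the built-in table (one entry only).
DEFAULT_ALIGN_MODELS = {
    "ro": "anton-l/wav2vec2-large-xlsr-53-romanian",
}

# Extra entries supplied by the user: the Swedish model suggested above.
EXTRA_ALIGN_MODELS = {
    "sv": "KBLab/wav2vec2-large-voxrex-swedish",
}


def resolve_align_model(language_code, model_name=None):
    """Pick an alignment model name for a language code.

    An explicit model_name wins; otherwise the extra table is checked
    before the defaults. Unknown languages raise ValueError.
    """
    if model_name is not None:
        return model_name
    for table in (EXTRA_ALIGN_MODELS, DEFAULT_ALIGN_MODELS):
        if language_code in table:
            return table[language_code]
    raise ValueError(f"No default align-model for language: {language_code}")


print(resolve_align_model("sv"))
print(resolve_align_model("ro"))
```

Keeping the extra entries in a separate table would avoid editing a file inside the packaged application, but whether audiotext can be wired up this way is an assumption.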

dazWiLLiE commented 2 months ago

Thank you. I'll try it right away.

dazWiLLiE commented 2 months ago

Now I got:

Traceback (most recent call last):
  File "handlers\whisperx_handler.py", line 53, in transcribe_file
  File "whisperx\alignment.py", line 71, in load_align_model
    Please find a wav2vec2.0 model finetuned on this language in https://huggingface.co/models, then pass the model name in --align_model [MODEL_NAME]")
ValueError: No default align-model for language: sv

Edit:

alignment.py

DEFAULT_ALIGN_MODELS_HF = {
    "ja": "jonatasgrosman/wav2vec2-large-xlsr-53-japanese",
    "zh": "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn",
    "nl": "jonatasgrosman/wav2vec2-large-xlsr-53-dutch",
    "uk": "Yehor/wav2vec2-xls-r-300m-uk-with-small-lm",
    "pt": "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese",
    "ar": "jonatasgrosman/wav2vec2-large-xlsr-53-arabic",
    "cs": "comodoro/wav2vec2-xls-r-300m-cs-250",
    "ru": "jonatasgrosman/wav2vec2-large-xlsr-53-russian",
    "pl": "jonatasgrosman/wav2vec2-large-xlsr-53-polish",
    "hu": "jonatasgrosman/wav2vec2-large-xlsr-53-hungarian",
    "fi": "jonatasgrosman/wav2vec2-large-xlsr-53-finnish",
    "fa": "jonatasgrosman/wav2vec2-large-xlsr-53-persian",
    "el": "jonatasgrosman/wav2vec2-large-xlsr-53-greek",
    "tr": "mpoyraz/wav2vec2-xls-r-300m-cv7-turkish",
    "da": "saattrupdan/wav2vec2-xls-r-300m-ftspeech",
    "he": "imvladikon/wav2vec2-xls-r-300m-hebrew",
    "vi": 'nguyenvulebinh/wav2vec2-base-vi',
    "ko": "kresnik/wav2vec2-large-xlsr-korean",
    "ur": "kingabzpro/wav2vec2-large-xls-r-300m-Urdu",
    "te": "anuragshas/wav2vec2-large-xlsr-53-telugu",
    "hi": "theainerd/Wav2Vec2-large-xlsr-hindi",
    "ca": "softcatala/wav2vec2-large-xlsr-catala",
    "ml": "gvs/wav2vec2-large-xlsr-malayalam",
    "uz": "rifkat/wav2vec2-large-xls-r-300m-uz",
    "ro": "anton-l/wav2vec2-large-xlsr-53-romanian",
    "sv": "KBLab/wav2vec2-large-voxrex-swedish"
}
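
One way to rule out a malformed edit (for example, a missing comma after the "ro" line) is to parse the edited source and list the dictionary's keys. This is a minimal sketch using Python's standard `ast` module; `dict_keys_in_source` is a hypothetical helper, and the check only looks at the text it is given. It says nothing about which copy of the module the packaged application actually loads at run time.

```python
import ast


def dict_keys_in_source(source: str, dict_name: str) -> set:
    """Return the string keys of a top-level dict literal named dict_name.

    ast.parse raises SyntaxError if the source is malformed, e.g. when a
    comma is missing between two entries.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Dict):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if dict_name in targets:
                return {
                    key.value
                    for key in node.value.keys
                    if isinstance(key, ast.Constant) and isinstance(key.value, str)
                }
    raise KeyError(f"{dict_name} not found as a top-level dict literal")


# Shortened sample of the edited file; in practice the text would be read
# from the alignment.py file on disk.
edited = '''
DEFAULT_ALIGN_MODELS_HF = {
    "ro": "anton-l/wav2vec2-large-xlsr-53-romanian",
    "sv": "KBLab/wav2vec2-large-voxrex-swedish"
}
'''

keys = dict_keys_in_source(edited, "DEFAULT_ALIGN_MODELS_HF")
print("sv" in keys)
```

If the key is present in the file and the error persists, the edit itself is probably not the problem.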
HenestrosaDev commented 2 months ago

Okay, it seems I'll have to take a deeper look into this. I'll keep the issue open until I find a way to solve it.

dazWiLLiE commented 2 months ago

Great, thanks!

olawalejuwonm commented 1 month ago

Hi. I also got the same error for the Yoruba language:

No default align-model for language: yo

What's the temporary fix for that?