Purfview / whisper-standalone-win

Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

Whisper-turbo model support #305

Closed: codeMonkey-shin closed this issue 3 weeks ago

codeMonkey-shin commented 2 months ago

https://github.com/openai/whisper/pull/2361/files

Will this model be supported in the future?

stevevaius2015 commented 1 month ago

We definitely need this.

andyhoeung commented 1 month ago

I know large-v3 causes some odd transcription artifacts compared to large-v2; I wonder whether they improved on that while also making it faster.

wangfeng35 commented 1 month ago

+1 need turbo

mp3pintyo commented 1 month ago

+1 need turbo

AlanHuang99 commented 1 month ago

+1 need turbo

LoggeL commented 1 month ago

Downloading the files for turbo manually and replacing an existing model seems to work as a workaround:

https://huggingface.co/openai/whisper-large-v3-turbo/tree/main

andyhoeung commented 1 month ago

I didn't have to replace any model. I just downloaded a faster-whisper large-v3-turbo variant, for example this one https://huggingface.co/Infomaniak-AI/faster-whisper-large-v3-turbo, created a folder in `_models` called `faster-whisper-large-v3-turbo`, and used `--model=large-v3-turbo`.
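The setup above can be sketched in Python. This is a minimal illustration, not part of the project: the `_models` layout and the model repo come from the comment above, while the use of `huggingface_hub.snapshot_download` (shown commented out) is an assumed convenience, not something the tool itself does.

```python
# Sketch of the workaround: create the expected folder under _models,
# then fetch the CTranslate2 weights into it. Paths are illustrative.
from pathlib import Path

MODELS_DIR = Path("_models")
MODEL_NAME = "faster-whisper-large-v3-turbo"

target = MODELS_DIR / MODEL_NAME
target.mkdir(parents=True, exist_ok=True)

# Download the weights into the folder, e.g. (requires `pip install huggingface_hub`):
#   from huggingface_hub import snapshot_download
#   snapshot_download("Infomaniak-AI/faster-whisper-large-v3-turbo",
#                     local_dir=target)
#
# Then run the executable with the folder's name:
#   faster-whisper-xxl.exe audio.wav --model=large-v3-turbo
```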

juntaosun commented 1 month ago

> I didn't have to replace any model. Just downloaded any faster-whisper large-v3 turbo variant, for example this one https://huggingface.co/Infomaniak-AI/faster-whisper-large-v3-turbo, created a folder in _models called 'faster-whisper-large-v3-turbo', and used '--model=large-v3-turbo'.

`--model=large-v3-turbo`

```
Warning: 'large-v3' model may produce inferior results, try 'large-v2'!
Traceback (most recent call last):
  File "D:\whisper-fast_XXL\__main__.py", line 1668, in <module>
  File "D:\whisper-fast_XXL\__main__.py", line 1595, in cli
  File "faster_whisper\transcribe.py", line 1456, in restore_speech_timestamps
  File "faster_whisper\transcribe.py", line 798, in generate_segments
  File "faster_whisper\transcribe.py", line 1109, in encode
ValueError: Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead
[15868] Failed to execute script '__main__' due to unhandled exception!
```

faster-whisper-xxl.exe
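The shape mismatch in that traceback can be illustrated with a small sketch. large-v3 and turbo use a 128-bin log-mel frontend, while the older default (large-v2 and earlier) uses 80 bins; a 30-second window at a 10 ms hop gives 3000 frames. The helper below is purely illustrative, not from faster-whisper.

```python
# The encoder expects features shaped (batch, n_mels, frames).
# Old frontend: 80 mel bins; large-v3/turbo encoder: 128 mel bins.
def expected_feature_shape(n_mels, batch=1, frames=3000):
    """Shape of the log-mel feature tensor fed to the Whisper encoder."""
    return (batch, n_mels, frames)

v2_shape = expected_feature_shape(80)      # what the old frontend produced
turbo_shape = expected_feature_shape(128)  # what the turbo encoder expects
# (1, 80, 3000) vs (1, 128, 3000): exactly the mismatch in the ValueError
```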

ZeVince commented 3 weeks ago

> I didn't have to replace any model. [...] created a folder in _models called 'faster-whisper-large-v3-turbo', and used '--model=large-v3-turbo'.
>
> `ValueError: Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead`

Same error here... any fix since then?

Purfview commented 3 weeks ago

> Same error here... any fix since then ?

Not yet, but there will be sooner rather than later; right now some other things have priority.

Purfview commented 3 weeks ago

> Will this model be supported in the future?

It was always supported, like any other custom fine-tuned model.

Auto-download for it was added in v193.1.

nebehr commented 2 weeks ago

Is the turbo model supposed to do any translation at all? It produces untranslated German text with --task translate, whereas vanilla large-v3 appears to work fine.

Purfview commented 2 weeks ago

> "Whisper turbo was fine-tuned for two more epochs over the same amount of multilingual transcription data used for training large-v3, i.e. excluding translation data, on which we don’t expect turbo to perform well."

nebehr commented 2 weeks ago

I see. From that description, though, I would expect bad translation rather than no translation at all. Anyway, this is beyond the scope of this project.