Request for help: how to make Whisper results *worse*

Krzysiu commented 4 days ago

Hello! I couldn't find any support group, so I'm writing here; maybe you'll be able to help me. If I'm committing some crime against the project, please delete this or let me know so I can slap myself.

How can I make Whisper output as much text as possible (i.e. increase wrong recognition)? Yes, I'm aware it may output a lot of nonsense, but I've got an idea for a little project based on misheard words. So far it works and it's fun, but sometimes it fails.

What I need is a set of the best parameters for increasing the silliness of the results. I only have a CPU for this, so I can't fiddle around much hoping for a good result. Does a higher temperature make Whisper less precise? I tried "1" and it seems to have increased the nonsense, but above 1 I get:

Traceback (most recent call last):
  File "D:\whisper-fast\_XXL\__main__.py", line 1668, in <module>
  File "D:\whisper-fast\_XXL\__main__.py", line 1595, in cli
  File "faster_whisper\transcribe.py", line 1456, in restore_speech_timestamps
  File "faster_whisper\transcribe.py", line 805, in generate_segments
  File "faster_whisper\transcribe.py", line 1228, in generate_with_fallback
ValueError: max() arg is an empty sequence
[12336] Failed to execute script '__main__' due to unhandled exception!

...which is kinda weird, as I don't have Whisper at that path (it's at C:\sharedlib\whisper\faster-whisper-xxl.exe), nor have I ever had it there. Well, never mind that. Is the range -1 to 1? Are some models (versions, sizes) known for doing an awful job? Are there any other parameters that would help?
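A minimal sketch of one likely cause of the traceback above. It assumes that faster-whisper-xxl expands `--temperature` into a fallback schedule the way openai-whisper's CLI does, i.e. `np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback)`; that is an assumption about the XXL build, not something confirmed in this thread. Under it, any starting temperature above 1.0 yields an empty schedule, so a later `max()` over the decoding attempts has nothing to choose from. Both helpers are hypothetical simplifications, not faster-whisper code.

```python
import math


def fallback_temperatures(start: float, increment: float = 0.2) -> list[float]:
    """Pure-Python stand-in for tuple(np.arange(start, 1.0 + 1e-6, increment)),
    the schedule openai-whisper's CLI builds (assumed similar in faster-whisper-xxl)."""
    stop = 1.0 + 1e-6
    count = max(0, math.ceil((stop - start) / increment))
    return [round(start + i * increment, 6) for i in range(count)]


def decode_with_fallback(temperatures: list[float]) -> float:
    """Hypothetical, heavily simplified fallback loop: one fake decode per
    temperature, then keep the attempt with the best average log-probability."""
    attempts = [{"temperature": t, "avg_logprob": -t} for t in temperatures]
    best = max(attempts, key=lambda a: a["avg_logprob"])  # ValueError if attempts is empty
    return best["temperature"]


if __name__ == "__main__":
    print(fallback_temperatures(0.0))  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    print(fallback_temperatures(1.0))  # [1.0]
    print(fallback_temperatures(1.2))  # [] (no temperature left to decode with)
    try:
        decode_with_fallback(fallback_temperatures(1.2))
    except ValueError as err:
        print("ValueError:", err)
```

Under this assumption, 1.0 is the highest starting temperature that still produces at least one decoding attempt.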

kalradivyanshu commented 4 days ago

Why not just turn off VAD and pass in noise?
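One way to try this suggestion, sketched with the faster-whisper Python API rather than the XXL executable: write some white noise to a WAV file using only the standard library, then transcribe it with VAD disabled. The file name, the `tiny` model and the noise amplitude are arbitrary choices for illustration. `vad_filter=False` is already the default in the Python API; the XXL CLI's defaults may differ, so check its `--help`.

```python
import random
import struct
import wave


def write_white_noise_wav(path: str, seconds: float = 30.0, sample_rate: int = 16000,
                          amplitude: float = 0.3, seed: int = 0) -> int:
    """Write mono 16-bit PCM white noise to `path`; return the number of frames."""
    rng = random.Random(seed)  # seeded so the same "noise" can be reproduced
    n_frames = int(seconds * sample_rate)
    peak = int(amplitude * 32767)  # scale relative to the 16-bit maximum
    samples = [rng.randint(-peak, peak) for _ in range(n_frames)]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{n_frames}h", *samples))  # little-endian PCM
    return n_frames


if __name__ == "__main__":
    from faster_whisper import WhisperModel  # pip install faster-whisper

    write_white_noise_wav("noise.wav")
    model = WhisperModel("tiny", device="cpu", compute_type="int8")
    # Keep VAD off so the noise is not filtered out as non-speech
    segments, _info = model.transcribe("noise.wav", vad_filter=False)
    for segment in segments:
        print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```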

dgoryeo commented 3 days ago

@Krzysiu, about the temperature: the range is [0.0, 1.0]. I think that's why you see the error. About the path: faster-whisper-xxl.exe bundles all the packages you see in that path. About producing more hallucinations: some of the parameters that affect it (in addition to VAD, which was mentioned earlier) are --temperature, --condition_on_previous_text, --logprob_threshold and --no_speech_threshold.
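The parameters above map roughly onto keyword arguments of `faster_whisper.WhisperModel.transcribe` (note the spelling `log_prob_threshold` there, versus `--logprob_threshold` on the command line). The preset below is a hypothetical starting point, not a tested recipe: passing `None` for a threshold disables that check in faster-whisper, so low-confidence or "silent" segments are kept instead of retried or dropped. `compression_ratio_threshold` is an extra knob not mentioned in the comment, and `input.wav` is a placeholder.

```python
def hallucination_friendly_options(temperature: float = 1.0) -> dict:
    """Hypothetical preset of transcribe() keyword arguments that loosen the
    usual guards against hallucination; the values are guesses to experiment with."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature should stay within [0.0, 1.0]")
    return {
        "temperature": temperature,           # a single value: no fallback schedule
        "condition_on_previous_text": True,   # feed earlier (possibly wrong) text back as the prompt
        "log_prob_threshold": None,           # don't flag low-confidence decodes as failures
        "no_speech_threshold": None,          # don't skip segments that look like silence
        "compression_ratio_threshold": None,  # don't flag repetitive output as a failure
        "vad_filter": False,                  # keep non-speech audio in the input
    }


if __name__ == "__main__":
    from faster_whisper import WhisperModel  # pip install faster-whisper

    model = WhisperModel("tiny", device="cpu", compute_type="int8")
    segments, _info = model.transcribe("input.wav", **hallucination_friendly_options())
    print("".join(segment.text for segment in segments))
```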

Side note: I'm curious, what does the project do once a lot of hallucinated text is produced?

SYSTRAN / faster-whisper

Request for help: how to make Whisper results worse #1079