FL33TW00D / whisper-turbo

Cross-Platform, GPU Accelerated Whisper 🏎️
https://whisper-turbo.com
Apache License 2.0
1.72k stars, 75 forks

[BUG] whisper-turbo.com throws errors #59

Closed: jozefchutka closed this 11 months ago

jozefchutka commented 11 months ago

I am testing https://whisper-turbo.com/ with the most recent Chrome on a Mac M2, and transcription fails with both the base and tiny models on tos.mp3.zip

Uncaught (in promise) {__wbg_ptr: 58389056}

Screenshot 2023-11-24 at 8 41 56

FL33TW00D commented 11 months ago

Hi Jozef, this seems to be a strange, unsupported encoding of the MP3.

Re-encoding it to WAV with ffmpeg works fine on the site:

ffmpeg -i tos.mp3 -acodec pcm_s16le -ar 16000 -ac 1 tos.wav

However, this does highlight the poor error handling and subpar audio transcoding. I will review both before 0.11 goes out.

Thanks for raising, Chris
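
For readers following along: one way to confirm that a re-encoded file really matches the flags in the ffmpeg command above (pcm_s16le, 16000 Hz, one channel) is to read its header. This is a minimal Python sketch using only the standard-library wave module. The function name is hypothetical and is not part of whisper-turbo.

```python
import wave


def is_pcm_s16le_16k_mono(path: str) -> bool:
    """Return True if the WAV file is 16-bit PCM, 16 kHz, mono.

    Hypothetical helper for illustration; not a whisper-turbo API.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getsampwidth() == 2          # 2 bytes per sample (pcm_s16le)
            and wav.getframerate() == 16000  # matches -ar 16000
            and wav.getnchannels() == 1      # matches -ac 1
        )
```

Calling is_pcm_s16le_16k_mono("tos.wav") on the output of the ffmpeg command should return True. wave.open raises wave.Error for files it cannot parse as PCM WAV, so an MP3 renamed to .wav fails loudly rather than returning False.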

jozefchutka commented 11 months ago

Thanks for the quick response. I have created a new WAV using ffmpeg ... -acodec pcm_s16le -ar 16000 -ac 1 tos.wav

tos.wav.zip

Now, using the base model, it hangs midway with a different error:

462.98ecf0d3854ca5c4.js:2 panicked at crates/whisper-core/src/logit_mutators/timestamp_rules.rs:87:62:

Screenshot 2023-11-24 at 9 39 17

FL33TW00D commented 11 months ago

99% sure this is the same error raised here.

This has already been fixed and will be released in 0.11.0!

Apologies for the inconvenience, Chris

jozefchutka commented 11 months ago

Thank you

FL33TW00D commented 11 months ago

@jozefchutka Can confirm this is fixed in 0.11.0. Releasing in the next few days.

FL33TW00D commented 11 months ago

@jozefchutka Please test on the latest version.

jozefchutka commented 11 months ago

.wav seems fixed, thanks. The strange unsupported MP3 encoding is still broken, but it's not a problem anymore since I can work with WAV. Keep closed.

FL33TW00D commented 11 months ago

Awesome! I want to redesign some of the audio transcoding, so it may be fixed in the future!

jozefchutka commented 11 months ago

Could you please clarify in the docs what the optimal input for raw_audio is? I see the docs mention a Uint8Array of PCM (.wav). Here in the thread I can see pcm_s16le / 16000 / mono. If this setup is sufficient for PCM (no improvement in transcription precision from a higher sampling rate or more channels), please mention it in the docs. I also wonder whether .pcm could be used instead of .wav.

In comparison, transformers.js consumes a pcm_f32le / 16000 / mono .pcm file.
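
The two formats mentioned here differ only in how each sample is stored: a signed 16-bit integer versus a 32-bit float. As a rough Python sketch of going from one to the other, assuming header-less little-endian sample data and the common convention of dividing by 32768 (the function name is hypothetical, and this is not how either library converts internally):

```python
import struct


def pcm_s16le_to_f32le(data: bytes) -> bytes:
    """Convert raw 16-bit little-endian PCM to 32-bit float PCM.

    Assumes `data` holds samples only (no WAV header). A trailing odd
    byte, if any, is ignored.
    """
    count = len(data) // 2
    ints = struct.unpack("<%dh" % count, data[: count * 2])
    # Map the integer range [-32768, 32767] onto roughly [-1.0, 1.0).
    floats = [value / 32768.0 for value in ints]
    return struct.pack("<%df" % count, *floats)
```

The sample rate and channel count are untouched by this conversion, so a 16000 Hz mono stream stays 16000 Hz mono.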

FL33TW00D commented 11 months ago

pcm_s16le / 16000 / mono is 100% the "one true format". This comes directly from OAI.

When you use raw_audio, I simply bypass any transcoding: whatever bytes you pass in will be used by the model.

Will clarify better in the docs!
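
Since raw_audio skips transcoding, the caller has to produce the right bytes. The following is a minimal Python sketch of packing float samples into pcm_s16le. It assumes the samples are already 16000 Hz mono, and the function name is hypothetical. The thread does not say whether raw_audio expects a WAV header in front of the samples, so this produces sample data only.

```python
import struct


def floats_to_pcm_s16le(samples) -> bytes:
    """Pack floats in [-1.0, 1.0] as 16-bit little-endian PCM bytes.

    Hypothetical helper; values outside the range are clipped.
    Resampling and downmixing are NOT handled here.
    """
    ints = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        ints.append(int(round(clipped * 32767)))
    return struct.pack("<%dh" % len(ints), *ints)
```

For example, floats_to_pcm_s16le([0.0, 1.0, -1.0]) yields six bytes, two per sample.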

jozefchutka commented 11 months ago

pcm_s16le / 16000 / mono, plus raw_audio bypassing any transcoding: very important information. Thank you.