Closed jozefchutka closed 11 months ago
I am testing https://whisper-turbo.com/ with the most recent Chrome on a Mac M2, and transcription fails with the base and tiny models on tos.mp3.zip
Uncaught (in promise) {__wbg_ptr: 58389056}
Hi Jozef, this seems like a strange, unsupported encoding of the MP3.
Encoding it using ffmpeg to WAV with the following:
ffmpeg -i tos.mp3 -acodec pcm_s16le -ar 16000 -ac 1 tos.wav
Works fine on the site.
However this does highlight the poor error handling and subpar audio transcoding. Will review it before 0.11 goes out.
Thanks for raising, Chris
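To double-check that a file produced by the ffmpeg command above really came out as 16-bit PCM, 16 kHz, mono, something like the following could be used. This is only a sketch using Python's standard `wave` module; the function names `describe_wav`, `matches_ffmpeg_target`, and `make_silent_wav` are made up for illustration and are not part of whisper-turbo.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, sample_width_bytes, sample_rate) for a PCM WAV.

    `source` can be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def matches_ffmpeg_target(source):
    """True if the WAV matches `-acodec pcm_s16le -ar 16000 -ac 1`."""
    # -ac 1 -> 1 channel, pcm_s16le -> 2 bytes per sample, -ar 16000 -> 16 kHz
    return describe_wav(source) == (1, 2, 16000)


def make_silent_wav(channels, sample_rate, seconds=1):
    """Build an in-memory 16-bit PCM WAV of silence (demo helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * channels * sample_rate * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(matches_ffmpeg_target(make_silent_wav(1, 16000)))
    print(matches_ffmpeg_target(make_silent_wav(2, 44100)))
```

With a real file, `matches_ffmpeg_target("tos.wav")` would do the same check on disk. Note that the `wave` module only reads uncompressed PCM WAV files, so it cannot diagnose the original MP3.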
Thanks for the quick response. I have created a new WAV using ffmpeg ... -acodec pcm_s16le -ar 16000 -ac 1 tos.wav
Now, using the base model, it hangs midway with a different error:
462.98ecf0d3854ca5c4.js:2 panicked at crates/whisper-core/src/logit_mutators/timestamp_rules.rs:87:62:
99% sure this is the same error raised here.
This has already been fixed and will be released in 0.11.0!
Apologies for the inconvenience, Chris
Thank you
@jozefchutka Can confirm this is fixed in 0.11.0. Releasing in the next few days.
@jozefchutka Please test on the latest version.
The .wav case seems fixed, thanks. The strange unsupported MP3 encoding is still broken, but it's not a problem anymore as I can work with WAV. Keep closed.
Awesome! I want to redesign some of the audio transcoding, so it may be fixed in the future!
Could you please clarify in the docs what the optimal input for raw_audio is? I see the docs mentioning a Uint8Array of PCM (.wav). Here, in the thread, I can see pcm_s16le / 16000 / mono. If this setup is sufficient for PCM (no transcription precision improvements with a higher sampling rate or more channels), please mention it in the docs. I also wonder whether .pcm could be used instead of .wav.
In comparison, transformers.js consumes a pcm_f32le / 16000 / mono .pcm file.
pcm_s16le / 16000 / mono is 100% the "one true format". This comes directly from OAI.
When you use raw_audio I simply bypass any transcoding; whatever bytes you pass in will be used in the model.
Will clarify better in the docs!
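For anyone holding pcm_f32le samples (the layout mentioned above for transformers.js) who wants pcm_s16le instead, the sample conversion could look roughly like this. It is a sketch only: `f32le_to_s16le` is a hypothetical helper, scaling by 32767 is one common convention rather than anything this thread specifies, and it does not settle whether raw_audio wants a WAV header or headerless samples.

```python
import struct


def f32le_to_s16le(raw: bytes) -> bytes:
    """Convert little-endian float32 PCM bytes to little-endian int16 PCM bytes.

    Assumes samples are nominally in [-1.0, 1.0]; values outside that range
    are clipped. Trailing bytes that do not form a full float are ignored.
    """
    count = len(raw) // 4
    floats = struct.unpack(f"<{count}f", raw[: count * 4])
    # Scale to the int16 range and clip so out-of-range input cannot overflow.
    ints = [max(-32768, min(32767, round(s * 32767))) for s in floats]
    return struct.pack(f"<{count}h", *ints)


if __name__ == "__main__":
    f32 = struct.pack("<4f", 0.0, 1.0, -1.0, 2.0)
    print(struct.unpack("<4h", f32le_to_s16le(f32)))
```

The sample rate and channel count are untouched here, so the input would already need to be 16000 Hz mono.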
pcm_s16le / 16000 / mono, plus raw_audio bypassing any transcoding: very important information. Thank you.