ChetanXpro / nodejs-whisper

NodeJS Bindings for Whisper - the CPU version of OpenAI's Whisper, as initially crafted in C++ by ggerganov.
https://npmjs.com/nodejs-whisper
MIT License

Says WAV file is valid, then later says it's invalid? #113

Open binarykitchen opened 2 months ago

binarykitchen commented 2 months ago

I'm running your latest version on Arch Linux.

nodejs-whisper says the WAV file is valid, but later the native whisper instance says it's not. Huh?

[dev:server] [Nodejs-whisper] File is a valid WAV file.

And later it says:

[dev:server] read_wav: WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav' must be 16 kHz
[dev:server] error: failed to read WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav'

Here are the details from the logs:

[dev:server] DEBUG: »»-----------------------------------------►
[dev:server] [Nodejs-whisper] Checking and downloading model if needed: base
[dev:server] autoDownloadModelName base
[dev:server] options {
[dev:server]   modelName: 'base',
[dev:server]   autoDownloadModelName: 'base',
[dev:server]   verbose: true,
[dev:server]   removeWavFileAfterTranscription: false,
[dev:server]   whisperOptions: { outputInVtt: true }
[dev:server] }
[dev:server] [Nodejs-whisper] Models already exist. Skipping download.
[dev:server] [Nodejs-whisper] Checking file existence: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Converting file to WAV format: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Checking if the file is a valid WAV: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] File is a valid WAV file.
[dev:server] [Nodejs-whisper] Constructing command for file: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Executing command: ./main  -ovtt -l auto -m ./models/ggml-base.bin  -f /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] code--- 0
[dev:server] stdout--- 
[dev:server] stderr--- whisper_init_from_file_with_params_no_state: loading model from './models/ggml-base.bin'
[dev:server] whisper_model_load: loading model
[dev:server] whisper_model_load: n_vocab       = 51865
[dev:server] whisper_model_load: n_audio_ctx   = 1500
[dev:server] whisper_model_load: n_audio_state = 512
[dev:server] whisper_model_load: n_audio_head  = 8
[dev:server] whisper_model_load: n_audio_layer = 6
[dev:server] whisper_model_load: n_text_ctx    = 448
[dev:server] whisper_model_load: n_text_state  = 512
[dev:server] whisper_model_load: n_text_head   = 8
[dev:server] whisper_model_load: n_text_layer  = 6
[dev:server] whisper_model_load: n_mels        = 80
[dev:server] whisper_model_load: ftype         = 1
[dev:server] whisper_model_load: qntvr         = 0
[dev:server] whisper_model_load: type          = 2 (base)
[dev:server] whisper_model_load: adding 1608 extra tokens
[dev:server] whisper_model_load: n_langs       = 99
[dev:server] whisper_model_load:      CPU total size =   147.37 MB
[dev:server] whisper_model_load: model size    =  147.37 MB
[dev:server] whisper_init_state: kv self size  =   16.52 MB
[dev:server] whisper_init_state: kv cross size =   18.43 MB
[dev:server] whisper_init_state: compute buffer (conv)   =   16.39 MB
[dev:server] whisper_init_state: compute buffer (encode) =  132.07 MB
[dev:server] whisper_init_state: compute buffer (cross)  =    4.78 MB
[dev:server] whisper_init_state: compute buffer (decode) =   96.48 MB
[dev:server] read_wav: WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav' must be 16 kHz
[dev:server] error: failed to read WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav'
[dev:server] 
[dev:server] whisper_print_timings:     load time =   306.03 ms
[dev:server] whisper_print_timings:     fallbacks =   0 p /   0 h
[dev:server] whisper_print_timings:      mel time =     0.00 ms
[dev:server] whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   encode time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   batchd time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:    total time =   312.29 ms
[dev:server] 
[dev:server] stdout--- 
[dev:server] [Nodejs-whisper] Transcribing Done!
[dev:server] [Nodejs-whisper] Error during processing: Transcription failed or produced no output.

Any ideas what this could be?

Thanks!

binarykitchen commented 1 month ago

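I think it's because the input sample rate is 48 kHz, while whisper expects 16 kHz (see the `read_wav: ... must be 16 kHz` line in the log above). That said, the "valid WAV" check in nodejs-whisper should also verify the sample rate, not just that the file is a WAV.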
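
To illustrate the kind of check meant here, below is a minimal sketch that reads the sample rate from a WAV header before the file is handed to whisper. It is not nodejs-whisper's actual validation code; the names `readWavFormat` and `isSixteenKhz` are hypothetical, and it assumes a plain RIFF/WAVE file with a standard `fmt ` chunk. The only requirement taken from the log is the 16 kHz sample rate.

```typescript
// Hypothetical helper, not part of nodejs-whisper.
// Fields read from the "fmt " chunk of a RIFF/WAVE file.
interface WavFormat {
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

// Reads a 4-character ASCII chunk tag at the given byte offset.
function tag(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset),
    view.getUint8(offset + 1),
    view.getUint8(offset + 2),
    view.getUint8(offset + 3),
  );
}

// Walks the RIFF chunks until "fmt " is found; returns null if the bytes
// are not a RIFF/WAVE file or the fmt chunk is missing or truncated.
function readWavFormat(bytes: Uint8Array): WavFormat | null {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.byteLength < 12 || tag(view, 0) !== "RIFF" || tag(view, 8) !== "WAVE") {
    return null;
  }
  let offset = 12;
  while (offset + 8 <= view.byteLength) {
    const id = tag(view, offset);
    const size = view.getUint32(offset + 4, true); // little-endian chunk size
    if (id === "fmt ") {
      if (size < 16 || offset + 24 > view.byteLength) return null;
      return {
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to even length
  }
  return null;
}

// True only when the header parses and declares a 16 kHz sample rate,
// which is what the read_wav error in the log asks for.
function isSixteenKhz(bytes: Uint8Array): boolean {
  const fmt = readWavFormat(bytes);
  return fmt !== null && fmt.sampleRate === 16000;
}
```

With a check like this, a 48 kHz file would be rejected (or resampled, for example with ffmpeg's `-ar 16000` option) before whisper is invoked, instead of being reported as valid and then failing in `read_wav`.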

ChetanXpro commented 1 month ago

Yeah, I think it's due to the sample rate. I will look into this issue.