Open stopthinking102 opened 1 month ago
You can run whisper.cpp main and set -ac to a number between 1 and 1500. The value is not in milliseconds; it counts units of 20 ms, so there are 50 per second of audio. A 30 second clip (the maximum) is therefore 1500, and a 5 second clip is 5 * 50 = 250. It also usually helps to add a constant of around 64, so you'd use 5 * 50 + 64 = 314.
Programmatically, you can set the audio_ctx field of whisper_full_params.
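As a rough sketch of the arithmetic above, the helper below turns a clip length in seconds into a value for -ac. The function name audio_ctx_for_seconds and the constant names are hypothetical, made up for illustration; the numbers (50 per second, padding of about 64, range 1 to 1500) come from the comment above. The last comment shows where the result would go, assuming the field is called audio_ctx in your copy of whisper.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of whisper.cpp): converts a clip duration
// into an audio context size using the rule of thumb from this thread.
constexpr int kUnitsPerSecond = 50;   // one unit per 20 ms of audio
constexpr int kMaxAudioCtx    = 1500; // 30 seconds, the model maximum
constexpr int kDefaultPadding = 64;   // "a constant of around 64"

int audio_ctx_for_seconds(double seconds, int padding = kDefaultPadding) {
    assert(seconds >= 0.0 && "clip duration must be non-negative");
    // Round up so a partial 20 ms unit at the end of the clip is still covered.
    const int units = static_cast<int>(std::ceil(seconds * kUnitsPerSecond));
    // Keep the result inside the range accepted by -ac (1 to 1500).
    return std::max(1, std::min(units + padding, kMaxAudioCtx));
}

// Usage with the library (assumption: the field is named audio_ctx):
//   whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
//   params.audio_ctx = audio_ctx_for_seconds(5.0);
```

For a 5 second clip this gives the same 314 as the worked example, and anything near 30 seconds is capped at 1500.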
Can you share which parameter to set to use this in whisper.cpp? Is it the n_audio_ctx parameter, with 1500 referring to 1.5 seconds? It would have been great if there were an Android sample.
// medium
// hparams: {
// 'n_mels': 80,
// 'n_vocab': 51864,
// 'n_audio_ctx': 1500,
// 'n_audio_state': 1024,
// 'n_audio_head': 16,
// 'n_audio_layer': 24,
// 'n_text_ctx': 448,
// 'n_text_state': 1024,
// 'n_text_head': 16,
// 'n_text_layer': 24
// }
//
// default hparams (Whisper tiny)
struct whisper_hparams {
    int32_t n_vocab       = 51864;
    int32_t n_audio_ctx   = 1500;
    int32_t n_audio_state = 384;