ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License
34.98k stars, 3.57k forks

Can the whisper stream example take audio files (PCM, WAV, ...) as input? #800

Open yuanconghao opened 1 year ago

gcr commented 1 year ago

You can use a tool like ffmpeg or avconv to convert any audio or video format into WAV that Whisper can read! Try something like this:

$ ffmpeg -i video.mp4 -f wav -ar 16000 - | ./main -m path/to/model.ggml.bin -

Note the trailing - on both commands: it tells ffmpeg to write the WAV file to stdout and tells whisper to read its input from stdin.
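
A minimal sketch of the same pipeline wrapped in a shell function, which may be handy when several files need transcribing. The names `transcribe`, `FFMPEG_BIN`, `WHISPER_BIN` and `WHISPER_MODEL` are hypothetical and not part of whisper.cpp; `-loglevel error` is added here only to keep ffmpeg's banner off the terminal.

```shell
# Hypothetical defaults; override them in the environment as needed.
FFMPEG_BIN="${FFMPEG_BIN:-ffmpeg}"
WHISPER_BIN="${WHISPER_BIN:-./main}"
WHISPER_MODEL="${WHISPER_MODEL:-path/to/model.ggml.bin}"

# transcribe FILE
# Decodes FILE with ffmpeg, resamples it to 16 kHz, writes WAV to stdout ("-"),
# and pipes it into whisper, which reads from stdin (the trailing "-").
transcribe() {
  "$FFMPEG_BIN" -loglevel error -i "$1" -f wav -ar 16000 - \
    | "$WHISPER_BIN" -m "$WHISPER_MODEL" -
}
```

Usage would look like `transcribe video.mp4`. Each call still starts a fresh whisper process, so the model is loaded again every time.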

yuanconghao commented 1 year ago

got it, thanks.

franalbani commented 1 year ago

SoX is also very handy: sox input.wav -r 16000 -b 16 output.wav

flatsiedatsie commented 8 months ago

Is there a way to feed audio files into a continuously waiting instance of stream? I've been using main on demand, but it's very slow compared to how stream works, probably because it has to load the model every time a new audio snippet is ready.

Stream is much better for fast 'on demand' work, except that its only input option is the microphone, which in my case is already occupied.

// It seems the server tool is useful for this use case: https://github.com/ggerganov/whisper.cpp/tree/master/examples/server
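
A rough sketch of how that could look from the client side, assuming a server instance is already running with the model loaded and listening on the example's default address. The `/inference` endpoint and the `file` and `response_format` form fields follow the server example's README as I understand it; check it for your version. The names `send_to_server`, `SERVER_URL` and `CURL_BIN` are hypothetical.

```shell
# Hypothetical defaults; adjust to wherever the server is listening.
SERVER_URL="${SERVER_URL:-http://127.0.0.1:8080}"
CURL_BIN="${CURL_BIN:-curl}"

# send_to_server FILE
# Uploads FILE as a multipart form to the already-running server and prints
# the response. The model stays loaded in the server between requests.
send_to_server() {
  "$CURL_BIN" -s "$SERVER_URL/inference" \
    -F file="@$1" \
    -F response_format="json"
}
```

Usage would look like `send_to_server snippet.wav`. The file likely still needs to be 16 kHz WAV, so a conversion step with ffmpeg or sox may be needed first.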

WilliamTambellini commented 6 months ago

@slaren would you consider a PR optionally linking the server/stream executables with libffmpeg/libsox in order to convert the input on the fly and in memory (vs. triggering/running an external process)? Best

slaren commented 6 months ago

That would be up to @ggerganov , but I see no issue with it as long as it is optional.

ggerganov commented 6 months ago

Yup, it could be a good addition

WilliamTambellini commented 5 months ago

Good. @ggerganov Is libffmpeg the best option or would you prefer another lib (sox, ...) ? Refs: https://github.com/FFmpeg/FFmpeg https://johnvansickle.com/ffmpeg/

r0d0dendr0n commented 5 months ago

> Good. @ggerganov Is libffmpeg the best option or would you prefer another lib (sox, ...) ? Refs: https://github.com/FFmpeg/FFmpeg https://johnvansickle.com/ffmpeg/

Why not make them compile-time options?