aarnphm / whispercpp

Pybind11 bindings for Whisper.cpp
Apache License 2.0

Bug: ERROR: Failed to initialized SDL: dsp: No such audio device #114

Open acheong08 opened 1 year ago

acheong08 commented 1 year ago

Describe the bug

Streaming fails: the bindings can't find or list any audio devices.

To reproduce

Standard installation instructions

"""Some streaming examples."""

import os
import sys
import typing as t

import whispercpp_py as w

def main(**kwargs: t.Any):
    kwargs.pop("list_audio_devices")
    mname = kwargs.pop("model_name", os.getenv("GGML_MODEL", "tiny.en"))
    iterator: t.Iterator[str] | None = None
    try:
        iterator = w.Whisper.from_pretrained(mname).stream_transcribe(**kwargs)
    finally:
        assert iterator is not None, "Something went wrong!"
        sys.stderr.writelines(
            ["\nTranscription (line by line):\n"] + [f"{it}\n" for it in iterator]
        )
        sys.stderr.flush()

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", required=False)
    parser.add_argument(
        "--device_id", type=int, help="Choose the audio device", default=0
    )
    parser.add_argument(
        "--length_ms",
        type=int,
        help="Length of the audio buffer in milliseconds",
        default=5000,
    )
    parser.add_argument(
        "--sample_rate",
        type=int,
        help="Sample rate of the audio device",
        default=w.api.SAMPLE_RATE,
    )
    parser.add_argument(
        "--n_threads",
        type=int,
        help="Number of threads to use for decoding",
        default=8,
    )
    parser.add_argument(
        "--step_ms",
        type=int,
        help="Step size of the audio buffer in milliseconds",
        default=2000,
    )
    parser.add_argument(
        "--keep_ms",
        type=int,
        help="Length of the audio buffer to keep in milliseconds",
        default=200,
    )
    parser.add_argument(
        "--max_tokens",
        type=int,
        help="Maximum number of tokens to decode",
        default=32,
    )
    parser.add_argument("--audio_ctx", type=int, help="Audio context", default=0)
    parser.add_argument(
        "--list_audio_devices",
        action="store_true",
        default=False,
        help="Show available audio devices",
    )

    args = parser.parse_args()

    if args.list_audio_devices:
        w.utils.available_audio_devices()
        sys.exit(0)

    main(**vars(args))

$ python3 stream.py --list_audio_devices
ERROR: Failed to initialized SDL: dsp: No such audio device

 $ python3 stream.py --model_name ggml-base.en.bin
whisper_init_from_file_no_state: loading model from 'ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  218.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
ERROR: Failed to initialized SDL: dsp: No such audio device
Traceback (most recent call last):
  File "/home/acheong/.models/whisper_ggml/stream.py", line 15, in main
    iterator = w.Whisper.from_pretrained(mname).stream_transcribe(**kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/acheong/venv/lib/python3.11/site-packages/whispercpp_py/__init__.py", line 256, in stream_transcribe
    raise RuntimeError("Failed to initialize audio capture device.")
RuntimeError: Failed to initialize audio capture device.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/acheong/.models/whisper_ggml/stream.py", line 82, in <module>
    main(**vars(args))
  File "/home/acheong/.models/whisper_ggml/stream.py", line 17, in main
    assert iterator is not None, "Something went wrong!"
           ^^^^^^^^^^^^^^^^^^^^
AssertionError: Something went wrong!
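The second traceback is a side effect of the example script rather than a separate failure: the `assert` sits in a `finally` block, so it runs while the `RuntimeError` is still propagating and raises on top of it. A minimal sketch of that behavior follows. `open_stream` is a hypothetical stand-in for the `stream_transcribe` call and is not part of whispercpp.

```python
def open_stream():
    # Hypothetical stand-in for
    # w.Whisper.from_pretrained(mname).stream_transcribe(**kwargs);
    # it fails the same way the real call does in the log above.
    raise RuntimeError("Failed to initialize audio capture device.")


def main_masking():
    # Same shape as main() in stream.py: the finally block always runs,
    # so the assert fires while the RuntimeError is in flight and the
    # AssertionError becomes the last exception shown.
    iterator = None
    try:
        iterator = open_stream()
    finally:
        assert iterator is not None, "Something went wrong!"


def main_propagating():
    # Without the try/finally, the original RuntimeError surfaces alone.
    iterator = open_stream()
    for line in iterator:
        print(line)
```

Calling `main_masking()` ends with `AssertionError` chained onto the `RuntimeError`, while `main_propagating()` ends with only the `RuntimeError`, which is the error that matters here.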

Expected behavior

(venv) [ 12:14AM ]  [ acheong@InsignificantV3:~/.models/whisper_ggml/whisper.cpp(master✔) ]
 $ ./stream -m ~/.models/whisper_ggml/ggml-base.en.bin -t 8 --step 500 --length 5000
init: found 1 capture devices:
init:    - Capture device #0: 'Built-in Audio Analog Stereo'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init:     - sample rate:       16000
init:     - format:            33056 (required: 33056)
init:     - channels:          1 (required: 1)
init:     - samples per frame: 1024
whisper_init_from_file_no_state: loading model from '/home/acheong/.models/whisper_ggml/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: mem required  =  310.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB

main: processing 8000 samples (step = 0.5 sec / len = 5.0 sec / keep = 0.2 sec), 8 threads, lang = en, task = transcribe, timestamps = 0 ...
main: n_new_line = 9, no_context = 1

 This is your... this more you.
 (drum roll)
whisper_print_timings:     load time =    85.85 ms
whisper_print_timings:     fallbacks =   1 p /   0 h
whisper_print_timings:      mel time =  1614.78 ms
whisper_print_timings:   sample time =   293.89 ms /   431 runs (    0.68 ms per run)
whisper_print_timings:   encode time = 10957.55 ms /     8 runs ( 1369.69 ms per run)
whisper_print_timings:   decode time =  1747.24 ms /   420 runs (    4.16 ms per run)
whisper_print_timings:    total time = 16279.01 ms

Environment

$ python -V
Python 3.11.2

acheong@InsignificantV3 
----------------------- 
OS: Ubuntu 23.04 x86_64 
Host: Laptop AB 
Kernel: 6.2.8-060208-generic 
Uptime: 9 hours, 3 mins 
Packages: 4237 (dpkg), 47 (nix-default), 14 (flatpak), 27 (snap) 
Shell: zsh 5.9 
Resolution: 2256x1504 
DE: GNOME 44.0 
WM: Mutter 
WM Theme: WhiteSur-Dark 
Theme: WhiteSur-Dark [GTK2/3] 
Icons: WhiteSur-dark [GTK2/3] 
Terminal: gnome-terminal 
CPU: 11th Gen Intel i7-1165G7 (8) @ 4.700GHz 
GPU: Intel TigerLake-LP GT2 [Iris Xe Graphics] 
Memory: 7047MiB / 15769MiB

EricKong1985 commented 1 year ago

Which environment is this supposed to run in? On my PC it core dumps:

Eric@Eric-thurley:~/Downloads/whispercpp-0.0.17/examples/stream$ python3 stream.py --list_audio_devices
Illegal instruction (core dumped)

adntaha commented 1 year ago

I'm also experiencing this same issue with SDL2.

acheong08 commented 1 year ago

https://github.com/ggerganov/whisper.cpp works, so I assume it's an issue with the bindings.
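One thing that may be worth checking: `dsp` is the name of SDL2's OSS audio driver, so the message suggests the SDL2 used by the bindings fell back to OSS instead of PulseAudio or ALSA. SDL2 reads the `SDL_AUDIODRIVER` environment variable when it initializes audio, so a rough diagnostic is to force a driver before calling into the bindings. This is only a sketch under that assumption: the `sdl_audio_driver` helper is hypothetical, and if the SDL2 build the bindings link against was compiled without the requested backend, forcing it will not help.

```python
import os
from contextlib import contextmanager


@contextmanager
def sdl_audio_driver(name):
    """Temporarily set SDL_AUDIODRIVER (e.g. "pulseaudio" or "alsa").

    Hypothetical helper, not part of whispercpp. SDL2 consults this
    variable when its audio subsystem is initialized.
    """
    previous = os.environ.get("SDL_AUDIODRIVER")
    os.environ["SDL_AUDIODRIVER"] = name
    try:
        yield
    finally:
        # Restore whatever was set before entering the block.
        if previous is None:
            os.environ.pop("SDL_AUDIODRIVER", None)
        else:
            os.environ["SDL_AUDIODRIVER"] = previous


# Possible usage with the script from the report (untested assumption):
# import whispercpp_py as w
# with sdl_audio_driver("pulseaudio"):
#     w.utils.available_audio_devices()
```

If the device list still fails with every driver name, that would point to how SDL2 was built into the bindings rather than to the system's audio setup.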

ghuznee commented 10 months ago

I'm experiencing the same issue.

AIWintermuteAI commented 4 months ago

See here https://github.com/AIWintermuteAI/whispercpp/issues/88#issuecomment-2237043595