fishaudio / fish-speech

Brand new TTS solution
https://speech.fish.audio

[BUG] Audio streaming only available if source is 'microphone' on Ubuntu 22.04 #403

j2l opened this issue 1 month ago (status: Open)

j2l commented 1 month ago

Describe the bug: Can't run the web UI.

To Reproduce: python tools/webui... (full command and output below)

Expected behavior: The web UI launches and runs.

Screenshots / log:

python tools/webui.py \
    --llama-checkpoint-path checkpoints/fish-speech-1.2-sft \
    --decoder-checkpoint-path checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth

/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2024-07-19 18:59:57.695 | INFO     | __main__:<module>:523 - Loading Llama model...
2024-07-19 19:00:03.030 | INFO     | tools.llama.generate:load_model:347 - Restored model from checkpoint
2024-07-19 19:00:03.030 | INFO     | tools.llama.generate:load_model:351 - Using DualARTransformer
2024-07-19 19:00:03.031 | INFO     | __main__:<module>:530 - Llama model loaded, loading VQ-GAN model...
2024-07-19 19:00:04.087 | INFO     | tools.vqgan.inference:load_model:44 - Loaded model: <All keys matched successfully>
2024-07-19 19:00:04.087 | INFO     | __main__:<module>:538 - Decoder model loaded, warming up...
2024-07-19 19:00:04.088 | INFO     | tools.api:encode_reference:117 - No reference audio provided
2024-07-19 19:00:04.120 | INFO     | tools.llama.generate:generate_long:432 - Encoded text: Hello, world!
2024-07-19 19:00:04.120 | INFO     | tools.llama.generate:generate_long:450 - Generating sentence 1/1 of sample 1/1
  0%|                                                                         | 0/4080 [00:00<?, ?it/s]/home/pm/.local/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  warnings.warn(
  1%|▌                                                               | 39/4080 [00:01<02:46, 24.21it/s]
2024-07-19 19:00:06.246 | INFO     | tools.llama.generate:generate_long:505 - Generated 41 tokens in 2.13 seconds, 19.28 tokens/sec
2024-07-19 19:00:06.247 | INFO     | tools.llama.generate:generate_long:508 - Bandwidth achieved: 9.45 GB/s
2024-07-19 19:00:06.247 | INFO     | tools.llama.generate:generate_long:513 - GPU Memory used: 1.42 GB
2024-07-19 19:00:06.266 | INFO     | tools.api:decode_vq_tokens:128 - VQ features: torch.Size([4, 40])
/home/pm/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return F.conv1d(input, weight, bias, self.stride,
2024-07-19 19:00:07.330 | INFO     | __main__:<module>:555 - Warming up done, launching the web UI...
/mnt/phil/fish-speech-main/tools/webui.py:343: UserWarning: You have unused kwarg parameters in Checkbox, please remove them: {'scale': 0, 'min_width': 150}
  if_refine_text = gr.Checkbox(
Traceback (most recent call last):
  File "/mnt/phil/fish-speech-main/tools/webui.py", line 557, in <module>
    app = build_app()
  File "/mnt/phil/fish-speech-main/tools/webui.py", line 439, in build_app
    stream_audio = gr.Audio(
  File "/home/pm/.local/lib/python3.10/site-packages/gradio/components.py", line 2387, in __init__
    raise ValueError(
ValueError: Audio streaming only available if source is 'microphone'.

Additional context: RTX 3060 (12 GB), Driver Version: 550.67, CUDA Version: 12.4
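For context on the crash itself: in the Gradio 3.x line, `gr.Audio` only allowed `streaming=True` when the audio source was the microphone, so building a streaming output component with any other source raises at construction time. A minimal sketch of that check, reconstructed from the traceback (the function name and logic here are illustrative assumptions, not the actual gradio source):

```python
# Hypothetical re-creation of the Gradio 3.x validation the traceback points
# at -- an assumption based on the error message, not real gradio code.
def validate_audio_streaming(source: str, streaming: bool) -> None:
    """Raise if a streaming Audio component uses a non-microphone source."""
    if streaming and source != "microphone":
        raise ValueError("Audio streaming only available if source is 'microphone'.")

# webui.py builds a streaming output Audio, which trips the check:
try:
    validate_audio_streaming(source="upload", streaming=True)
except ValueError as exc:
    print(exc)  # Audio streaming only available if source is 'microphone'.
```

This is why an out-of-date system-wide Gradio fails here even though the fish-speech code is correct for the Gradio version pinned in its own requirements.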

AnyaCoder commented 1 month ago

Maybe you need to install a brand-new Python 3.10 environment, then run pip install -e .

j2l commented 1 month ago

Maybe I need to install a brand-new Python 3.10 environment, then run pip install -e . :smile: