collabora / WhisperFusion

WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.

Uncaught TypeError in audio-processor.js #15

Open dskill opened 5 months ago

dskill commented 5 months ago

Getting this error. I'm viewing the page on Windows while running the server in WSL. Everything else seems OK.

Uncaught TypeError: Cannot read properties of undefined (reading 'set') at AudioStreamProcessor.process (audio-processor.js:24:23) process @ audio-processor.js:24
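
For context, the error means that whatever array line 24 calls .set() on is undefined for at least one render quantum, typically because no input channel is available yet (microphone not attached or permission not granted). Below is a minimal, hypothetical sketch of a guarded processor; the class name matches the stack trace, but the body and the registered processor name are assumptions rather than the repository's actual code.

// Hypothetical sketch, not the repo's actual audio-processor.js.
class AudioStreamProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    const channel = inputs[0] && inputs[0][0]; // first channel of first input
    if (!channel) {
      return true; // nothing to copy this quantum; keep the processor alive
    }
    // Copy the samples out of the audio thread's reusable buffer before
    // handing them to the main thread.
    const frame = new Float32Array(channel.length);
    frame.set(channel);
    this.port.postMessage(frame);
    return true;
  }
}

registerProcessor('audio-stream-processor', AudioStreamProcessor); // name is an assumption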

zoq commented 5 months ago

What browser are you using? Any chance you can try with Chrome?

dskill commented 5 months ago

Thanks for the quick response. Yes, I'm using Chrome currently.

jprovencher commented 5 months ago

If it helps, I have the same issue. Win11 Pro; OS: Win32; Browser: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36

zoq commented 5 months ago

For a quick fix, you can apply https://github.com/collabora/WhisperFusion/issues/17#issuecomment-1918170375

dskill commented 5 months ago

Also, since it was mentioned in the other thread, here is my server output in case it's helpful:


==========
== CUDA ==
==========

CUDA Version 12.2.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

done loading
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
INFO:root:[LLM] loaded: True██████████████--------------------| 67.76% [103/152 00:00<00:00]
INFO:websockets.server:connection open████████████████████████| 100.00% [152/152 00:01<00:00]
INFO:websockets.server:connection open
downloading ONNX model...
loading session
loading onnx model
reset states
INFO:root:New client connected
zoq commented 5 months ago

The output looks good and the progress bar you see is the warmup stage of WhisperSpeech. Let me know if the workaround that I posted works for you. We are currently setting up a Windows machine to replicate the issue.

dskill commented 5 months ago

Here's a video with that workaround. It seems to be working, except I don't hear any text-to-speech (probably not surprising considering the bug I worked around), but still, progress!

https://github.com/collabora/WhisperFusion/assets/703106/f8ac38dd-19c8-4f47-9b10-5df7587de39d

zoq commented 5 months ago

I can see in the console that the connection to port 8888 failed, which is the WebSocket port we use to send the audio. Is that port forwarded from the Docker container? Also, it takes some time until the service starts; it has a warmup phase, so depending on the hardware it can take 30 seconds or more until the system is actually running.
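
A quick way to check the port mapping from the browser console (hypothetical snippet; adjust the host if the container runs on another machine):

// Verifies the WebSocket port is reachable from the browser.
const ws = new WebSocket('ws://localhost:8888');
ws.onopen = () => { console.log('port 8888 reachable'); ws.close(); };
ws.onerror = () => console.error('port 8888 not reachable; check the -p 8888:8888 mapping and wait for warmup');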

dskill commented 5 months ago

Thanks! I went ahead and tried again, restarting everything. This time it did work. The audio was a bit more delayed than in your video, but it all worked. So presumably in my last test I either just needed to wait for the warmup, or the port 8888 forwarding somehow didn't work.

zoq commented 5 months ago

What GPU do you use?

dskill commented 5 months ago

I'm on a 4090, so I was expecting it to be pretty snappy. As far as I can tell the GPU is being utilized, but it's definitely not as responsive as in your video.

zoq commented 5 months ago

We are going to add latency outputs for each step to the demo tomorrow, so we can compare more easily. The video was recorded on a 4090, so you should see very similar results.
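
Illustrative only, not the instrumentation the maintainers added: per-step latency can be logged on the client with performance.now(), for example:

// Hypothetical client-side timing helper, not part of WhisperFusion.
const marks = {};
function mark(step) {
  const now = performance.now();
  if (marks.last !== undefined) {
    console.log(step + ': ' + (now - marks.last).toFixed(1) + ' ms since previous step');
  }
  marks.last = now;
}
// Usage: mark('audio chunk sent'); ... mark('transcript received'); ... mark('TTS audio started');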

dskill commented 5 months ago

Nice, happy to help test those latency numbers when they're in.

sadimoodi commented 2 months ago

> We are going to add latency outputs for each step to the demo tomorrow, so we can compare more easily. The video was recorded on a 4090, so you should see very similar results.

I am using the same GPU on Windows. Did you use this command to run the Docker container?

docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion-3090:latest

Did you get everything running eventually?