Open zadamg opened 5 months ago
@zadamg We haven't tried to build this on Windows yet, we will do that and get back to you.
Hey, @zadamg would you mind trying again with this repository? https://github.com/jpc/WhisperFusion
It looks like you have Windows/Unix line-ending conflicts. In my fork I added a config that hopefully will prevent Git from changing the line endings when you check out the repository, which should make the scripts work in Docker.
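For anyone hitting the same thing: the usual way to pin down line endings is a `.gitattributes` entry that forces LF on the shell scripts (the exact contents of the fork's config may differ from this sketch):

```gitattributes
# Normalize text files, and force LF endings on shell scripts so they
# still run inside the Linux-based Docker image after a Windows checkout
* text=auto
*.sh text eol=lf
```

After adding this, a fresh clone (or `git checkout` after deleting the affected files) is needed for the normalization to take effect on existing working copies.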
Another problem seems to be the CUDA version. I'll look into that next.
@zadamg we pushed an image for 3090 as well which should work on windows
```sh
docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion-3090:latest
```
Thank you guys.
I was able to download and build...
Unfortunately, I'm getting errors with the AudioWorklet constructor and am not sure how to troubleshoot. Same error on Chrome, Brave, and Firefox.
```js
class AudioStreamProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.chunkSize = 4096;
    this.buffer = new Float32Array(this.chunkSize);
    this.bufferPointer = 0;
  }

  process(inputs, outputs, parameters) {
    const input = inputs[0];
    const output = outputs[0];
    for (let i = 0; i < input[0].length; i++) {
      this.buffer[this.bufferPointer++] = input[0][i];
      if (this.bufferPointer >= this.chunkSize) {
        this.port.postMessage(this.buffer);
        this.bufferPointer = 0;
      }
    }
    for (let channel = 0; channel < input.length; ++channel) {
      output[channel].set(input[channel]); // ❌
    }
    return true;
  }
}

registerProcessor("audio-stream-processor", AudioStreamProcessor);
```
```js
const start_recording = async () => {
  console.log("😀");
  console.log(audioContext);
  console.log(audioContext_tts);
  try {
    if (audioContext) {
      await audioContext.resume();
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      console.log(`🌊 stream: ${stream}`);
      if (!audioContext) return;
      console.log(`audioContext state: ${audioContext?.state}`);
      await audioContext.audioWorklet.addModule("js/audio-processor.js");
      const source = audioContext.createMediaStreamSource(stream);
      console.log(`👻 source: ${source}`);
      audioWorkletNode = new AudioWorkletNode(audioContext, "audio-stream-processor");
      audioWorkletNode.port.onmessage = (event) => {
        if (server_state != 1) {
          console.log("server is not ready!!");
          return;
        }
        const audioData = event.data;
        if (websocket && websocket.readyState === WebSocket.OPEN && audio_state == 0) {
          websocket.send(audioData.buffer);
          console.log("send data");
        }
      };
      source.connect(audioWorkletNode);
    }
  } catch (e) {
    console.log("Error", e);
  }
};
```
I added a simple check on the output channel, which makes the application runnable, but maybe I'm breaking something:

```js
if (output[channel]) { // check that the output channel exists
  output[channel].set(input[channel]);
}
```
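That guard is reasonable: in an `AudioWorkletProcessor`, `outputs[0]` can have fewer channels than `inputs[0]` (or none at all when nothing is connected downstream), so indexing it unconditionally throws. The pass-through logic can be sketched as a plain function to see the effect of the guard (`passThrough` is a hypothetical name for illustration, not from the repo):

```javascript
// Copy input channels to output, skipping channels the output doesn't
// have -- mirrors the guarded loop in the worklet's process() method.
function passThrough(input, output) {
  for (let channel = 0; channel < input.length; ++channel) {
    if (output[channel]) {           // output may have fewer channels than input
      output[channel].set(input[channel]);
    }
  }
}

const input = [Float32Array.from([0.1, 0.2])];

// A node with nothing connected downstream can see zero output channels;
// without the guard this call would throw a TypeError.
const emptyOutput = [];
passThrough(input, emptyOutput);     // now a no-op instead of a crash

// With a matching output channel, samples are copied through normally.
const stereoOut = [new Float32Array(2)];
passThrough(input, stereoOut);
```

Since this worklet only forwards chunks over the WebSocket, silently dropping the monitor output is harmless; if local playback were needed, the node would have to be created with an explicit `outputChannelCount` instead.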
It works-ish, but the responses sometimes don't come in. Here's a video of the experience:
https://www.loom.com/share/9090a6055384422d9e804104e455fcac?sid=4c35818d-ff9c-48c2-b958-4661851ae40a
This looks to be the same issue I'm having FWIW. #15
@zadamg Great that you got the initial issue sorted out.
So, we are running the TTS model with `torch.compile` optimisation to make the inference faster. In order to do that we have to warm up the TTS model, so I would recommend checking the logs of the server and waiting for all the models to fully load, i.e., letting the TTS model warm up. Sharing the server logs would also help us understand the issue better.
Will do. It's worth noting that I DID get a response normally the very first and only time I opened up and started the app, but it didn't work thereafter. Maybe that supports the warm-up hypothesis.
Thanks for this wonderful tool. I updated CUDA to >12 and am on Windows 10 with an RTX 3060, which means (I think) that I need to rebuild for the sm_86 arch. What do I need to do here?
Here's the process and resulting logs...