Open zadamg opened 5 months ago
@zadamg We haven't tried to build this on Windows yet, we will do that and get back to you.
Hey, @zadamg would you mind trying again with this repository? https://github.com/jpc/WhisperFusion
It looks like you have Windows/Unix line-ending conflicts. In my fork I added a config that hopefully will prevent Git from changing the line endings when you check out the repository, which should make the scripts work in Docker.
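For anyone hitting the same thing: the usual way to pin down line endings is a `.gitattributes` entry that forces LF on the shell scripts (the exact contents of the fork's config may differ from this sketch):

```gitattributes
# Normalize text files, and force LF endings on shell scripts so they
# still run inside the Linux-based Docker image after a Windows checkout
* text=auto
*.sh text eol=lf
```

After adding this, a fresh clone (or `git checkout` after deleting the affected files) is needed for the normalization to take effect on existing working copies.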
Another problem seems to be the CUDA version. I'll look into that next.
@zadamg we pushed an image for 3090 as well which should work on windows
```sh
docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion-3090:latest
```
Thank you guys.
I was able to download and build...
Unfortunately, I'm getting errors with the AudioWorklet constructor and am not sure how to troubleshoot. Same error on Chrome, Brave, and Firefox.
```js
class AudioStreamProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.chunkSize = 4096;
    this.buffer = new Float32Array(this.chunkSize);
    this.bufferPointer = 0;
  }

  process(inputs, outputs, parameters) {
    const input = inputs[0];
    const output = outputs[0];
    for (let i = 0; i < input[0].length; i++) {
      this.buffer[this.bufferPointer++] = input[0][i];
      if (this.bufferPointer >= this.chunkSize) {
        this.port.postMessage(this.buffer);
        this.bufferPointer = 0;
      }
    }
    for (let channel = 0; channel < input.length; ++channel) {
      output[channel].set(input[channel]); // ❌
    }
    return true;
  }
}

registerProcessor("audio-stream-processor", AudioStreamProcessor);
```
```js
const start_recording = async () => {
  console.log("😀");
  console.log(audioContext);
  console.log(audioContext_tts);
  try {
    if (audioContext) {
      await audioContext.resume();
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      console.log(`🌊 stream: ${stream}`);
      if (!audioContext) return;
      console.log(`audioContext state: ${audioContext?.state}`);
      await audioContext.audioWorklet.addModule("js/audio-processor.js");
      const source = audioContext.createMediaStreamSource(stream);
      console.log(`👻 source: ${source}`);
      audioWorkletNode = new AudioWorkletNode(audioContext, "audio-stream-processor");
      audioWorkletNode.port.onmessage = (event) => {
        if (server_state != 1) {
          console.log("server is not ready!!");
          return;
        }
        const audioData = event.data;
        if (websocket && websocket.readyState === WebSocket.OPEN && audio_state == 0) {
          websocket.send(audioData.buffer);
          console.log("send data");
        }
      };
      source.connect(audioWorkletNode);
    }
  } catch (e) {
    console.log("Error", e);
  }
};
```
I added a simple check on the output channel, which makes the application runnable, but maybe I'm breaking something:

```js
if (output[channel]) { // check that the output channel exists
  output[channel].set(input[channel]);
}
```
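That guard is reasonable: in an `AudioWorkletProcessor`, `outputs[0]` can have fewer channels than `inputs[0]` (or none at all when nothing is connected downstream), so indexing it unconditionally throws. The pass-through logic can be sketched as a plain function to see the effect of the guard (`passThrough` is a hypothetical name for illustration, not from the repo):

```javascript
// Copy input channels to output, skipping channels the output doesn't
// have -- mirrors the guarded loop in the worklet's process() method.
function passThrough(input, output) {
  for (let channel = 0; channel < input.length; ++channel) {
    if (output[channel]) {           // output may have fewer channels than input
      output[channel].set(input[channel]);
    }
  }
}

const input = [Float32Array.from([0.1, 0.2])];

// A node with nothing connected downstream can see zero output channels;
// without the guard this call would throw a TypeError.
const emptyOutput = [];
passThrough(input, emptyOutput);     // now a no-op instead of a crash

// With a matching output channel, samples are copied through normally.
const stereoOut = [new Float32Array(2)];
passThrough(input, stereoOut);
```

Since this worklet only forwards chunks over the WebSocket, silently dropping the monitor output is harmless; if local playback were needed, the node would have to be created with an explicit `outputChannelCount` instead.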
It works-ish, but the responses sometimes don't come in. Here's a video of the experience:
https://www.loom.com/share/9090a6055384422d9e804104e455fcac?sid=4c35818d-ff9c-48c2-b958-4661851ae40a
This looks to be the same issue I'm having FWIW. #15
@zadamg Great that you got the initial issue sorted out.
So, we are running the TTS model with `torch.compile` optimisation to make the inference faster. In order to do that we have to warm up the TTS model, so I would recommend checking the logs of the server and waiting for all the models to fully load, i.e., letting the TTS model warm up. Sharing the server logs would also help us understand the issue better.
Will do. It's worth noting that I DID get a response normally the very first and only time I opened up and started the app, but it didn't work thereafter. Maybe that supports the warm-up hypothesis.
Thanks for this wonderful tool. I updated CUDA to >12 and am on Windows 10 with an RTX 3060, which means (I think) that I need to rebuild for the sm_86 arch. What do I need to do here?
Here's the process and resulting logs...