dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

`homeassistant-core` `wyoming` enabled voice assistant add-ons #481

Closed: ms1design closed this 2 months ago

ms1design commented 2 months ago

Hi @dusty-nv,

I'm pushing this in the hope that you could build the PyTorch wheels from the wyoming-piper container (mentioned here) on your build farm and push them to your pip repo.

It's also a good starting point for testing all the integrations working together over the Wyoming protocol. I don't expect it to work out of the box; I can test it on my devices over the weekend.

If anyone wants to try this, here's the docker-compose file I'm using for testing:

name: home-assistant
version: "3.9"
services:
  home-assistant:
    image: ms1design/homeassistant-core:latest-r36.2.0-cu124
    restart: unless-stopped
    runtime: nvidia
    privileged: true
    network_mode: host
    container_name: home-assistant
    hostname: home-assistant
    ports:
      - "8123:8123"
    devices:
      - /dev/snd:/dev/snd
      - /dev/bus/usb
    volumes:
      - config:/config
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    environment:
      TZ: Europe/Amsterdam
    stdin_open: true
    tty: true
    healthcheck:
      test: curl -sf -o /dev/null http://localhost:8123 || exit 1
      interval: 1m
      timeout: 30s
      retries: 3

  openwakeword:
    image: ms1design/wyoming-openwakeword:latest-r36.2.0-cu124
    restart: unless-stopped
    runtime: nvidia
    network_mode: host
    container_name: openwakeword
    hostname: openwakeword
    ports:
      - "10400:10400/tcp"
    devices:
      - /dev/snd:/dev/snd
      - /dev/bus/usb
    volumes:
      - openwakeword_models:/share/openwakeword
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    environment:
      TZ: Europe/Amsterdam
    stdin_open: true
    tty: true
    healthcheck:
      test: ["CMD-SHELL", "echo '{ \"type\": \"describe\" }' | nc -w 1 localhost 10400 | grep -iq openWakeWord || exit 1"]
      interval: 1m
      timeout: 30s
      retries: 3

  faster-whisper:
    image: ms1design/wyoming-whisper:r36.2.0-cu124
    restart: unless-stopped
    runtime: nvidia
    network_mode: host
    container_name: faster-whisper
    hostname: faster-whisper
    ports:
      - "10300:10300/tcp"
    devices:
      - /dev/snd:/dev/snd
      - /dev/bus/usb
    volumes:
      - whisper_models:/share/whisper
      - whisper_data:/data
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    environment:
      TZ: Europe/Amsterdam
    stdin_open: true
    tty: true

  assist-microphone:
    image: ms1design/wyoming-assist-microphone:r36.2.0-cu124
    restart: unless-stopped
    network_mode: host
    container_name: assist-microphone
    hostname: assist-microphone
    depends_on:
      - openwakeword
    ports:
      - "10700:10700/tcp"
    devices:
      - /dev/snd:/dev/snd
      - /dev/bus/usb
    volumes:
      - assist_microphone_share:/share
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    environment:
      TZ: Europe/Amsterdam
    stdin_open: true
    tty: true

volumes:
  config:
    name: ha-config
  openwakeword_models:
    name: ha-openwakeword-models
  whisper_models:
    name: ha-whisper-models
  whisper_data:
    name: ha-whisper-data
  assist_microphone_share:
    name: ha-assist-microphone-share
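The openWakeWord healthcheck pipes a `describe` event into `nc` and greps the reply for the program name. The same probe can be run from the host in Python when debugging the Wyoming services outside Docker. This is a minimal sketch, assuming (as the healthcheck does) that a Wyoming service answers a newline-terminated JSON header `{"type": "describe"}` with an `info` event whose raw bytes contain the program name; `describe_event`, `reply_mentions` and `probe` are hypothetical helpers, not part of any Wyoming library.

```python
import json
import socket


def describe_event() -> bytes:
    """Encode a Wyoming 'describe' event as a single JSON line (header only, no data/payload)."""
    return (json.dumps({"type": "describe"}) + "\n").encode("utf-8")


def reply_mentions(reply: bytes, name: str) -> bool:
    """Case-insensitive substring check on raw bytes, mirroring `grep -iq`."""
    return name.lower().encode("utf-8") in reply.lower()


def probe(host: str, port: int, name: str, timeout: float = 1.0) -> bool:
    """Send 'describe' to host:port and report whether the reply mentions `name`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(describe_event())
            chunks = []
            while True:
                try:
                    chunk = sock.recv(4096)
                except socket.timeout:
                    break  # like `nc -w 1`: stop reading after the idle timeout
                if not chunk:
                    break  # server closed the connection
                chunks.append(chunk)
                if reply_mentions(b"".join(chunks), name):
                    return True
            return False
    except OSError:
        return False  # connection refused, unreachable host, etc.


if __name__ == "__main__":
    # Port taken from the openwakeword service in the compose file.
    print("openwakeword:", probe("localhost", 10400, "openWakeWord"))
```

Matching on raw bytes keeps the sketch independent of how the `info` event splits its header and data sections, at the cost of being exactly as loose as the original `grep -iq`.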
dusty-nv commented 2 months ago

Cool @ms1design! Okay, I see in your docker-compose that you are building everything against CUDA 12.4, which is why it is going after PyTorch 2.3rc; however, that isn't really stable yet (I added it mostly for preliminary TRT-LLM work). Until I can merge your PR, I will build and upload the PyTorch 2.2 wheel for Python 3.11 and CUDA 12.2. Then, when PyTorch 2.3 is actually released (it is currently up to RC12), I will build that.

ms1design commented 2 months ago

Ahhh that makes sense @dusty-nv, dunno why I missed that ;) Let me just rebuild those against cuda:12.2 👍

I need to spend some time updating my CI/CD env to support the latest improvements here :)

dusty-nv commented 2 months ago

I think it will still make you build PyTorch 2.2 for Python 3.11, because I don't have that up yet... kicking it off now.

dusty-nv commented 2 months ago

OK, pytorch 2.2 wheel for Python 3.11 and CUDA 12.2 is up: http://jetson.webredirect.org/jp6/cu122/torch/2.2.0

Will look at merging this PR shortly!
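The wheel URL encodes the JetPack release and the CUDA version (`jp6/cu122`), and a CPython wheel additionally only installs on the Python version it was built for (`cp311` for 3.11). A small sketch that derives these tags; generalizing the URL layout to other packages and versions is an assumption based on this single link, and the helper names are hypothetical.

```python
import sys


def cuda_tag(cuda_version: str) -> str:
    """Turn '12.2' into 'cu122' (major and minor digits, as in the URL above)."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def python_tag(version_info=sys.version_info) -> str:
    """CPython wheel tag for an interpreter version, e.g. (3, 11) -> 'cp311'."""
    return f"cp{version_info[0]}{version_info[1]}"


def wheel_page(jetpack: str, cuda_version: str, package: str, version: str,
               base: str = "http://jetson.webredirect.org") -> str:
    """Build a package page URL, ASSUMING the <jetpack>/<cuXYZ>/<package>/<version> layout."""
    return f"{base}/{jetpack}/{cuda_tag(cuda_version)}/{package}/{version}"


if __name__ == "__main__":
    print("this interpreter needs", python_tag(), "wheels")
    print(wheel_page("jp6", "12.2", "torch", "2.2.0"))
```

Under the same assumption, an image built against CUDA 12.4 would look under `cu124`, which matches the point above: the CUDA 12.4 builds went after the PyTorch 2.3 release candidate rather than the 2.2 wheel.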

johnnynunez commented 2 months ago

All packages should be built for Python 3.11. It's now the standard on desktop, so for people coming to Jetson it would be nice to find Python 3.11 packages.

dusty-nv commented 2 months ago

@johnnynunez I am not yet changing the default Python version for all containers away from the default on that version of Ubuntu (Python 3.10 on Ubuntu 22.04), but containers that need a different version can specify which Python they need.
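The policy described here (keep the distro's default Python unless a container asks for another) can be sketched as a lookup with an override. The table lists the default `python3` of each Ubuntu LTS release; `UBUNTU_DEFAULT_PYTHON` and `select_python` are hypothetical names, and this does not reproduce how jetson-containers actually selects the interpreter.

```python
from typing import Optional

# Default python3 shipped with each Ubuntu LTS release.
UBUNTU_DEFAULT_PYTHON = {
    "20.04": "3.8",
    "22.04": "3.10",
    "24.04": "3.12",
}


def select_python(ubuntu_release: str, requested: Optional[str] = None) -> str:
    """Use the distro default unless the container explicitly requests another version."""
    if requested:
        return requested
    try:
        return UBUNTU_DEFAULT_PYTHON[ubuntu_release]
    except KeyError:
        raise ValueError(f"unknown Ubuntu release: {ubuntu_release}") from None
```

For example, a wyoming container on Ubuntu 22.04 would call `select_python("22.04", requested="3.11")`, while every other container keeps 3.10.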

johnnynunez commented 2 months ago

Yes yes, your solution is great

ms1design commented 2 months ago

Nice @dusty-nv, will try that tomorrow 🙌

ms1design commented 2 months ago

Fixed the wyoming-piper wrapper container for piper-tts, but unfortunately it's still using the CPU 🕯️ We probably just need to patch something for now...

On the good-news side, I managed to get all the required containers working with Home Assistant, so we can configure a full Voice Assistant pipeline in HA running on Jetson:

[Screenshot 2024-04-19 18:11:54]

We can choose the languages and models:

[Screenshot 2024-04-19 18:12:35]

The wyoming-assist-microphone container lets you connect your mic/speaker (e.g., the much-loved Anker S330 over USB), and it also has nice native controls, as shown below. In addition, it doesn't use VAD; instead it relies on the wyoming-openwakeword container to detect the wake word.

[Screenshot 2024-04-19 18:17:25]

There are still quite a lot of TODOs in this PR, but it's ready for initial testing if anyone is interested :) Questions welcome.

dusty-nv commented 2 months ago

Oh wow, that's amazing that you got all the add-ons building, loading, and running! Huge step, thanks @ms1design!!

Maybe wyoming-piper needs to specify use_cuda=True when it loads PiperVoice:

self.model = PiperVoice.load(model_path, config_path=config_path, use_cuda=True)
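As far as I understand piper, it runs the voice model through ONNX Runtime, so `use_cuda` effectively decides which execution providers the ONNX Runtime session is created with. If `CUDAExecutionProvider` is not in `onnxruntime.get_available_providers()`, the installed onnxruntime build cannot use the GPU regardless of `use_cuda`, which is one thing worth checking when piper stays on the CPU. A minimal sketch; `select_providers` is a hypothetical helper and `voice.onnx` a placeholder path.

```python
from typing import List, Sequence


def select_providers(use_cuda: bool, available: Sequence[str]) -> List[str]:
    """Order ONNX Runtime execution providers: CUDA first if requested and available, CPU as fallback."""
    providers = []
    if use_cuda and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this interpreter")
    else:
        available = ort.get_available_providers()
        print("available providers:", available)
        print("would use:", select_providers(True, available))
        # session = ort.InferenceSession("voice.onnx", providers=select_providers(True, available))
```

Keeping `CPUExecutionProvider` at the end of the list means the session still loads on a build without CUDA support instead of failing outright.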

In parallel with your efforts, I have been integrating PiperTTS into the NanoLLM agents today. It's sounding good!

ms1design commented 2 months ago

Correct @dusty-nv. If you follow my mention on https://github.com/rhasspy/wyoming-piper/pull/5, you will find the required changes there as a diff (at the bottom of every PR page on GitHub). I'm not sure that would be enough, but I've taken a break to play around with other things ;)

ms1design commented 2 months ago

Update:

https://github.com/dusty-nv/jetson-containers/assets/24204300/9bd189ec-d2c9-4e93-9a41-26525cb1cef1

_piper-tts_logs.txt

ms1design commented 2 months ago

Update:

Known Issues

ms1design commented 2 months ago

Update

Known Issues

TODO

docker-compose.yaml

name: home-assistant-jetson
version: "3.9"
services:
  home-assistant:
    image: ms1design/homeassistant-core:latest-r36.2.0-cu122
    restart: unless-stopped
    init: false
    privileged: true
    network_mode: host
    container_name: home-assistant
    hostname: home-assistant
    ports:
      - "8123:8123"
    devices:
      - /dev/snd:/dev/snd
      - /dev/bus/usb
    volumes:
      - config:/config
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    stdin_open: true
    tty: true

  openwakeword:
    image: ms1design/wyoming-openwakeword:latest-r36.2.0-cu122
    restart: unless-stopped
    runtime: nvidia
    network_mode: host
    container_name: openwakeword
    hostname: openwakeword
    init: false
    depends_on:
      - faster-whisper
    ports:
      - "10400:10400/tcp"
    volumes:
      - openwakeword_models:/share/openwakeword
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    stdin_open: true
    tty: true

  faster-whisper:
    image: ms1design/wyoming-whisper:latest-r36.2.0-cu122
    restart: unless-stopped
    runtime: nvidia
    network_mode: host
    container_name: faster-whisper
    hostname: faster-whisper
    init: false
    ports:
      - "10300:10300/tcp"
    volumes:
      - whisper_models:/share/whisper
      - whisper_data:/data
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    stdin_open: true
    tty: true

  assist-microphone:
    image: ms1design/wyoming-assist-microphone:latest-r36.2.0-cu122
    restart: unless-stopped
    network_mode: host
    container_name: assist-microphone
    hostname: assist-microphone
    init: false
    depends_on:
      - openwakeword
    ports:
      - "10700:10700/tcp"
    devices:
      - /dev/snd:/dev/snd
      - /dev/bus/usb
    volumes:
      - assist_microphone_share:/share
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    environment:
      AUDIO_DEVICE: "plughw:CARD=S330,DEV=0"
    stdin_open: true
    tty: true

  piper-tts:
    image: ms1design/wyoming-piper:master-r36.2.0-cu122
    restart: unless-stopped
    network_mode: host
    runtime: nvidia
    container_name: piper-tts
    hostname: piper-tts
    init: false
    ports:
      - "10200:10200/tcp"
    devices:
      - /dev/snd:/dev/snd
      - /dev/bus/usb
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    stdin_open: true
    tty: true

volumes:
  config:
    name: ha-config
  openwakeword_models:
    name: ha-openwakeword-models
  whisper_models:
    name: ha-whisper-models
  whisper_data:
    name: ha-whisper-data
  assist_microphone_share:
    name: ha-assist-microphone-share
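This compose file drops the healthchecks, so a quick host-side check is handy. The sketch below only tests whether each service's TCP port accepts a connection (every service uses `network_mode: host`, so the ports from the compose file are reachable on localhost); it says nothing about whether the model inside actually loaded. `SERVICE_PORTS`, `is_listening` and `check_all` are hypothetical names.

```python
import socket
from typing import Dict

# Service -> TCP port, taken from the docker-compose file above.
SERVICE_PORTS: Dict[str, int] = {
    "home-assistant": 8123,
    "piper-tts": 10200,
    "faster-whisper": 10300,
    "openwakeword": 10400,
    "assist-microphone": 10700,
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(host: str = "localhost", ports: Dict[str, int] = SERVICE_PORTS) -> Dict[str, bool]:
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_listening(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{name:18} {'up' if ok else 'DOWN'}")
```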
dusty-nv commented 2 months ago

🥳 🙌 🎉 Thanks @ms1design! Trying to build these now; will push to Docker Hub if successful.

Are there additional procedures we need to document for others who are testing? Or do you basically just need to change "plughw:CARD=S330,DEV=0" to your desired audio device in your docker-compose.yml?

ms1design commented 2 months ago

Hi @dusty-nv,

Basically, one should look at all the ENV variables declared in each wyoming container's Dockerfile. Some settings, like speaker or mic volume, are only configurable through those variables. That's because we skipped the HA Supervisor, which exposes a UI not only for setting add-on options but also for choosing the default audio device. Instead, we need to use env variables for now ;)

Edit: And yes, you need to pass your AUDIO_DEVICE as shown in the docker-compose.yaml example above.
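To find the value for `AUDIO_DEVICE`, `arecord -l` and `aplay -l` list the ALSA cards; the same information is in `/proc/asound/cards`. A minimal Python sketch, assuming the usual `/proc/asound/cards` layout (an index, the card ID in square brackets, then a colon) and that the card ID is what goes after `CARD=`, as in the `plughw:CARD=S330,DEV=0` example. `DEV=0` is assumed to be the first PCM device on the card; `parse_cards` and `plughw` are hypothetical helpers.

```python
import os
import re
from typing import List, Tuple

# Card lines in /proc/asound/cards look like " 1 [S330           ]: USB-Audio - ..."
_CARD_LINE = re.compile(r"^\s*(\d+)\s+\[([^\]\s]+)\s*\]:")


def parse_cards(text: str) -> List[Tuple[int, str]]:
    """Return (index, card_id) pairs from the contents of /proc/asound/cards."""
    cards = []
    for line in text.splitlines():
        match = _CARD_LINE.match(line)
        if match:  # description lines (no index/bracket) are skipped
            cards.append((int(match.group(1)), match.group(2)))
    return cards


def plughw(card_id: str, dev: int = 0) -> str:
    """Build an ALSA device string in the same form as the AUDIO_DEVICE example."""
    return f"plughw:CARD={card_id},DEV={dev}"


if __name__ == "__main__":
    print("AUDIO_DEVICE currently:", os.environ.get("AUDIO_DEVICE", "<unset>"))
    try:
        with open("/proc/asound/cards") as f:
            for index, card_id in parse_cards(f.read()):
                print(index, plughw(card_id))
    except OSError:
        print("could not read /proc/asound/cards (is ALSA / /dev/snd available?)")
```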

dusty-nv commented 2 months ago

OK, gotcha. Eventually we will have an entire section on Jetson AI Lab with easy-to-follow Home Assistant tutorials for setting up the AI services, but for now it would be nice to have lower-level notes for people on the forums, etc. who want to try this. Chugging through the builds now!

ms1design commented 2 months ago

That's understandable; I'm gonna work on that in the upcoming days 🙌

dusty-nv commented 2 months ago

Hitting an issue where these containers use Python 3.11 and also use tensorrt (at least the piper one does), but tensorrt only installs Python bindings for the default version of Python (i.e. 3.10). I believe I can get around this by having the tensorrt container build its bindings from source, but I'm curious how you got around this in your builds?

ms1design commented 2 months ago

I think it fails when you run the tests, right? It's not the build that fails.

dusty-nv commented 2 months ago

Yeah, the tests... I just disabled that test for now, because I don't believe the tensorrt Python module is actually used (onnxruntime links against the TensorRT C++ libs). Will revisit that later...
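The failing test is a per-interpreter problem: a module is only importable if it was installed for the interpreter running the test. A hedged sketch of a post-build check that skips instead of failing when the TensorRT bindings are missing for the current Python; `importlib.util.find_spec` is standard library, while `module_available` and `tensorrt_test` are hypothetical and not how jetson-containers actually runs its tests.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if `name` can be imported by the *current* interpreter (checked without importing it)."""
    return importlib.util.find_spec(name) is not None


def tensorrt_test() -> str:
    """Hypothetical post-build check: report SKIP, rather than failing, when the TensorRT
    Python bindings were not installed for this interpreter (e.g. 3.11 vs. the default 3.10)."""
    py = f"{sys.version_info[0]}.{sys.version_info[1]}"
    if not module_available("tensorrt"):
        return f"SKIP: tensorrt bindings not installed for Python {py}"
    import tensorrt  # only reached when the bindings exist for this interpreter
    return f"OK: tensorrt {tensorrt.__version__} on Python {py}"
```

A skip keeps the signal visible in the build log without blocking containers, like the piper one, that only need the TensorRT C++ libraries through onnxruntime.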

ms1design commented 2 months ago

I had the same feeling so tbh I skipped that and later forgot about it completely 🤓

dusty-nv commented 2 months ago

@ms1design do you see potential side-effects in your docker-compose setup if I change `ENTRYPOINT ["/init"]` to `CMD /init`? I am working through the wyoming-piper container, and that entrypoint is causing the post-build tests to fail (which does illustrate an issue with how I invoke the tests... but yeah, it's been one of those days 🥴)

ms1design commented 2 months ago

Haven't tried that. What I tried was setting the entrypoints from the docker-compose.yaml file, but I ran into some PID issues.

edit: @dusty-nv I had issues when using `--init`; that's why I explicitly set `init: false` in docker-compose.

dusty-nv commented 2 months ago

OK, here they are!

dustynv/wyoming-openwakeword:r36.2.0
dustynv/wyoming-assist-microphone:r36.2.0
dustynv/wyoming-whisper:r36.2.0
dustynv/wyoming-piper:r36.2.0