acon96 / home-llm

A Home Assistant integration & Model to control your smart home using a Local LLM

Not all rooms are recognized #175

Open WW1983 opened 2 weeks ago

WW1983 commented 2 weeks ago

hello,

Not all of my rooms are recognized. If I ask a question about a device in a certain room, it switches to another room. In this case, I asked whether the window in the dressing room was open, and it checked the window in the bedroom instead.

I have a Linux server with Ollama. Model: llama3:8b

[screenshots attached]

Does anyone have an idea why the rooms are not recognized?

darki73 commented 2 weeks ago

For now, you can use something like the following:

There are the following areas (rooms) available:
area_id,area_name
{% for area_id in areas() %}
{% if area_id != 'temp' and area_id != 'settings' %}
{{ area_id }},{{ area_name(area_id) }}
{% endif %}
{% endfor %}

temp and settings are just areas I have for items I either don't use, or this 'room' is used for storing configs.
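
For illustration, with hypothetical area IDs (the names below just match the aliases listed further down), the rendered block in the prompt ends up looking roughly like this, blank lines from the template aside:

    area_id,area_name
    wohnzimmer,Wohnzimmer
    kueche,Küche
    schlafzimmer,Schlafzimmer
    buero,Büro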

I have no idea how to extract the aliases without modifying the code, so you can also append something like this after the first code block:

These are aliases for the rooms:
1. Living Room - Wohnzimmer
2. Kitchen - Küche
3. Dining Room - Esszimmer
4. Bedroom - Schlafzimmer
5. Office - Büro
6. Maids Room - Abstellraum
7. Guest Toilet - Gäste-WC
8. Corridor - Flur

It might also help to add something like this at the beginning of the configuration to prevent the "Localized Name (Original Name)" format from being displayed:

If item name in YOUR_LANGUAGE is available, never provide the English item name.

WW1983 commented 2 weeks ago

Thank you.

Things are going a little better with your tips, but still not good. I think I'll wait a little longer; with version 2024.7, Ollama may be better integrated.

darki73 commented 2 weeks ago

@WW1983 I am also using LocalAI-llama3-8b-function-call-v0.2 with LocalAI (latest Docker tag available). If you have a decent GPU (I am running whisper, wakeword, LocalAI and piper in a VM with an RTX 3090 Ti), this model is pretty good at following directions and provides decent output.


WW1983 commented 2 weeks ago

llama3-8b-function-call-v0.2

Thank you. I use Ollama on a Minisforum MS01 with an Nvidia Tesla P4 GPU, so it should work. Is there also a model for Ollama?

darki73 commented 2 weeks ago

I have not really played with Ollama, but if it supports GGUF models, my guess would be that you can use this one (literally the first link on Google): https://huggingface.co/mudler/LocalAI-Llama3-8b-Function-Call-v0.2 (or this one, in safetensors format: https://huggingface.co/mzbac/llama-3-8B-Instruct-function-calling-v0.2).

On second thought, it might not work; you can always spin up a container with LocalAI while waiting for better Ollama support.
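
If you do want to try it in Ollama anyway, the rough idea (untested on my side; the .gguf file name below is just an example) is to download the GGUF file and import it through a Modelfile:

    # create a Modelfile pointing at the downloaded GGUF (file name is an example)
    echo 'FROM ./LocalAI-Llama3-8b-Function-Call-v0.2.Q4_K_M.gguf' > Modelfile
    # register it with Ollama under a local name and run it
    ollama create llama3-function-call -f Modelfile
    ollama run llama3-function-call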

WW1983 commented 2 weeks ago

I have not really played with Ollama, but if it supports GGUF models, my guess would be that you can use this one (literally the first link on Google): https://huggingface.co/mudler/LocalAI-Llama3-8b-Function-Call-v0.2 (or this one, in safetensors format: https://huggingface.co/mzbac/llama-3-8B-Instruct-function-calling-v0.2).

On second thought, it might not work; you can always spin up a container with LocalAI while waiting for better Ollama support.

I'll try it. But unfortunately I have the problem that I can't pass the GPU through to the Docker container.

darki73 commented 1 week ago

@WW1983 I won't go into the details of how to configure CUDA and the NVIDIA Container Toolkit (there are plenty of tutorials), but I will say a few things:

  1. I am using Ubuntu 24.04 with the GPU drivers supplied through apt (DKMS), no nonsense with drivers from the official website.
  2. There is a point in the NVIDIA Container Toolkit installation process where you have to apply the Docker runtime configuration; don't skip that step (see the sketch after this list).
  3. If we are talking about LocalAI (or any container that might need a GPU), here is how you can configure it (look at the deploy part; I am posting the config for the whole service just for reference):
    localai:
      container_name: localai
      image: localai/localai:latest-gpu-nvidia-cuda-12
      restart: unless-stopped
      environment:
        TZ: Asia/Dubai
      volumes:
        - ./data/localai:/build/models:cached
      ports:
        - 8181:8080
      deploy:
        resources:
          reservations:
            devices:
              - driver: nvidia
                count: 1
                capabilities: [gpu]
  4. While most containers will run just fine with the provided config, some might require extra configuration. Again, I will not go into detail on getting everything to work (installing all the necessary libraries), but here is an example of how to start the whisper container on a remote machine:
    whisper:
      container_name: whisper
      image: rhasspy/wyoming-whisper
      restart: unless-stopped
      command: --model medium-int8 --language en --device cuda
      environment:
        TZ: Asia/Dubai
      volumes:
        - ./data/whisper:/data
        - /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:ro
        - /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:ro
        - /usr/lib/x86_64-linux-gnu/libcublasLt.so.12:/usr/lib/x86_64-linux-gnu/libcublasLt.so.12:ro
        - /usr/lib/x86_64-linux-gnu/libcublas.so.12:/usr/lib/x86_64-linux-gnu/libcublas.so.12:ro
      ports:
        - 10300:10300
      deploy:
        resources:
          reservations:
            devices:
              - driver: nvidia
                count: 1
                capabilities: [gpu]
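
For reference, the Docker-side step from point 2 is roughly this (assuming the NVIDIA Container Toolkit package itself is already installed; the CUDA image tag is just an example for a quick test):

    # register the NVIDIA runtime with Docker and restart the daemon
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    # quick check that the GPU is visible from inside a container
    docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi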

The thing is, this way you can still use the GPU for other tasks. In my case, this VM is dedicated to AI stuff only, so here is what I run on it with a single GPU shared across all services:

  1. Whisper (Docker)
  2. Piper (Docker)
  3. WakeWord (Docker)
  4. LocalAI (Docker)
  5. Stable Diffusion (SystemD service)
  6. Text Generation Web UI (SystemD service)
  7. Jupyter (SystemD service)

And one more thing: if you want it to behave nicely, you need to enable persistence mode on the GPU, otherwise you will be stuck in P0 with 100W+ of power draw. As of now, I am using 21488 / 24576 MiB of memory with the GPU in the P8 state at 38C and 28W of power draw.
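
For reference, enabling it is roughly this on the host/VM that owns the GPU (standard nvidia-smi, nothing exotic):

    # enable persistence mode so the driver stays loaded and the card can idle down
    sudo nvidia-smi -pm 1
    # check the power state (P0..P8), draw and memory use
    nvidia-smi --query-gpu=pstate,power.draw,memory.used --format=csv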

WW1983 commented 1 week ago

Thank you. I tried it with Ubuntu 24.04 and it works. How do I connect it with Home Assistant?

What selection do I have to make? "Ollama AI"?

darki73 commented 1 week ago

Backend: Generic OpenAI
Host: IP of the machine LocalAI is running on
Port: the port you exposed in the Docker Compose file
Model: LocalAI-llama3-8b-function-call-v0.2
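
As a quick sanity check that Home Assistant can reach it (the IP below is just an example; the port matches what the compose file above exposes), you can hit LocalAI's OpenAI-compatible API directly:

    # list the models LocalAI has loaded
    curl http://192.168.1.50:8181/v1/models
    # send a test chat completion to the same model name you set in the integration
    curl http://192.168.1.50:8181/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "LocalAI-llama3-8b-function-call-v0.2", "messages": [{"role": "user", "content": "Hello"}]}'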

WW1983 commented 1 week ago

Backend: Generic OpenAI
Host: IP of the machine LocalAI is running on
Port: the port you exposed in the Docker Compose file
Model: LocalAI-llama3-8b-function-call-v0.2

Thank you. It works.

Where can I check these processing times?


darki73 commented 1 week ago

You can do that in: Settings -> Voice Assistants -> your pipeline -> 3 dots -> Debug

WW1983 commented 1 week ago

You can do that in: Settings -> Voice Assistants -> your pipeline -> 3 dots -> Debug

I have it. But I think my system is a little bit slow for LocalAI.

LocalAI: [screenshot]

Ollama: [screenshot]

darki73 commented 1 week ago

The P4 is a little bit slow compared to the 3090 Ti; if you have 175 USD to spare, you might as well buy a P40 (I saw it on Amazon).

BUT, given that many libraries are now beginning to support RT and Tensor cores, you can look at the A2000 (about 40% faster but with only 6 GB of VRAM, which is bad), or the A4000, which is roughly 3 times faster than the P4, but the price is a bit too high.

Or even go the eBay route and gamble on an RTX 3090. Yes, the RTX 4060 Ti is technically the same as the 3090, BUT memory is the main reason I still haven't sold mine and use it for the AI VM. I tried to use a 3080 Ti, but the memory is not enough, so I just use it for the VR VM.

Sadly, you won't be able to find a cheap GPU with that amount of memory. Basically, any RTX (2nd gen+) or an A2000/A4000 is the only way to go now if you want fast responses from LLMs.

I was planning to buy an A6000, but it is way cheaper to buy new 3090 Tis and have two of them to reach the same 48 GB of memory.

P.S. Funny how the 4090 is not much faster than the 3090 Ti.

P.P.S. If you decide to buy a brand new 3090, you might as well spend 250 USD extra and go for a 4090.

WW1983 commented 1 week ago

Thank you for all your tips. But I think that's a bit too much for me. I just wanted to build something small to experiment with; that should be enough for the beginning.