Open WW1983 opened 2 weeks ago
For now, you can use something like the following:
```jinja
There are the following areas (rooms) available:
area_id,area_name
{% for area_id in areas() %}
{% if area_id != 'temp' and area_id != 'settings' %}
{{ area_id }},{{ area_name(area_id) }}
{% endif %}
{% endfor %}
```
`temp` and `settings` are just rooms I keep for items I either don't use, or that I use for storing configs.
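The filtering the template does can be sketched in plain Python; the area data below is a made-up stand-in for whatever `areas()` and `area_name()` return in your own setup:

```python
# Sketch of the filtering the Jinja template performs: list every area
# except the two "storage" rooms. The sample areas are hypothetical.
EXCLUDED = {"temp", "settings"}

# Stand-ins for Home Assistant's areas() / area_name() template helpers.
SAMPLE_AREAS = {
    "living_room": "Living Room",
    "kitchen": "Kitchen",
    "temp": "Temp",
    "settings": "Settings",
}

def area_csv(areas: dict) -> str:
    """Render the same CSV block the template produces, skipping excluded areas."""
    lines = ["area_id,area_name"]
    for area_id, name in areas.items():
        if area_id not in EXCLUDED:
            lines.append(f"{area_id},{name}")
    return "\n".join(lines)

print(area_csv(SAMPLE_AREAS))
```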
I have no idea how to extract the aliases without modifying the code, so you can also append something like this after the first code block:
These are aliases for the rooms:
1. Living Room - Wohnzimmer
2. Kitchen - Küche
3. Dining Room - Esszimmer
4. Bedroom - Schlafzimmer
5. Office - Büro
6. Maids Room - Abstellraum
7. Guest Toilet - Gäste-WC
8. Corridor - Flur
It might also help to set something like this at the beginning of the configuration, to prevent "Localized Name (Original Name)" from being displayed:

> If item name in YOUR_LANGUAGE is available, never provide the English item name.
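Since the aliases have to be maintained by hand anyway, a small script can generate that numbered section of the prompt from a dict, so the list stays easy to update (the mapping below just mirrors the hand-written list above):

```python
# English -> German room aliases, mirroring the hand-written list above.
ALIASES = {
    "Living Room": "Wohnzimmer",
    "Kitchen": "Küche",
    "Dining Room": "Esszimmer",
    "Bedroom": "Schlafzimmer",
    "Office": "Büro",
    "Maids Room": "Abstellraum",
    "Guest Toilet": "Gäste-WC",
    "Corridor": "Flur",
}

def alias_block(aliases: dict) -> str:
    """Render the numbered alias section for the prompt."""
    lines = ["These are aliases for the rooms:"]
    for i, (en, local) in enumerate(aliases.items(), start=1):
        lines.append(f"{i}. {en} - {local}")
    return "\n".join(lines)

print(alias_block(ALIASES))
```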
Thank you.
Things are going a little better with your tips, but still not good. I think I'll wait a little longer. With version 2024.7, Ollama may be better integrated.
@WW1983 I am also using LocalAI-llama3-8b-function-call-v0.2 with LocalAI (latest Docker tag available).
If you have a decent GPU (I am running whisper, wakeword, localai, and piper in the VM with an RTX 3090 Ti), this model is pretty good at following directions and provides decent output.
Thank you. I use Ollama on a Minisforum MS01 with an Nvidia Tesla P4 GPU, so it should work. Is there also a model for Ollama?
Have not really played with Ollama, but if it supports GGUF models, my guess would be that you can use this one (literally the first link on Google) - https://huggingface.co/mudler/LocalAI-Llama3-8b-Function-Call-v0.2 (or this one - https://huggingface.co/mzbac/llama-3-8B-Instruct-function-calling-v0.2 - safetensors).
On second thought, it might not work; you can always spin up a container with LocalAI while waiting for better Ollama support )
I'll try it. But unfortunately I have the problem that I can't pass the GPU through to the Docker container.
@WW1983 I won't go into the details of how to configure CUDA and the NVIDIA Container Toolkit (there are plenty of tutorials), but I will say a few things:
- install the drivers from `apt` with DKMS, no nonsense with drivers from the official website
- the important part is the `deploy` section, but I will post the config for the whole service just for reference:
```yaml
localai:
  container_name: localai
  image: localai/localai:latest-gpu-nvidia-cuda-12
  restart: unless-stopped
  environment:
    TZ: Asia/Dubai
  volumes:
    - ./data/localai:/build/models:cached
  ports:
    - 8181:8080
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```
`whisper` container on a remote machine:
```yaml
whisper:
  container_name: whisper
  image: rhasspy/wyoming-whisper
  restart: unless-stopped
  command: --model medium-int8 --language en --device cuda
  environment:
    TZ: Asia/Dubai
  volumes:
    - ./data/whisper:/data
    - /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:ro
    - /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:ro
    - /usr/lib/x86_64-linux-gnu/libcublasLt.so.12:/usr/lib/x86_64-linux-gnu/libcublasLt.so.12:ro
    - /usr/lib/x86_64-linux-gnu/libcublas.so.12:/usr/lib/x86_64-linux-gnu/libcublas.so.12:ro
  ports:
    - 10300:10300
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```
Thing is, that way you can still use the GPU for other tasks. In my case, this VM is dedicated to the AI stuff only, so here is what I run on it, with a single GPU shared across all services:
And one more thing: if you want it to behave nicely, you need to enable persistence mode on the GPU, otherwise you will be stuck in P0 with 100 W+ of power draw. As of now, I am using 21488 / 24576 MiB of memory with the GPU in the P8 state @ 38 °C, needing 28 W of power.
Thank you. I tried it now with Ubuntu 24.04 and it works. How do I connect it with Home Assistant?
What selection do I have to make? "Ollama AI"?
- Backend: Generic OpenAI
- Host: IP of the machine LocalAI is running on
- Port: the port you exposed in the Docker Compose file
- Model: LocalAI-llama3-8b-function-call-v0.2
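For reference, those settings boil down to an OpenAI-style chat completions request against LocalAI. A minimal sketch of what the Generic OpenAI backend would send (host and port below are placeholders for your own values; nothing is actually sent here):

```python
import json

# Placeholder values - substitute your own host and the host port
# you exposed in the Docker Compose file (8181:8080).
HOST = "192.168.1.50"   # hypothetical machine LocalAI runs on
PORT = 8181

def chat_request(model: str, prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON body for an OpenAI-style chat completion."""
    url = f"http://{HOST}:{PORT}/v1/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, body

url, body = chat_request("LocalAI-llama3-8b-function-call-v0.2",
                         "Is the window in the bedroom open?")
print(url)
print(json.dumps(body, indent=2))
```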
Thank you. It works.
Where can I check these processing times?
You can do that in: Settings -> Voice Assistants -> _YOUR PIPELINE_ -> 3 dots -> Debug
I have it. But I think my system is a little bit slow for LocalAI:
LocalAI:
Ollama:
The P4 is a little bit slow compared to the 3090 Ti; if you have 175 USD to spare, you might as well buy a P40 (saw it on Amazon).
BUT, given that many libraries are now beginning to support RT and Tensor cores, you can look at the A2000 (about 40% faster, but it has only 6 GB of RAM, which is bad), or the A4000, which is roughly 3 times faster than the P4, but the price is a bit too high.
Or even go the eBay route and gamble on an RTX 3090. Yes, the RTX 4060 Ti is technically the same as the 3090, BUT memory is the main reason I still haven't sold mine and use it for the AI VM. I tried to use a 3080 Ti, but the memory is not enough, so I just use it for the VR VM.
Sadly, you won't be able to find a cheap GPU with that amount of memory. Basically, any RTX (2nd gen+) or an A2000/A4000 is the only way to go now if you want fast responses from LLMs.
I was planning to buy an A6000, but it is way cheaper to buy a new 3090 Ti and have two of them to achieve the same 48 GB of memory.
P.S. Funny how the 4090 is not much faster than the 3090 Ti.
P.P.S. If you decide to buy a brand new 3090, you might as well spend 250 USD extra and go for a 4090.
Thank you for all your tips. But I think that's a bit too much for me. I just wanted to build something small to experiment with. That should be enough at the beginning.
Hello,
not all of my rooms are recognized. If I ask a question about a device in a certain room, it switches to another room. In this case, I asked if the window in the dressing room was open; it checked the window in the bedroom.
I have a Linux server with Ollama, model: llama3:8b.
Does anyone have an idea why the rooms are not recognized?