acon96 / home-llm

A Home Assistant integration & Model to control your smart home using a Local LLM

networking issue? #131

Closed toxic0berliner closed 2 months ago

toxic0berliner commented 2 months ago

Describe the bug
(Screenshot attached: Screenshot_20240427-122323_Home_Assistant)

Context
I have a Proxmox VE CT running Ollama on Debian. It works fine with https://github.com/open-webui/open-webui, both with Mistral and fixt/home-3b-v3:latest. The firewall is set to allow all necessary traffic: I can curl http://10.0.10.61:11434 from within the Home Assistant container without issue, and it tells me Ollama is running. I also tried the same with ollama.lan and the proper DNS record. Can't figure out what's going on...
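
For reference, the same reachability check can be scripted. A minimal sketch in Python, using the host/port from this report (the /api/tags call is presumably what the integration's setup dialog uses to list models; adjust the address for your own setup):

# Minimal reachability check against an Ollama server, mirroring the curl test above.
import requests

BASE_URL = "http://10.0.10.61:11434"  # address from this report; adjust as needed

# The root endpoint should answer with "Ollama is running".
root = requests.get(BASE_URL, timeout=5)
print(root.status_code, root.text)

# /api/tags lists the locally available models.
tags = requests.get(f"{BASE_URL}/api/tags", timeout=5)
print([m["name"] for m in tags.json().get("models", [])])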

Logs

This error originated from a custom integration

Logger: custom_components.llama_conversation.agent
Source: custom_components/llama_conversation/agent.py:262
Integration: LLaMA Conversation (documentation)
First occurred: 11:51:40 (5 occurrences)
Last logged: 12:06:49

There was a problem talking to the backend
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
                       ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/http/client.py", line 1423, in getresponse
    response.begin()
  File "/usr/local/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/http/client.py", line 292, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/socket.py", line 707, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 469, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 358, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='ollama.lan', port=11434): Read timed out. (read timeout=90.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/config/custom_components/llama_conversation/agent.py", line 262, in async_process
    response = await self._async_generate(conversation)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/llama_conversation/agent.py", line 187, in _async_generate
    return await self.hass.async_add_executor_job(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/llama_conversation/agent.py", line 1142, in _generate
    result = requests.post(
             ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='ollama.lan', port=11434): Read timed out. (read timeout=90.0)
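
The traceback bottoms out in requests.post hitting its 90-second read timeout: the TCP connection is established, but no response arrives in time. A rough reconstruction of that call pattern for local testing (the endpoint path and payload are assumptions based on the public Ollama API, not copied from the integration):

# Reproduce the same read-timeout behaviour outside Home Assistant.
import requests

try:
    result = requests.post(
        "http://ollama.lan:11434/api/generate",
        json={
            "model": "fixt/home-3b-v3:latest",
            "prompt": "turn on the living room lights",
            "stream": False,
        },
        timeout=90.0,  # matches the read timeout in the traceback
    )
    print(result.json().get("response"))
except requests.exceptions.ReadTimeout:
    print("Connected, but the server did not answer within 90 seconds")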
toxic0berliner commented 2 months ago

Also, tailing the Ollama logs I see no request coming in. During setup in the GUI, the integration was able to query and list the available models on my Ollama instance...

toxic0berliner commented 2 months ago

I also tried removing variables from the prompt and testing the prompt in my GUI; the answer comes back much faster than 90 seconds. My only changed setting is increasing the context length to 9000, as I have many devices that I'd like to control...
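
For reference, at the Ollama API level the context length corresponds to the num_ctx option. How the integration passes it is not shown in this thread; a direct request with the larger context, using the values mentioned here, would look roughly like this:

# Direct Ollama request with an enlarged context window (num_ctx).
# A 9000-token context makes prompt evaluation much heavier, especially on CPU,
# which is one plausible source of long response times.
import requests

resp = requests.post(
    "http://ollama.lan:11434/api/generate",
    json={
        "model": "fixt/home-3b-v3:latest",
        "prompt": "turn on the living room lights",
        "stream": False,
        "options": {"num_ctx": 9000},
    },
    timeout=180.0,
)
print(resp.json().get("response"))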

acon96 commented 2 months ago

Did you set Ollama to listen on all interfaces? By default it only allows connections from the local machine.

toxic0berliner commented 2 months ago

I think so: it's the OLLAMA_HOST=0.0.0.0:11434 env var I added to the Ollama systemd service. It seems to work, since the API answers me with "Ollama is running".

I tried with a very simple prompt, just in case my Ollama was dead slow, and I increased the timeout to 180s... Nothing works so far and I don't know where to look next...
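
One way to separate a binding problem from a slow model is a plain TCP check: if OLLAMA_HOST is not actually set to 0.0.0.0, the connection is refused outright rather than timing out. A small sketch, using the hostname from this thread:

# TCP-level check: distinguishes "nothing listening on this interface"
# (connection refused) from "listening but slow to answer" (connect succeeds,
# only the later HTTP request times out).
import socket

try:
    with socket.create_connection(("ollama.lan", 11434), timeout=5):
        print("Port 11434 reachable: Ollama is listening on this interface")
except OSError as err:  # connection refused, unreachable, or connect timeout
    print(f"Could not connect: {err}")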

acon96 commented 2 months ago

Does the computer you are running ollama on show any CPU or GPU usage when you submit a request? Also how many entities are exposed to the voice assistant?

toxic0berliner commented 2 months ago

Asking the AI with the local GUI does get me an answer; CPU usage is low but not zero, and journalctl -f -u ollama does show activity. I had 200+ entities exposed by HA and brought that down to 120. But when asking HA via the assistant set up with llama_conversation, I see no CPU usage and nothing in the Ollama logs :( It really seems it's not sending anything to Ollama, yet the setup was at least able to list the available models, so networking looks fine...

acon96 commented 2 months ago

Can you set the request timeout in the integration to at least 180 seconds? (double the default) Everything I can find indicates that Ollama is taking too long to respond because you have so many entities exposed.

toxic0berliner commented 2 months ago

Yes! Found the issue! I had "use HTTPS" ticked, but in fact I would need a reverse proxy to add HTTPS, or I'd need to configure Ollama itself to expose HTTPS. After disabling it, I saw activity when I asked Assist a question. At first it still timed out; after increasing the timeout I now get responses, but they are all along the lines of "turned on the lights, error calling {service:lights_turn-on, room=living_room}" or similar. That's a separate problem though, so this issue is closed ❤️ Sadly, running an LLM on my AMD 5700U without a dedicated GPU really is slow, and I do have many entities to expose, so it won't be a useful tool for me yet, but thanks for the help and the amazing tool ❤️
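
For anyone hitting the same thing: Ollama serves plain HTTP on port 11434 unless a TLS reverse proxy is put in front of it, so a client configured for HTTPS can fail or hang even though the port is open. A small probe that makes the mismatch obvious (hostname from this thread; plain-HTTP Ollama assumed):

# Probe both schemes against the same port to see which one the server speaks.
# Against a plain-HTTP Ollama, the https attempt fails while the http attempt
# answers "Ollama is running".
import requests

for scheme in ("https", "http"):
    url = f"{scheme}://ollama.lan:11434"
    try:
        resp = requests.get(url, timeout=10)
        print(f"{scheme}: {resp.status_code} {resp.text.strip()}")
    except requests.exceptions.RequestException as err:
        print(f"{scheme}: failed ({type(err).__name__})")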