Closed mousetwentytwo closed 6 months ago
Same problem here, after upgrade to 23.2.1
The same here
Update: it seems that after --interval=5m
the container goes into the unhealthy state, and HA then treats it as running (watchdog is off)
Same problem here.
And when you have the watchdog on then you have a reboot every 15 minutes.
Unfortunately I cannot work on the code until about mid-August. That said, two thoughts:
thank you for raising this issue!
Hello, some words about the current healthcheck: it was introduced with #54, as seen here
Docker containers have no explicit "starting" status. They have 'created' and 'running' statuses; in our case the container is in the running state:
$ docker inspect -f '{{.State.Status}}' addon_12341234_ebusd
running
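The "starting" label comes from Docker's separate health status, not from the container status queried above. A minimal sketch for printing both side by side, assuming the image defines a HEALTHCHECK (without one, the .State.Health field is empty and the template fails); container_state is a hypothetical helper name:

```shell
# Print "<container status>/<health status>" for one container.
# Assumes the image was built with a HEALTHCHECK instruction.
container_state() {
    docker inspect -f '{{.State.Status}}/{{.State.Health.Status}}' "$1"
}
```

Usage: container_state addon_12341234_ebusd. The health part is one of starting, healthy, or unhealthy, so a container can be running and unhealthy at the same time.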
The problem first appears when the container starts and the curl command against http://127.0.0.1:8888 gets no proper response after 5 minutes, as described here: https://github.com/LukasGrebe/ha-addons/blob/5dd56311f043f9238f1a3895d40f9365dd0eed21/ebusd/Dockerfile#L19C1-L21C50
I assume that ebusd is listening on port 8888 and accepts only HTTP/0.9 requests (because all others fail).
So, after entering the container with docker exec -it addon_12341234_ebusd /bin/bash
you can easily check the curl command:
$ curl --fail http://127.0.0.1:8888
curl: (1) Received HTTP/0.9 when not allowed
After narrowing the request down to HTTP/0.9 you get a different error, and curl hangs:
$ curl --http0.9 --fail-with-body http://127.0.0.1:8888
ERR: command not found
(Additionally, you can eliminate the hang with the '--max-time 1' parameter, but that does not solve the problem.)
Anyway, the ultimate goal should be any non-error (200 OK) response from ebusd via HTTP. I'm stuck here:
- I cannot get any prompt info from the daemon, neither on the TCP client (8888) nor on the HTTP client (8889) after authentication. So I think this (correct) direction is a dead end; moreover, these two ports are user-configurable...
- I assume that we are not able to check the health of the ebusd service via HTTP requests.
As a workaround, we can check the status/availability of the container through another service. I would recommend an additional lightweight HTTP service (Lighttpd or nginx) that answers curl/wget on localhost, on another port, with a dummy HTTP 200 response, or, simpler still, a dummy shell script which always returns 0 (https://docs.docker.com/engine/reference/builder/#healthcheck)...
Additionally, don't forget that the current image contains curl 8.1.2, which has several CVEs, so it should be updated to at least version 8.2.1 as soon as possible.
I cannot get any prompt info from the daemon neither on TCP client (8888) nor on http client(8889) after authentication
For TCP try echo "INFO" | nc localhost 8888
version: ebusd 23.2.p20230716
update check: revision 23.2 available
device: 192.168.88.112:9999
signal: acquired
symbol rate: 23
max symbol rate: 96
min arbitration micros: 2
max arbitration micros: 49
min symbol latency: 5
max symbol latency: 57
scan: finished
... <cropped>...
For HTTP it's curl http://localhost:8889/datatypes
{"type": "BCD", "isbits": false, "isadjustable": false, "isignored": false, "isreverse": false, "length": 1, "result": "number"},
{"type": "BCD:2", "isbits": false, "isadjustable": false, "isignored": false, "isreverse": false, "length": 2, "result": "number"}
... <cropped>...
I believe all we need is to change the HEALTHCHECK to curl --fail http://127.0.0.1:8889/datatypes || exit 1
to prove that ebusd is still alive. However, --httpport=8889
is then mandatory; it is present by default, but the user can remove it and thus break the healthcheck.
The other way is to check over TCP, but I'm not sure what should indicate the daemon's health (the "signal" status?)
Unfortunately I'm not familiar with HA add-ons, so I don't know how to test either approach
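For the TCP route, one candidate health indicator is the "signal: acquired" line in the INFO output shown above. A minimal sketch, assuming that line is a reasonable proxy for daemon health (the thread leaves this open) and that nc returns once the reply is complete; ebusd_signal_acquired is a hypothetical helper name:

```shell
# Read ebusd INFO output on stdin; exit 0 only if it reports an acquired signal.
# Treating "signal: acquired" as "healthy" is an assumption, not confirmed behaviour.
ebusd_signal_acquired() {
    grep -Eq '^signal:[[:space:]]*acquired'
}
```

Usage: echo "INFO" | nc localhost 8888 | ebusd_signal_acquired || exit 1. Some netcat variants may need a timeout option such as -w 3 to return after the reply.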
Well, following @ech0-py's suggestion, the healthcheck can be done with nc
as well (instead of curl). My proposal based on that suggestion is:
HEALTHCHECK --interval=5m --timeout=3s \
CMD nc -z localhost 8888 || exit 1
I've not tried it, but it should work. In this case port 8889 is not necessary.
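Since the ebusd ports are user-configurable, the check could read the port from the environment instead of hardcoding it. A minimal sketch, assuming a hypothetical EBUSD_PORT variable (nothing in this thread says the add-on sets one) that falls back to 8888; ebusd_tcp_alive is likewise a hypothetical name:

```shell
# Succeed if something accepts TCP connections on the ebusd command port.
# EBUSD_PORT is a hypothetical variable; 8888 is the default port from this thread.
ebusd_tcp_alive() {
    nc -z localhost "${EBUSD_PORT:-8888}"
}
```

The HEALTHCHECK CMD would then call a small script that defines and runs this function, rather than invoking nc directly.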
@mousetwentytwo could you check if the problems persist post merge of @cociweb's fix?
It's still there: the fix does not change anything as port 8888 is only enabled when the option to expose the http server is set.
23-09-24 21:16:13 WARNING (MainThread) [supervisor.addons.addon] Timeout while waiting for addon eBUSd to start, took more then 120 seconds
@tjorim, have you tried restarting the Supervisor?
The fix solved it for me, and the container has been healthy for hours now:
since the healthcheck runs inside the docker container, there is no need to expose any ports.
My add-on also seems to be healthy from HA's side. It's worth restarting the Supervisor and HA Core.
If the Supervisor restart does not resolve your problem, maybe your Supervisor is trying to reach a dead or renamed docker container. In that case, please try to reinstall the add-on; maybe something got messed up on your side. (As mentioned above, 8888 is used by default for the TCP service, while the HTTP service is optional and uses 8889 by default. Since the TCP service always runs, the container runs netcat against its own localhost, so no network config beyond the defaults is needed.)
For me it works.
But you must restart your system or the supervisor.
Thanks for the work.
Yep, the fix works, but keep in mind that you have to wait 5 minutes until the container is reported as alive, because of HEALTHCHECK --interval=5m;
until then you'll see the "starting" status and a spinner in the UI
@ech0-py should we reduce the interval to say 10s or close this ticket as resolved?
Well, I've also run into this 5-minute delay today. In the next PR we can add a function so that the first query is issued after the first 90 seconds. (In my opinion at least 1 minute is required for startup on slower environments, at least after a fresh install...) My recommendation is to keep 5 minutes as the default interval.
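Docker runs the first health check one interval after the container starts, which is why a 5-minute interval keeps the status at "starting" for 5 minutes. For local experiments outside Home Assistant, an image's health settings can be overridden on the command line. A minimal sketch, assuming a locally built image and the nc check proposed in this thread; run_with_health and the image name are hypothetical:

```shell
# Start a detached container with health settings overridden from the CLI.
# Arguments: image name, check interval (e.g. 30s), start period (e.g. 90s).
run_with_health() {
    docker run -d \
        --health-cmd 'nc -z localhost 8888 || exit 1' \
        --health-interval "$2" \
        --health-timeout 3s \
        --health-start-period "$3" \
        "$1"
}
```

Usage: run_with_health my-ebusd-image 30s 90s. With --health-start-period, failed checks inside that window do not count towards the retry limit; it does not by itself bring the first check forward, so the interval still decides how long the spinner shows.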
Add-on may appear stuck in starting state. Watchdog is advised to be turned off in this case.
It looks like the healthcheck was introduced with port 8888 hardcoded and an HTTP curl call. However, if the HTTP service is enabled it starts on 8889, and by default there is a TCP service on 8888.
Related: Originally posted by @mousetwentytwo in https://github.com/LukasGrebe/ha-addons/issues/60#issuecomment-1637310926
Healthcheck code: https://github.com/LukasGrebe/ha-addons/blob/5dd56311f043f9238f1a3895d40f9365dd0eed21/ebusd/Dockerfile#L21
Not sure of the cause; it may be unrelated to HTTP.