XKNX / xknx

XKNX - A KNX library written in Python
http://xknx.io/
MIT License
290 stars 106 forks source link

Many xknx warnings and errors in HA log #268

Closed DirkMu closed 4 years ago

DirkMu commented 4 years ago

Description of problem: I started using HA with KNX integration using the KNX platform and HA abstraction. Since I had warnings like "Could not sync group address" or "KNX bus did not respond in time to" I started using the xknx abstraction and a xknx.yaml file. (Due to the recommendation in the HA documentation

Optional, recommended for large KNX installations (>100 devices) and/or if you want to use the XKNX abstraction

I am using 216 different knx entities in HA (climates, covers, lights, switches, binary sensors and sensors)

I always had warnings but since HA release 106 and/or using xknx abstraction the log of HA fills up with many warnings and errors. As a result knx integration becomes unreliable.

What is strange is, when HA is restarted there is no issue at all. After an hour first warnings may occur. Typically a hue bridge error occurs and than the xknx errors will increase.

HA is running in a docker container. I have another container with only xknx enabled (xknx file) and no other integrations, automations, etc. In that container no warnings or errors occur.

I checked the GAs mentioned in the log but all of them are readable. In fact, there is an update of the state in HA available.

Version information:

KNX installation: KNX Installation: 3 lines, no filters (required due to the number of devices) IP Interface: Jung IPS300REG (KNX Secure not enabled)

Problem-relevant xknx.yaml or configuration.yaml entries (fill out even if it seems unimportant):

Traceback (if applicable):

2020-03-06 18:05:29 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for hacs which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you do experience issues with Home Assistant. 2020-03-06 18:05:50 WARNING (MainThread) [homeassistant.setup] Setup of input_boolean is taking over 10 seconds. 2020-03-06 18:05:50 WARNING (MainThread) [homeassistant.setup] Setup of input_number is taking over 10 seconds. 2020-03-06 18:19:40 WARNING (MainThread) [homeassistant.components.http.ban] Login attempt or request with invalid authentication from 46.189.28.120 2020-03-06 18:37:37 ERROR (MainThread) [homeassistant.components.hue.light] Error fetching light data: 2020-03-06 18:37:37 ERROR (MainThread) [homeassistant.components.hue.sensor_base] Error fetching sensor data: 2020-03-06 18:38:14 ERROR (MainThread) [homeassistant.components.hue.bridge] Request failed 3 times, giving up. 2020-03-06 18:38:14 ERROR (MainThread) [homeassistant.components.hue.bridge] Request failed 3 times, giving up. 2020-03-06 18:38:21 ERROR (MainThread) [homeassistant.components.hue.bridge] Request failed 3 times, giving up. 2020-03-06 18:38:21 ERROR (MainThread) [homeassistant.components.hue.bridge] Request failed 3 times, giving up. 2020-03-06 19:06:29 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 3/1/5 2020-03-06 19:06:29 WARNING (MainThread) [xknx.log] Could not sync group address '3/1/5' from <Cover name="kelbad_cover" updown="GroupAddress("3/1/1")/None//Direction.DOWN" step="GroupAddress("3/1/3")/None/None/None" position="GroupAddress("3/1/4")/GroupAddress("3/1/5")//0" angle="None/None/None/None" travel_time_down="22" travel_time_up="22" /> 2020-03-06 19:06:30 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 3/1/15 2020-03-06 19:06:30 WARNING (MainThread) [xknx.log] Could not sync group address '3/1/15' from <Cover name="gaezi_li_cover" updown="GroupAddress("3/1/11")/None//Direction.DOWN" step="GroupAddress("3/1/13")/None/None/None" position="GroupAddress("3/1/14")/GroupAddress("3/1/15")//0" angle="None/None/None/None" travel_time_down="22" travel_time_up="22" /> 2020-03-06 19:06:30 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to request of type 'Tunnelling' 2020-03-06 19:06:30 WARNING (MainThread) [xknx.log] Sending of telegram failed. Retrying a second time. 2020-03-06 19:06:31 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 3/1/25 2020-03-06 19:06:31 WARNING (MainThread) [xknx.log] Could not sync group address '3/1/25' from <Cover name="gaezi_re_cover" updown="GroupAddress("3/1/21")/None//Direction.DOWN" step="GroupAddress("3/1/23")/None/None/None" position="GroupAddress("3/1/24")/GroupAddress("3/1/25")//0" angle="None/None/None/None" travel_time_down="22" travel_time_up="22" /> 2020-03-06 19:06:31 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to request of type 'Tunnelling' 2020-03-06 19:06:31 WARNING (MainThread) [xknx.log] Resending telegram failed. Reconnecting to tunnel. 2020-03-06 19:06:31 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'Disconnect' with error in 'DisconnectResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 19:06:40 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 1/2/232 2020-03-06 19:06:49 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 1/2/132 2020-03-06 19:06:49 WARNING (MainThread) [xknx.log] Could not sync group address '1/2/132' from <Switch name="gaebad_light_aktor_switch" switch="GroupAddress("1/2/131")/GroupAddress("1/2/132")//True" /> 2020-03-06 20:07:10 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 6/5/1 2020-03-06 20:07:10 WARNING (MainThread) [xknx.log] Could not sync group address '6/5/1' from <Sensor name="ws_dach_wind_richtung" sensor="None/GroupAddress("6/5/1")//261" value="261" unit="°"/> 2020-03-06 20:07:16 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 4/2/5 2020-03-06 20:07:16 WARNING (MainThread) [xknx.log] Could not sync group address '4/2/5' from <Sensor name="wohnzi_temp" sensor="None/GroupAddress("4/2/5")//23.1" value="23.1" unit="°C"/> 2020-03-06 20:07:17 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 4/2/15 2020-03-06 20:07:17 WARNING (MainThread) [xknx.log] Could not sync group address '4/2/15' from <Sensor name="kueche_temp" sensor="None/GroupAddress("4/2/15")//22.4" value="22.4" unit="°C"/> 2020-03-06 20:07:17 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to request of type 'Tunnelling' 2020-03-06 20:07:17 WARNING (MainThread) [xknx.log] Sending of telegram failed. Retrying a second time. 2020-03-06 20:07:18 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 4/2/25 2020-03-06 20:07:18 WARNING (MainThread) [xknx.log] Could not sync group address '4/2/25' from <Sensor name="hwr_temp" sensor="None/GroupAddress("4/2/25")//21.7" value="21.7" unit="°C"/> 2020-03-06 20:07:18 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to request of type 'Tunnelling' 2020-03-06 20:07:18 WARNING (MainThread) [xknx.log] Resending telegram failed. Reconnecting to tunnel. 2020-03-06 20:07:18 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'Disconnect' with error in 'DisconnectResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:17:34 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'ConnectionState' with error in 'ConnectionStateResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:17:43 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'ConnectionState' with error in 'ConnectionStateResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:17:43 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'ConnectionState' with error in 'ConnectionStateResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:17:49 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'ConnectionState' with error in 'ConnectionStateResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:17:49 WARNING (MainThread) [xknx.log] Heartbeat failed - reconnecting 2020-03-06 20:17:49 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'Disconnect' with error in 'DisconnectResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:31:53 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'ConnectionState' with error in 'ConnectionStateResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:31:54 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to request of type 'Tunnelling' 2020-03-06 20:31:54 WARNING (MainThread) [xknx.log] Sending of telegram failed. Retrying a second time. 2020-03-06 20:31:55 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to request of type 'Tunnelling' 2020-03-06 20:31:55 WARNING (MainThread) [xknx.log] Resending telegram failed. Reconnecting to tunnel. 2020-03-06 20:31:55 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'Disconnect' with error in 'DisconnectResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:34:34 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'ConnectionState' with error in 'ConnectionStateResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:34:34 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'ConnectionState' with error in 'ConnectionStateResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:34:40 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'ConnectionState' with error in 'ConnectionStateResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:34:40 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'ConnectionState' with error in 'ConnectionStateResponse': ErrorCode.E_CONNECTION_ID 2020-03-06 20:34:40 WARNING (MainThread) [xknx.log] Heartbeat failed - reconnecting 2020-03-06 20:34:40 WARNING (MainThread) [xknx.log] Error: KNX bus responded to request of type 'Disconnect' with error in 'DisconnectResponse': ErrorCode.E_CONNECTION_ID

DirkMu commented 4 years ago

I did further investigations. Currently I compare two containers. Both using the same HA setup and knx content and connect to the same KNX IP interface. The only difference is the number of integrations I use in the test environment.

As said above one container (productive environment) uses further integrations such as HUE, Axis, Unif, etc. The other only uses knx and for testing purposes I add one integration after another.

Since I am neither a developer nor familiar with phyton I can only share my findings based on the configuration.

I figured out that when knx integration runs on its own (test environment) there are no warnings or errors concerning knx in the HA log at all. In the same time many warnings and errors occured in the productive environment.

When I add HUE only to the test environment some warnings will occur here, too. I could not figure out which other integration impacts knx integration but from my point of view additional integration(s) will cause knx to raise warnings and errors. It is only an assumption. Hopefully, you can support here.

Julius2342 commented 4 years ago

try to slow down the sync process by setting a rate_limit ?

DirkMu commented 4 years ago

Thank you Julius. I already did that (only until 15). What makes me doubt this approach is that fact that the test environment runs without warnings and errors but the same number of knx entities exists and the same number of request is sent to the bus. Is there an explanation for this?

Julius2342 commented 4 years ago

Could it be that some packets are dropped?

The error you pasted is:

    # The KNXnet/IP Server device cannot find an active data
    # connection with the specified ID.
    E_CONNECTION_ID = 0x21
DirkMu commented 4 years ago

With dropped packages you mean the KNX Interface dropped the packaged?

The Jung interface allows up to 8 connections. Indeed the buffer is more or less fully allocated. However, the errors occur for one container only. The other container running in parallel does not throw any error at all, although the same interface is connected? According to your assumption and I get you right (packaged dropped) the same should happen on the other connection, too. Don't you agree?

In the test environment rate_limit is 20 whereas in the productive environment rate_limit is 15.

DirkMu commented 4 years ago

I hope you unterstand my point. By comparing those two containes I'm trying to narrow down the problem and I keep coming back to the conclusion that the problem may not necessarily be the interface, because one container runs without any error or even warning. I can also be wrong. No doubts.

Julius2342 commented 4 years ago

Did you try to monitor the raw packages?

logger:
  default: warning
  logs:
    xknx.telegram: debug
    xknx.knx: debug

(That at least could help debugging your problem)

DirkMu commented 4 years ago

I will create debugging logs of both containers in parallel. I will get back to you as soon as I have appropriate logs.

Julius2342 commented 4 years ago

This line here is interesting:

2020-03-06 19:06:30 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to request of type 'Tunnelling'

It looks like HASS was not able to establish any tunneling connection at all.

DirkMu commented 4 years ago

From your experience what can be the reason for that?

Julius2342 commented 4 years ago

hass and router not being on same network?

DirkMu commented 4 years ago

I wonder why those errors occur far more often when HUE integration is enabled. Is there any correlation?

Btw. I am using Home Assistant Core running in Docker on a synology nas (both mentioned containers). And yes, the nas runs in a different network (192.168.1.0/24) as the interface (192.168.11.0/24) but there is nothing that prevents communication from one network to the other.

From my point of view we should keep in mind the other container runs without any errors.

DirkMu commented 4 years ago

When debugging is enabled will there be the same warnings and errors in the log?

Such as:

DirkMu commented 4 years ago

Today, the number of warnings and error were less than before. Anyway, there are a few warnings. What I am supposed to be looking for in the log? What I see is that both test and productive environment frequently switch the tunnel from 1.1.211 to 1.1.212 and back.

Whereas in the production environment several warnings occurred in the test environment 2 warnings occurred. I attached some screenshots including that stats of the interface.

The log is around 38 MB.

HA Log 1 HA Log 2 KNXIPInterface

Julius2342 commented 4 years ago

So it looks like it "generally" works. You the tunnel-connection just breaks under certain conditions (and after that, all subsequent read-requests fail).

You should reduce the rate limit to a very low number and check if all is working (with delays!) and then increas the limit slowly.

Julius2342 commented 4 years ago

another comment. KNX is udp based, "Did not respond in time" means that the response telegram did not reach the client. As KNX tunneling has some counters, it could be the whole tunnel connection is broken by then and needs to reconnect (what XKNX does underneath, you can see this in the logs.)

Julius2342 commented 4 years ago

Btw, do you run both HASS instances at the same time? If so, did you configure a different device-id for the second one?

DirkMu commented 4 years ago

So it looks like it "generally" works. You the tunnel-connection just breaks under certain conditions (and after that, all subsequent read-requests fail).

You should reduce the rate limit to a very low number and check if all is working (with delays!) and then increas the limit slowly.

Thanks Julius. I reduced the rate limit to 10 and do still observe it. Do you want to see the corresponsing logs? Those are around 38 MB.

DirkMu commented 4 years ago

another comment. KNX is udp based, "Did not respond in time" means that the response telegram did not reach the client. As KNX tunneling has some counters, it could be the whole tunnel connection is broken by then and needs to reconnect (what XKNX does underneath, you can see this in the logs.)

I moved the Interface to the network of the containers in the meantime.

DirkMu commented 4 years ago

Btw, do you run both HASS instances at the same time? If so, did you configure a different device-id for the second one?

Good point. Both container use their own xknx.yaml file but with the same content. Connection settings are set in the configuration file of home assistant. Own address is not used since it did not work. Thus, the name of the ids are the same. Does this have an impact?

Today I only observed one warning so far but as observed before HUE errors occurred right before.

20200312 HA log

Julius2342 commented 4 years ago

May you disable syncing and check if switching on/off works? (It really looks like a networking problem).

DirkMu commented 4 years ago

by disabling syncing you mean state_updater = fales, right? Switching on / off means I shall trigger an update from the BUS?

Since my last post here, more and more warnings occurred. Attached the log. Please ignore the Neato and DenonAVR entries. Denon is powered off. home-assistant (1).log 20200312 HA log (2)

DirkMu commented 4 years ago

Just an update from my KNXIP Interface. There is no noticeable load on the interface. What is noticeable is the TX tunnel re-requests (number of telegrams, that needed to be repeated in the tunnel connection).

20200312 KNXIPInterface

DirkMu commented 4 years ago

Hey @Julius2342 ,

Concerning your request to disable syncing. What exactly do you mean?

state_updater = false (my understanding is that in that case no state will be read from the BUS) or sync_state = false for individual devices? Is this also possible for switches, lights, covers, and HVACs?

Currently I am using state_update = false but HA still seems to attempt to read the GAs from the BUS hourly. Is this behavior correct?

The issue unfortunately still persists.

DirkMu commented 4 years ago

Hi @Julius2342

I observed the behavior over the past days, attempting different configurations. I still have the issue as described above but I was able to reduce the number of warnings/ errors. However, although there is no load on the interface (max. 25%) the issue occurs.

I have a few questions concerning the state_updater. If set to true xknx, respectively HA, will collect the states of all group addresses that will provide the state of the device / object. Except the sensors and binary sensors that have sync_state = false.

Currently I have around 200 GAs that will be synced.

Since the buffer on the interface is not fully used (25%), I assume there must be something additionally that runs into an issue.

Assuming the 200 GA need to be synced by the state_updater and rate_limit is 10 (per second). It will take around 20 seconds, right? How is this done? As I understood from the explanation in #120 this is queued and processed in packages of 10?

How does the callback function work? As far as I understand the function for each telegram received from the bus a callback is done to the bus, right? Is this callback queued also?

Can the logger be extended to deliver the current queue size as mentioned by @farmio in #120?

Is it technically possible that such a queue runs over and that in the case the warning ('KNX bus did not respond in time' --> 'Could not sync group address') is raised?

farmio commented 4 years ago

Hey! There are some efforts to change how the state_updater works (see #227) but I wouldn't hold my breath ;)

I don't think the queue will overflow with 200 items. Only the outgoing telegrams are delayed, incoming telegrams are processed immediately.

As you stated your Hue integration also causes troubles, have you tried to disable hue to see if Knx still breaks then?

DirkMu commented 4 years ago

Hey farmio! Thanks for responding.

On Discord you mentioned you are going to create a new release and afterwards file a PR to get it into home-assistant. When do you plan to do this? Will this also include the possibility to see the queue and the mentioned PR?

Concerning the queue. Incoming telegram are processed immediately but what about the callback that is documented on xknx.io?

Concerning HUE. Yes, I disabled HUE and the amount of warnings decreased but there were still warnings. I also disabled Netatmo which had a significant higher impact on the amounts of warnings. Don't know if this makes sense or whether there is a relation.

On Home Assistant it is said to use xknx abstraction for large installations (>100 devices). Is this still valid?

As described earlier I have a tested the behavior on my test environment where no issues occurred at all. The only difference is

What I am struggling to understand. Why is a warning raised when there is no load on the interface. The only reason I see for such a warning is the interface could not process the read request(s).

Thanks Dirk

farmio commented 4 years ago

I doubt the state_updater stuff will be finished anytime soon. Certainly not in the next release.

I don't think initializing with a xknx.yaml makes any difference at this state. These config schemes are just applied at startup.

Im not sure what you mean by callbacks documented in xknx.io (maybe we can work it out on discord soon).

After all this still sounds like a network problem to me. For this warning there is no way for us to tell why there was no response - we can't even be sure if the Request reached the bus - you would have to monitor with ETS (or maybe wireshark).

DirkMu commented 4 years ago

Thanks. I will observe it further.

Callback: https://xknx.io/xknx#callbacks

When you say network problem, what kind of problem do you think of? There is no much data send and received. Both components, docker and KNX IP Interface are in the same network. The docker runs on a NAS (Synology 918+) which provides two LAN ports. LAG was enabled but for testing purposes I disabled LAG and separated the traffic of the Axis cams (LAN2) from the traffic of the other devices such as the KNX IP interface (LAN1). For LAN1 traffic control is enabled and there is a guaranteed bandwidth for the KNX IP interface, whereas the interface itself just sends a few KB/s.

I will enable the ETS monitor. Lets see...

DirkMu commented 4 years ago

What I see in the monitor. xknx requests the HVAC operation mode states twice for each device, whereas other states are requested once.

Is this an expected behavior?

farmio commented 4 years ago

Have you maybe configured one mode address for multiple devices?

DirkMu commented 4 years ago

No. this happens for alle HVAC mode addresses. However, this does not raise warning in the log always.

image

DirkMu commented 4 years ago

hey @farmio and @Julius2342

Again me.

More and more I see that the error occurs for state updates coming from BUS (?).

For the sensor below sync_state = false. Thus, I assume the value is send from the BUS and the error occurs in HA. You also see that a value for the GA exists in the log.

2020-04-06 09:30:54 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to request of type 'ConnectionState'
2020-04-06 09:35:34 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 6/5/4
2020-04-06 09:35:34 WARNING (MainThread) [xknx.log] Could not sync group address '6/5/4' from <Sensor name="ws_dach_hell_nord" sensor="None/GroupAddress("6/5/4")/<DPTArray value="[0x4e,0x1a]" />/7997.44" value="7997.44" unit="lx"/>

Another example for a binary sensor (sync_state = true). Today 11:36 BUS send an off state for that entity from the weather station, which was not processed in HA! As a result the state was still on until the state updater picked the right value from the BUS.

2020-04-06 11:36:40 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 6/5/28
2020-04-06 11:36:40 WARNING (MainThread) [xknx.log] Could not sync group address '6/5/28' from <BinarySensor group_address_state="GroupAddress("6/5/28")" name="ws_dach_behaglich" state="BinarySensorState.OFF"/>

image

At 11:36 the value should have been off!

Thus, I assume this (also or primarily) happens for incoming telegrams. What's your thought?

Please let me know if I'm completely wrong.

In addition, the error occurs no matter how low the value of the rate_limit is set in the configuration. In addition, I disabled sync_state for almost all of my sensors.

What I am seeing is a significant higher number of errors when many messages are received from the BUS. Maybe at the same time state_updater requests the current state of the entities(?). I am using a weather station that send updates frequently (1-3% change of the value and 30-60 sec.)

As you can see here, there are much more states send from the BUS to IP:

image

I also saw your PR #276 You mentioning the following:

Incoming telegrams are processed immediately. Only outgoing telegrams have rate_limit not applied. To achieve this a second Queue is started (xknx.telegram_queue.outgoing_queue).

Is there a relation to my issue?

farmio commented 4 years ago

Maybe, but I wouldn't think so. You said you have 3 lines - could you explain on your topology? Have you configured a dummy application in ETS? Also a Screenshot of an ETS Group monitor when sich an error happened could help.

DirkMu commented 4 years ago

We have a separated line for each, our separate apartment and main house (two line couplers). In addition there is a wireless line (one media coupler).

image

image

image

I don't use a dummy application. There are old devices and imported applications for testing purposes but no dummy application as such.

I am currently running ETS monitor and will update the issue once I have something useful.

Do you have an idea why the update of a sensor for which sync_state is false fails, when the current state is send from the BUS to HA? Also why the number of errors seem to increase when the weather station starts sending updates more frequently. For instance, over night no issues occur. When sun rises 6 sensors will provide the Lux value. In addition elevation and azimuth will change more frequently. As well as wind typically changes more frequently.

One thing I am still wondering. I did not have the number of errors before HA release 106.

Another thing I do not yet understand. You and julius talking about network issues. What kind of issues do you think of?

DirkMu commented 4 years ago
2020-04-06 15:39:23 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 1/1/34
2020-04-06 15:39:23 WARNING (MainThread) [xknx.log] Could not sync group address '1/1/34' from <Light name="gaezi_light" switch="GroupAddress("1/1/31")/GroupAddress("1/1/32")/<DPTBinary value="0" />/False" brightness="GroupAddress("1/1/35")/GroupAddress("1/1/34")/<DPTArray value="[0x0]" />/0" />
2020-04-06 15:39:32 WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time to GroupValueRead request for: 6/5/4
2020-04-06 15:39:32 WARNING (MainThread) [xknx.log] Could not sync group address '6/5/4' from <Sensor name="ws_dach_hell_nord" sensor="None/GroupAddress("6/5/4")/<DPTArray value="[0x4f,0xa1]" />/9999.36" value="9999.36" unit="lx"/>

Is there a chance to send you the extract of the monitor of the past hour? I would like to avoid publishing it here.

francescocam commented 4 years ago

Hi, I noticed that I probably have the same problem. I did not notice it before because I have only 5 knx devices (DALI lights) and I suspect it got considerably worse in 108. Given that I have only 5 devices and a very simple topology please let me know if I might be able to help narrow down the problem. This is a sample of my HA log:

Error: KNX bus responded to request of type 'Connect' with error in 'ConnectResponse': ErrorCode.E_NO_MORE_CONNECTIONS
8:05:17 AM – __main__.py (WARNING) - message first occurred at April 15, 2020, 10:13:51 PM and shows up 1310782 times
Error doing job: Task exception was never retrieved
8:05:13 AM – /home/hass/venv_hass/lib/python3.7/site-packages/homeassistant/core.py (ERROR) - message first occurred at April 15, 2020, 10:35:41 PM and shows up 36587 times
Error doing job: Task exception was never retrieved
8:05:11 AM – /home/hass/venv_hass/lib/python3.7/site-packages/homeassistant/core.py (ERROR) - message first occurred at April 15, 2020, 10:27:05 PM and shows up 55242 times
<CouldNotParseKNXIP description="KNXIP data has wrong length" />
5:45:55 AM – /home/hass/venv_hass/lib/python3.7/site-packages/xknx/io/udp_client.py (ERROR)
DirkMu commented 4 years ago

Hello Francesco,

Thanks for supporting. I still face my issue(s) whereas I fear your issue is a little different.

In my case connection is established and working. No noticable load on the KNXIP Interface. Nevertheless the log says HA could not read the value of a group address or snyc the value of a group address. Although the log shows a value for the group address concerned. This happens for entities that are not supposed to be synced from HA but should only receive the updated value form the bus.

With 0.108.X I have the feeling the numer of warnings decreased a little but I still do not understand why HA, respectively xknx, raises a warning for an entity that is not suppossed to be synced.

From my understanding of your log excerpt, you are facing connection issues and a parsing issue. The log states there is no connection availble. xknx seems to attempt to re-connect which fails (1310782 times?). The second log states something about a parsing issue related to an incorrect lenght of the data. You configured lights only, right? Have you been able to see any state of the lights in HA, respectively to control the light from HA at all?

francescocam commented 4 years ago

@DirkMu Thanks for pointing me in the right direction! I am investigating whether my DALI gateway is faulty.

farmio commented 4 years ago

@francescocam E_NO_MORE_Connections would indicate your Interface has no free connection slots. Maybe some other device occupies it. @DirkMu there have been no changes for quite some time in the Knx Integration so it makes me wonder if or why it has gotten better in 0.108. What is the source address for telegrams from HA when you look at it in ETS? It should be on the line where your interface is (I assume 1.1.x). If not you can set the address by using xknx.yaml. I'd create a dummy in ETS on the same line and assign every group address to it that is used in HA. Then program your line couplers (updates its filter tables). I also noticed that you use Knx-rf. This has led to complications with xknx before. It should be resolved in the next version but we have no way to reproduce the error so I'm just hoping the fix works.

DirkMu commented 4 years ago

@farmio I know xknx was changed for quite a while. However, I always had the feeling there is something in HA under the hood which has an impact on multicasts and thus on different components. Maybe you remember when I raised the issue I said with 0.106 it became worse and I also stated I have the feeling hue and netatmo have an impact on the xknx component. Anyway, the hue and netatmo issue alongside the xknx issue made me dig a little deeper and fortunately in this HUE related Github issue https://github.com/home-assistant/core/issues/32380#issuecomment-614698319 somebody said there might be an issue with Unifi and how it handles multicast.

The mDNS-Reflector (for USG/UniFi available from the GUI as "mDNS Service" will reflect mDNS traffic on ALL interfaces (incl. WAN!) but the interface the traffic originates from. This caused my HUE bridge to be unavailable and I assume also spamming the KNXIP Interface. Since I disabled the mDNS reflector the number of xknx warnings significantly reduced. There is a warning on startup and maybe another during a day (although there is still no load on the assigned tunnel (12%)). In addition my hue issue is gone.

Concerning your questions. the source address is always one of the addresses the interface assigns HA during startup. Since I have 8 tunnels it can be one of the addresses between 1.1.211 and 1.1.218. Sometimes HA gets two tunnels assigned in parallel which I find strange. However, when I set an address in the xknx.yaml the connection will not be established. An appropriate error is raised in the HA log during startup.

Honestly, I did not get your hint with the dummy in ETS. What I the purpose of such a dummy? Currently I do not filter on the line couples. Is this what you want to achieve?

Yes, I am using KNX-RF and I also following #167 and #262. I agree with the use case mention there. By using my RF-switch I would like to control my HUE bulbs. That Is why I need support for DPT 3.007 and 3.008. Currently, this is done via the Hager IoT Controller which I would like to get rid of. So I eagerly awaiting a new release of xknx that includes support for those DPT types. Currently the KNX-RF devices do not cause any issues in HA.

I also would like to see the improvements of you both changes: #276 and #227.

I still think a differentiation of the state_updater between system startup and during runtime would make sense, since I don't need HA to update many of my binaries frequently but reading the current state during startup only.

farmio commented 4 years ago

Yes the dummy device is solely for filling the filter tables in the line-couplers, mediacouplers and ip routers.

Glad to hear that your network issues are resolved now.