brutella / hkknx-public

hkknx is a HomeKit bridge for KNX.
https://hochgatterer.me/hkknx

Scene and automation with 4+ devices losing state #288

Closed gson closed 10 months ago

gson commented 12 months ago

Hi,

Thanks for a nice product, most things have worked well.

However, I have a problem similar to the one a homebridge user describes in the link below (using the same IP interface as well).

When I add a scene or automation with several devices, it randomly drops the turn-off or turn-on commands for some lights. In the Home app it looks like the scene is executed, but the lamps are not in the right state. I haven’t checked the traffic on the KNX bus yet; first I want to see if this is a known issue.

https://github.com/snowdd1/homebridge-knx/issues/73

gson commented 12 months ago

Did a quick test adding 3 lamps to a scene. When it is activated, the Home app shows all 3 lamps as on, but only 2 telegrams are sent from hkknx and only 2 lamps are lit.

gson commented 12 months ago

Another reference with a similar issue: https://github.com/knxd/knxd/issues/153

Not sure if knxd is used under the hood?

brutella commented 11 months ago

Does your KNX gateway support TCP and have you tried to enable TCP tunnelling in hkknx under Settings → KNX Gateway → Protocol?

gson commented 11 months ago

It is an interface supporting TCP tunneling; I have everything on default in hkknx. Will look at the settings later today. The connection to the bus works, but telegrams get lost and the state is updated even if no status telegram is sent.

brutella commented 11 months ago

When do you lose telegrams? When running a scene in HomeKit?

gson commented 11 months ago

Yes, a HomeKit scene (or automation) with more than 2 devices randomly drops telegrams. Tried TCP; unfortunately the interface only supports KNXnet/IP Tunneling over UDP.

brutella commented 11 months ago

So you mean that when you execute the scene, the telegrams are not sent to KNX? Or are the telegrams sent but the KNX devices do not process them correctly?

gson commented 11 months ago

Can’t be 100% sure, but looking at the group monitor, it looks like in a scene with 3 lamps only 2 telegrams are sent (with the corresponding 2 status telegrams). It is random, and very seldom 3 telegrams are sent and the scene is correctly activated. It is not the same lamp that has the problem every time. If I add more lamps to a scene, I get more failures.

Even if no status telegram is sent, the state in HomeKit is updated as if it had been sent (for all three lamps).

The problem is very similar to the problem in the links above.

brutella commented 11 months ago

> Can’t be 100% sure, but looking at the group monitor, it looks like in a scene with 3 lamps only 2 telegrams are sent (with the corresponding 2 status telegrams). It is random, and very seldom 3 telegrams are sent and the scene is correctly activated. It is not the same lamp that has the problem every time. If I add more lamps to a scene, I get more failures.

Can you confirm that all 3 lights are actually labeled as off in HomeKit? It might be the case that hkknx assumes that one of the lights is already turned on and therefore doesn't send a telegram. This happens when the status group addresses are not entered correctly in hkknx.

> Even if no status telegram is sent, the state in HomeKit is updated as if it had been sent (for all three lamps).

Yes, that's on purpose. hkknx assumes that when you send a telegram to turn on a light, the light is actually turned on. It doesn't wait for the status telegram to arrive. If a status telegram arrives signalling that the light is not turned on, hkknx will label the light as off.
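The behaviour described in this reply can be pictured with a small model. This is only an illustrative sketch, not hkknx code: the class `OptimisticLight` and its method names are hypothetical, and the rule "send nothing if the assumed state already matches" is taken from the guess in the reply above, not from the hkknx source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OptimisticLight:
    """Hypothetical model of how a bridge might track one light."""
    address: str
    assumed_on: bool = False
    sent: List[Tuple[str, bool]] = field(default_factory=list)

    def set_from_homekit(self, on: bool) -> bool:
        """Handle a write from HomeKit; return True if a telegram was sent."""
        if on == self.assumed_on:
            # The bridge believes the light is already in this state,
            # so no telegram goes out (this is where a wrong status
            # group address would cause a "missing" telegram).
            return False
        self.sent.append((self.address, on))
        # Optimistic update: do not wait for the status telegram.
        self.assumed_on = on
        return True

    def on_status_telegram(self, on: bool) -> None:
        """A status telegram from the bus overrides the assumption."""
        self.assumed_on = on
```

With this model, a light that silently failed to switch still shows as on until a status telegram says otherwise, which matches what is reported in this thread.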

gson commented 11 months ago

Hi,

Did a tcpdump, and the software is correctly sending the three UDP packets corresponding to the three lamps, but only 2 reach the KNX bus. My guess is that the Weinzierl KNX IP Interface is messed up.

When using knxd, apparently the option --no-tunnel-client-queuing is required for the Weinzierl KNX IP Interface 730 (I have the 731, but I guess the same applies):

--no-tunnel-client-queuing do not assume KNXnet/IP Tunneling bus interface can handle parallel cEMI

Any way to get hkknx to provide a similar flag?

21:25:59.375378 IP 192.168.88.148.37801 > 192.168.88.149.3671: UDP, length 21
    0x0000:  4500 0031 d4eb 4000 4011 3356 c0a8 5894  E..1..@.@.3V..X.
    0x0010:  c0a8 5895 93a9 0e57 001d 32a9 0610 0420  ..X....W..2.....
    0x0020:  0015 0405 aa00 1100 bce0 0000 1250 0100  .............P..
    0x0030:  81                                       .
21:25:59.380287 IP 192.168.88.148.37801 > 192.168.88.149.3671: UDP, length 21
    0x0000:  4500 0031 d4ec 4000 4011 3355 c0a8 5894  E..1..@.@.3U..X.
    0x0010:  c0a8 5895 93a9 0e57 001d 32a9 0610 0420  ..X....W..2.....
    0x0020:  0015 0405 ab00 1100 bce0 0000 125f 0100  ............._..
    0x0030:  81                                       .
21:25:59.382118 IP 192.168.88.148.37801 > 192.168.88.149.3671: UDP, length 21
    0x0000:  4500 0031 d4ed 4000 4011 3354 c0a8 5894  E..1..@.@.3T..X.
    0x0010:  c0a8 5895 93a9 0e57 001d 32a9 0610 0420  ..X....W..2.....
    0x0020:  0015 0405 ac00 1100 bce0 0000 1255 0100  .............U..
    0x0030:  81 

Corresponding to:

header_length                   0x6
version                        0x10
service_type_descriptor       0x420
frame_length                   0x15
dest_addr_group              2/2/80
channel_id                      0x5
sequence_counter               0x9b
data_service                   0x11
data                            0x1
data_size                       0x1
apci                            0x2

header_length                   0x6
version                        0x10
service_type_descriptor       0x420
frame_length                   0x15
dest_addr_group              2/2/85
channel_id                      0x5
sequence_counter               0x9c
data_service                   0x11
data                            0x1
data_size                       0x1
apci                            0x2

header_length                   0x6
version                        0x10
service_type_descriptor       0x420
frame_length                   0x15
dest_addr_group              2/2/95
channel_id                      0x5
sequence_counter               0x9d
data_service                   0x11
data                            0x1
data_size                       0x1
apci                            0x2
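As a cross-check of the capture, the UDP payloads can be decoded by hand. The sketch below is a minimal decoder for a KNXnet/IP tunneling request that carries a short (6-bit) group value, which is the case for these on/off telegrams. The function names are made up for this example, the output keys mirror the field names in the listing above, and the payload bytes are copied from the tcpdump output (the 21 bytes after the IP and UDP headers).

```python
import struct


def group_address(raw: int) -> str:
    """Format a 16-bit KNX group address in 3-level main/middle/sub style."""
    return f"{(raw >> 11) & 0x1F}/{(raw >> 8) & 0x07}/{raw & 0xFF}"


def decode_tunneling_request(payload: bytes) -> dict:
    # KNXnet/IP header: header length, version, service type, total length.
    header_length, version, service_type, frame_length = struct.unpack_from(
        ">BBHH", payload, 0)
    # Connection header: structure length, channel id, sequence counter, reserved.
    conn_length, channel_id, sequence_counter, _reserved = struct.unpack_from(
        ">BBBB", payload, header_length)
    cemi = payload[header_length + conn_length:]
    message_code, additional_info_length = cemi[0], cemi[1]
    # Skip additional info, then read control fields, addresses and data length.
    body = cemi[2 + additional_info_length:]
    _ctrl1, _ctrl2, _source, destination, data_size = struct.unpack_from(
        ">BBHHB", body, 0)
    # Assumes a short payload: 4-bit APCI directly above 6 bits of data.
    tpci_apci = struct.unpack_from(">H", body, 7)[0]
    return {
        "header_length": header_length,
        "version": version,
        "service_type_descriptor": service_type,
        "frame_length": frame_length,
        "dest_addr_group": group_address(destination),
        "channel_id": channel_id,
        "sequence_counter": sequence_counter,
        "data_service": message_code,
        "data": tpci_apci & 0x3F,
        "data_size": data_size,
        "apci": (tpci_apci >> 6) & 0x0F,
    }


# Payloads of the three captured packets, in capture order.
PACKETS = [
    bytes.fromhex("0610 0420 0015 0405 aa00 1100 bce0 0000 1250 0100 81"),
    bytes.fromhex("0610 0420 0015 0405 ab00 1100 bce0 0000 125f 0100 81"),
    bytes.fromhex("0610 0420 0015 0405 ac00 1100 bce0 0000 1255 0100 81"),
]

for packet in PACKETS:
    print(decode_tunneling_request(packet))
```

Run against the three payloads, this yields the destinations 2/2/80, 2/2/95 and 2/2/85 in capture order, each with apci 0x2 (group value write) and data 0x1.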

gson commented 11 months ago

Checked the knxd code, and no-tunnel-client-queuing is nothing more than a 30 ms send delay on the command queue:

  {
    "no-tunnel-client-queuing", OPT_BACK_TUNNEL_NOQUEUE, 0, 0,
    "wait 30msec between transmitting packets. Obsolete, please use --send-delay=30"
  },

brutella commented 11 months ago

hkknx actually does queue outgoing telegrams because you cannot send multiple telegrams at once. You have to wait for a response from the gateway until you can send the next one.

A delay of a few milliseconds between telegrams seems reasonable to me.

gson commented 11 months ago

Sounds good, happy to test if/when you have time to add it. It would be great if the send delay were configurable, similar to knxd's --send-delay=30.
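For illustration, a send delay of this kind can be sketched as a loop that enforces a minimum gap between consecutive telegrams. This is not how hkknx or knxd implement it; `send_paced` and its parameters are hypothetical, and the 30 ms default simply mirrors the knxd value mentioned above. For comparison, the three packets in the tcpdump capture earlier in the thread were sent about 5 ms and 2 ms apart.

```python
import time
from typing import Callable, Iterable


def send_paced(
    telegrams: Iterable[str],
    send: Callable[[str], None],
    delay_s: float = 0.030,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send telegrams one by one, at least delay_s seconds apart."""
    last_sent = None
    for telegram in telegrams:
        if last_sent is not None:
            # Wait only for whatever part of the gap has not elapsed yet.
            remaining = delay_s - (clock() - last_sent)
            if remaining > 0:
                sleep(remaining)
        send(telegram)
        last_sent = clock()


if __name__ == "__main__":
    start = time.monotonic()
    send_paced(
        ["2/2/80", "2/2/85", "2/2/95"],
        lambda t: print(f"{time.monotonic() - start:.3f}s  send {t}"),
    )
```

The clock and sleep functions are passed in as parameters so the pacing can be tested without real waiting. A real bridge would also wait for the gateway's acknowledgement before sending the next telegram, as described above.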

sdonati8484 commented 10 months ago

Same issue, and unfortunately I can't connect to the KNX IP bridge via TCP. Can we do something about this?

brutella commented 10 months ago

There is now version 2.8.0-b3, which waits 20ms between outgoing messages to address this issue. Please check if setting a scene via HomeKit now works more reliably.

sdonati8484 commented 10 months ago

Seems fixed, thank you for the prompt update. I'll test more over the weekend and report back.

gson commented 10 months ago

Tested with 2.8.0-b5

A test scene with 3 lamps that had a failure rate of 100% before now fails less than 20% of the time, so a big improvement but not totally fixed. Maybe try 30 ms as the default, as used by knxd?

Added another scene with 5 lamps and got a 100% success rate, then went back to 3 lamps and got a 95% success rate, so the error rates are not consistent (for the few tries I made, approximately 20 activations of the scene) 🤔 At least more lamps don't make it worse.

gson commented 10 months ago

@sdonati8484 what brand of interface are you using?

gson commented 10 months ago

@brutella Thanks for the update!

brutella commented 10 months ago

Please check out version 2.8.0-b6, which enforces an even longer delay.

gson commented 10 months ago

Hi, tried b6 and could not replicate the issue. Thanks for the fix! From my point of view we can close the issue.

sdonati8484 commented 10 months ago

Confirmed, completely fixed in b6. Thank you for the support!