XKNX / xknx

XKNX - A KNX library written in Python
http://xknx.io/
MIT License

OverflowErrors on secure timer #1411

Open hvraven opened 4 months ago

hvraven commented 4 months ago

Description of problem: I am trying to connect Home Assistant to KNX using routing with Data Secure and IP Secure via an MDT router. I ran into problems, which I tried to debug a bit; they appear to be related to the router as well. When sending data to the bus, I get an OverflowError when the timestamp is converted. This happens for every telegram; below is the debug output for one of them, triggered by switching a switch.

The source of the problem appears to be the secure timer value. TimerNotify frames arrive regularly from the router (which I guess is expected), but their timer_value is already at the maximum a 6-byte unsigned integer can hold, 281474976710655 (2^48 - 1; see the debug output below), and it never changes. I have no idea how the timer value is generated, but this looks wrong.
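To make the numbers concrete, here is a small Python sketch (not xknx code) showing that `ffffffffffff`, the timer field in the received frame, is exactly the largest value that fits into 6 bytes, and that any larger value makes `int.to_bytes(6, "big")` fail with the same OverflowError as in the traceback. It assumes, as the traceback suggests but does not prove, that the outgoing timer is the received value advanced by at least one tick; `encode_timer` is a hypothetical helper.

```python
TIMER_BYTES = 6
# Largest value a 6-byte (48-bit) unsigned field can hold.
MAX_TIMER = 2 ** (8 * TIMER_BYTES) - 1


def encode_timer(value: int) -> bytes:
    """Hypothetical helper: encode a timer as 6 big-endian bytes.

    Raises OverflowError if the value does not fit.
    """
    return value.to_bytes(TIMER_BYTES, "big")


# Timer field from the received TIMER_NOTIFY frame.
received = int.from_bytes(bytes.fromhex("ffffffffffff"), "big")
print(received == MAX_TIMER)  # True

try:
    # Assumption: the outgoing value is the received value plus at least one tick.
    encode_timer(received + 1)
except OverflowError as err:
    print(f"OverflowError: {err}")  # OverflowError: int too big to convert
```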

If there's a way for xknx to influence the timer value, that would be great. Either way, xknx should handle an overflow of the timer value more gracefully (or at least give a better error message).
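One way such a friendlier error could look, as a hypothetical sketch: the exception class and function names below are invented for illustration and are not part of xknx. The idea is simply to check the value against the 6-byte limit before encoding and raise an error that tells the user what to do, instead of letting `to_bytes` fail with a generic message.

```python
TIMER_BYTES = 6
MAX_TIMER = 2 ** (8 * TIMER_BYTES) - 1  # 281474976710655


class SecureTimerOverflowError(Exception):
    """Hypothetical error: the secure timer no longer fits into 6 bytes."""


def encode_outgoing_timer(value: int) -> bytes:
    """Hypothetical helper: encode the timer, failing with an actionable message."""
    if not 0 <= value <= MAX_TIMER:
        raise SecureTimerOverflowError(
            f"Secure timer value {value} is outside the 6-byte range 0..{MAX_TIMER}. "
            "The timer is reset to zero when the Secure Backbone Key is changed; "
            "consider changing the backbone key (e.g. in ETS)."
        )
    return value.to_bytes(TIMER_BYTES, "big")


try:
    encode_outgoing_timer(2**48)
except SecureTimerOverflowError as err:
    print(err)
```

Checking the range explicitly also covers negative values, which `to_bytes` would otherwise reject with a different OverflowError.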

Traceback (OverflowError):

2024-02-19 08:11:38.214 DEBUG (MainThread) [xknx.telegram] <Telegram direction="Outgoing" source_address="0.0.240" destination_address="4/2/0" payload="<GroupValueWrite value="<DPTBinary value="True" />" />" />
2024-02-19 08:11:38.215 DEBUG (MainThread) [xknx.cemi] Outgoing CEMI: <CEMIFrame code="L_DATA_REQ" info="CEMIInfo("")" data="CEMILData(src_addr="IndividualAddress("0.0.240")" dst_addr="GroupAddress("4/2/0")" flags="1011110011100000" tpci="TDataGroup()" payload="<GroupValueWrite value="<DPTBinary value="True" />" />")" />
2024-02-19 08:11:38.216 DEBUG (KNX Interface) [xknx.knx] Encrypting frame: <KNXIPFrame <KNXIPHeader HeaderLength="6" ProtocolVersion="16" KNXIPServiceType="ROUTING_INDICATION" TotalLength="17" /> body="<RoutingIndication cemi="2900bce000f02200010081" />" />
2024-02-19 08:11:38.218 ERROR (MainThread) [xknx.log] Unexpected error while processing outgoing telegram <Telegram direction="Outgoing" source_address="0.0.240" destination_address="4/2/0" payload="<GroupValueWrite value="<DPTBinary value="True" />" />" />
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/xknx/core/telegram_queue.py", line 169, in _outgoing_rate_limiter
    await self.process_telegram_outgoing(telegram)
  File "/usr/local/lib/python3.12/site-packages/xknx/core/telegram_queue.py", line 205, in process_telegram_outgoing
    await self.xknx.cemi_handler.send_telegram(telegram)
  File "/usr/local/lib/python3.12/site-packages/xknx/cemi/cemi_handler.py", line 74, in send_telegram
    await self.xknx.knxip_interface.send_cemi(cemi)
  File "/usr/local/lib/python3.12/site-packages/xknx/io/knxip_interface.py", line 560, in send_cemi
    return await self._await_from_connection_thread(self._interface.send_cemi(cemi))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xknx/io/knxip_interface.py", line 533, in _await_from_connection_thread
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.12/site-packages/xknx/io/routing.py", line 226, in send_cemi
    self._send_knxipframe(KNXIPFrame.init_from_body(routing_indication))
  File "/usr/local/lib/python3.12/site-packages/xknx/io/routing.py", line 233, in _send_knxipframe
    self.transport.send(knxipframe)
  File "/usr/local/lib/python3.12/site-packages/xknx/io/ip_secure.py", line 468, in send
    knxipframe = self.encrypt_frame(plain_frame=knxipframe)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xknx/io/ip_secure.py", line 114, in encrypt_frame
    sequence_information = self.get_sequence_information()
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xknx/io/ip_secure.py", line 475, in get_sequence_information
    return self.secure_timer.get_for_outgoing_secure_wrapper().to_bytes(6, "big")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: int too big to convert

Timer value received from the router (TIMER_NOTIFY):

2024-02-19 08:11:42.516 DEBUG (KNX Interface) [xknx.raw_socket] Received from ('192.168.0.58', 3671): 061009550024ffffffffffff008381401f8ab08570472a83eb3e2f471ee362803b1f3371
2024-02-19 08:11:42.516 DEBUG (KNX Interface) [xknx.knx] Received from 192.168.0.58:3671: <KNXIPFrame <KNXIPHeader HeaderLength="6" ProtocolVersion="16" KNXIPServiceType="TIMER_NOTIFY" TotalLength="36" /> body="<TimerNotify timer_value="281474976710655" serial_number="008381401f8a" message_tag="b085" message_authentication_code="70472a83eb3e2f471ee362803b1f3371" />" />
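The raw bytes in the first log line can be decoded by hand to confirm what xknx printed in the second. The sketch below assumes the layout visible in that output: a 6-byte KNXnet/IP header (header length, protocol version, service type 0x0955 = TIMER_NOTIFY, total length), followed by a 6-byte timer value, a 6-byte serial number, a 2-byte message tag and a 16-byte message authentication code. `parse_timer_notify` is a hypothetical helper; it only splits the fields and does not verify the MAC.

```python
import struct

# Raw frame from the xknx.raw_socket log line, split into its fields.
RAW = bytes.fromhex(
    "061009550024"  # KNXnet/IP header
    "ffffffffffff"  # timer value (6 bytes)
    "008381401f8a"  # serial number (6 bytes)
    "b085"  # message tag (2 bytes)
    "70472a83eb3e2f471ee362803b1f3371"  # message authentication code (16 bytes)
)

TIMER_NOTIFY = 0x0955


def parse_timer_notify(frame: bytes) -> dict:
    """Split a TIMER_NOTIFY frame into its fields (no MAC verification)."""
    header_len, version, service_type, total_len = struct.unpack(">BBHH", frame[:6])
    if service_type != TIMER_NOTIFY or total_len != len(frame):
        raise ValueError("not a complete TIMER_NOTIFY frame")
    body = frame[header_len:]
    return {
        "protocol_version": version,
        "timer_value": int.from_bytes(body[0:6], "big"),
        "serial_number": body[6:12].hex(),
        "message_tag": body[12:14].hex(),
        "mac": body[14:30].hex(),
    }


print(parse_timer_notify(RAW)["timer_value"])  # 281474976710655
```

The field sizes add up to the TotalLength of 36 reported in the log (6 + 6 + 6 + 2 + 16).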

farmio commented 4 months ago

Hi 👋!

Did this secure routing setup work before? Is there (or was there in the past) any other software or device using secure routing in your installation, or is it only one router and xknx? Are all devices on the latest version / firmware?

> If there's a way for xknx to influence the timer value, that would be great.

There is no way to do that, by design and by specification. The timer can only increase, never decrease. Once the maximum 6-byte value is reached, devices are not intended to communicate any more. IIRC this case is not precisely specified.

From the specs:

> The timer shall be reset to zero whenever the Secure Backbone Key is changed.

That way you can reset your router's timer value. xknx currently doesn't keep a timer value of its own; it requests the current value from other devices when a connection is established.

> Either way, xknx should handle an overflow of the timer value more gracefully (or at least give a better error message).

I agree, but since the specification says

> With timer ticks every millisecond, an overflow of the timer would theoretically occur after 9 thousand years.

I figured we could implement that a little later 🤣
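The estimate quoted from the specification is easy to check: a 48-bit counter ticking once per millisecond overflows after 2^48 ms. A quick Python calculation, assuming 365.25-day years, gives roughly 8919 years. Read the other way, the router's timer value of 281474976710655 in the log corresponds to almost nine thousand years of ticking. The function name is illustrative only.

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25  # milliseconds in a 365.25-day year


def years_until_overflow(timer_bytes: int = 6, tick_ms: float = 1.0) -> float:
    """Years until an unsigned counter of `timer_bytes` bytes overflows,
    advancing one tick every `tick_ms` milliseconds."""
    ticks = 2 ** (8 * timer_bytes)
    return ticks * tick_ms / MS_PER_YEAR


print(round(years_until_overflow()))  # 8919
```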

Unfortunately, it's hard to say in retrospect what triggered this high timer value. I'd suggest resetting the router and monitoring the multicast communication with Wireshark / tcpdump from time to time to see whether the timer increases unexpectedly.
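If you capture the TIMER_NOTIFY frames (e.g. with tcpdump or Wireshark) and extract their timer values, a simple plausibility check is to compare how far the timer moved with how much wall-clock time passed. The sketch below assumes roughly one tick per millisecond, as in the spec quote above; the function name and the 60-second tolerance are illustrative choices, not part of xknx or the KNX specification.

```python
def timer_jump_suspicious(
    prev_timer: int,
    prev_time_s: float,
    timer: int,
    time_s: float,
    tolerance_ms: int = 60_000,
) -> bool:
    """Return True if the timer went backwards, or advanced much further
    than the elapsed wall-clock time explains (assuming ~1 tick per ms)."""
    if timer < prev_timer:
        # Outside a backbone key change, the timer should never decrease.
        return True
    elapsed_ms = (time_s - prev_time_s) * 1000
    return (timer - prev_timer) > elapsed_ms + tolerance_ms


# Two observations ten seconds apart:
print(timer_jump_suspicious(75_866_727, 0.0, 75_876_730, 10.0))  # False
print(timer_jump_suspicious(75_866_727, 0.0, 2**48 - 1, 10.0))  # True
```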

hvraven commented 4 months ago

> Hi 👋!

Hi, thanks for the info.

> Did this secure routing setup work before? Is there (or was there in the past) any other software or device using secure routing in your installation, or is it only one router and xknx? Are all devices on the latest version / firmware?

No, not yet. It is currently the only router in the network; everything else uses secure tunneling. However, I failed to decrypt the Data Secure packets with Home Assistant when using tunneling. As the documentation is not really clear in this regard, I hoped that routing would fix that.

> If there's a way for xknx to influence the timer value, that would be great.

> There is no way to do that, by design and by specification. The timer can only increase, never decrease. Once the maximum 6-byte value is reached, devices are not intended to communicate any more. IIRC this case is not precisely specified.

That makes sense; I guess the timer is intended to prevent replay attacks and the like.
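A generic illustration of that replay-protection idea: a receiver remembers the newest timer value it has accepted and rejects any frame that does not carry a newer one. This is not the actual KNX IP Secure acceptance rule, which is more involved and is not spelled out in this thread, and the class name is hypothetical. It also hints at why a timer stuck at the 6-byte maximum is a dead end: no newer value can be encoded.

```python
class MonotonicTimerCheck:
    """Hypothetical replay check: accept a frame only if its timer is newer
    than every timer accepted so far."""

    def __init__(self) -> None:
        self.last_timer = -1  # nothing accepted yet

    def accept(self, timer_value: int) -> bool:
        if timer_value <= self.last_timer:
            return False  # same or older timer: possibly a replayed frame
        self.last_timer = timer_value
        return True


check = MonotonicTimerCheck()
print(check.accept(100), check.accept(100), check.accept(150))  # True False True
```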

> From the specs:

>> The timer shall be reset to zero whenever the Secure Backbone Key is changed.

> That way you can reset your router's timer value. xknx currently doesn't keep a timer value of its own; it requests the current value from other devices when a connection is established.

Is there a method to trigger that exchange from xknx? Or would I require a different router?

> Either way, xknx should handle an overflow of the timer value more gracefully (or at least give a better error message).

> I agree, but since the specification says

>> With timer ticks every millisecond, an overflow of the timer would theoretically occur after 9 thousand years.

> I figured we could implement that a little later 🤣

I completely understand :-D Not sure what happened in my case, but normally this should not be a problem.

> Unfortunately, it's hard to say in retrospect what triggered this high timer value. I'd suggest resetting the router and monitoring the multicast communication with Wireshark / tcpdump from time to time to see whether the timer increases unexpectedly.

So far I have not been able to get the counter back down. I power-cycled the system, but that apparently was not enough. I'll see whether I can trigger a reset some other way.

hvraven commented 4 months ago

With that information I managed to get it working. I could trigger a reset by temporarily disabling IP Secure on the backbone in ETS. When I re-enabled it, ETS generated new keys, which reset the counter. After updating HA with the new keys, routing now works as expected (including Data Secure decryption). The counter is now at 75866727, so there is plenty of room to grow.

Thank you very much for your very helpful explanation. I guess it is still a bug, but not very relevant in practice, so feel free to close the issue.

farmio commented 4 months ago

> Is there a method to trigger that exchange from xknx? Or would I require a different router?

Yes, that's standard secure routing behaviour, and it happens via TimerNotify frames: we send one with timer value 0 and wait for a device to respond with the correct value. Once we have received that (or a timeout occurs, which means we are the only device in the secure routing group), we start sending routing indications.
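The start-up sequence described here can be sketched with asyncio. All names are hypothetical, the network is replaced by an injected send coroutine and a reply queue, and frame encoding and authentication are omitted; it only shows the control flow: announce timer value 0, adopt the value another device answers with, or, after a timeout, keep the local timer (assumed here to start at 0).

```python
import asyncio
from typing import Awaitable, Callable


async def synchronize_timer(
    send_timer_notify: Callable[[int], Awaitable[None]],
    replies: "asyncio.Queue[int]",
    timeout_s: float = 1.0,
) -> int:
    """Hypothetical sketch of secure routing timer synchronisation."""
    await send_timer_notify(0)  # announce ourselves with timer value 0
    try:
        # Adopt the timer value another device in the group answers with.
        return await asyncio.wait_for(replies.get(), timeout_s)
    except asyncio.TimeoutError:
        # No answer: assume we are alone and keep our own timer (assumed 0).
        return 0


async def demo() -> tuple:
    replies: asyncio.Queue = asyncio.Queue()

    async def router_answers(value: int) -> None:
        replies.put_nowait(75_866_727)  # example: a router replies with its timer

    async def nobody_answers(value: int) -> None:
        pass  # no other device in the secure routing group

    with_router = await synchronize_timer(router_answers, replies)
    alone = await synchronize_timer(nobody_answers, asyncio.Queue(), timeout_s=0.05)
    return with_router, alone


print(asyncio.run(demo()))  # (75866727, 0)
```

Injecting the send function and the reply queue keeps the sketch testable without a network; a real implementation would also have to authenticate the reply before trusting its timer value.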

> However, I failed to decrypt the Data Secure packets with Home Assistant when using tunneling.

You'd need to assign the group addresses (GAs) to the tunnelling endpoint you use, then export the .knxkeys file again and re-import it. However, AFAIK MDT's application has always had problems with that procedure, and I don't know whether that has been fixed in the meantime. Your issue doesn't exactly strengthen my confidence in their products 🙃

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Please make sure to update to the latest version of xknx (or Home Assistant) and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.