calimero-project / calimero-server

KNXnet/IP server for KNX IP, KNX (RF) USB, FT1.2, and TP-UART
Other
52 stars 18 forks source link

No reconnect when KNX bus is interrupted #10

Closed aschamberger closed 3 years ago

aschamberger commented 3 years ago

With your fixes from #9 the server runs flawlessly with the kBerry. But I noticed when there is a KNX bus interruption the server does not resume the connection but throws the following exceptions:

[knx-server discovery endpoint] DEBUG calimero.server.knx-server - KNXnet/IP discovery: identify as 'Hass OS KNXnet/IP Server' for container /dev/ttyAMA0 to 192.168.178.29:50299 on [any]
[Calimero FT1.2 receiver] TRACE calimero.serial.ft12:/dev/ttyAMA0 - received reset
[Calimero FT1.2 receiver] WARN calimero.serial.ft12:/dev/ttyAMA0 - toggle frame count bit
[Calimero FT1.2 receiver] WARN calimero.link./dev/ttyAMA0 - received unspecified frame f0 c2 00 0a 00 01 00 0a 01 00
java.lang.NullPointerException
    at tuwien.auto.calimero.link.AbstractLink.dispatchCustomEvent(AbstractLink.java:676)
    at tuwien.auto.calimero.link.AbstractLink$LinkNotifier.frameReceived(AbstractLink.java:162)
    at tuwien.auto.calimero.serial.FT12Connection$Receiver.lambda$fireFrameReceived$0(FT12Connection.java:791)
    at tuwien.auto.calimero.internal.EventListeners.fire(EventListeners.java:126)
    at tuwien.auto.calimero.serial.FT12Connection$Receiver.fireFrameReceived(FT12Connection.java:791)
    at tuwien.auto.calimero.serial.FT12Connection$Receiver.readFrame(FT12Connection.java:709)
    at tuwien.auto.calimero.serial.FT12Connection$Receiver.run(FT12Connection.java:627)
[Calimero FT1.2 receiver] WARN calimero.link./dev/ttyAMA0 - received unspecified frame f0 c2 00 0a 00 01 00 0a 01 01
java.lang.NullPointerException
    at tuwien.auto.calimero.link.AbstractLink.dispatchCustomEvent(AbstractLink.java:676)
    at tuwien.auto.calimero.link.AbstractLink$LinkNotifier.frameReceived(AbstractLink.java:162)
    at tuwien.auto.calimero.serial.FT12Connection$Receiver.lambda$fireFrameReceived$0(FT12Connection.java:791)
    at tuwien.auto.calimero.internal.EventListeners.fire(EventListeners.java:126)
    at tuwien.auto.calimero.serial.FT12Connection$Receiver.fireFrameReceived(FT12Connection.java:791)
    at tuwien.auto.calimero.serial.FT12Connection$Receiver.readFrame(FT12Connection.java:709)
    at tuwien.auto.calimero.serial.FT12Connection$Receiver.run(FT12Connection.java:627)
[knx-server /dev/ttyAMA0 tcp connection /192.168.178.29:51241] TRACE calimero.server.knx-server - received connection state request from /192.168.178.29:51241 for channel 103
[knx-server /dev/ttyAMA0 tcp connection /192.168.178.29:51241] DEBUG calimero.server.knxnetip.KNX IP LinkLayer 192.168.178.29:51241 - close connection for channel 103 - client request
[knx-server /dev/ttyAMA0 tcp connection /192.168.178.29:51241] DEBUG calimero.server.gateway.knx-server - removed connection KNX IP LinkLayer 192.168.178.29:51241 (client request)
[/dev/ttyAMA0 data endpoint 192.168.178.29:51241] INFO calimero.server.knx-server - /dev/ttyAMA0 data endpoint 192.168.178.29:51241 closed
[knx-server /dev/ttyAMA0 tcp connection /192.168.178.29:51241] INFO calimero.server.knx-server - close tcp connection to /192.168.178.29:51241 (no active secure session or client connection)
[knx-server /dev/ttyAMA0 tcp connection /192.168.178.29:51241] INFO calimero.server.knx-server - close tcp connection to /192.168.178.29:51241
bmalinowsky commented 3 years ago

The problem seems that if the device resets itself (which in general is fine, and nothing needs to be done about it), because it is a baos-capable device, it activates baos mode. The next received "unspecified" frames are baos messages, and useless for link-layer communication.

I will think about a fix. But I cannot unconditionally set the mode, because the server also supports baos.

How often does that happen approx? In between, you can probably workaround that if you open a busmon connection, immediately close it again, then open a link-layer connection again. Because then the server goes through all the setup quirks to make sure the device is configured correctly.

aschamberger commented 3 years ago

Ok, I understand. As I don't care about baos it would be fine to have a config option like: LinkLayerOnDeviceReset=true

It happened most likely only when the bus power was accidentally removed. After this I intentionally removed the bus connection today to simulate the situation. When I restart the server it connects fine again. So it does only happen in rare cases, however it is not very good when after a short power outage the connection is "lost" this way and you don't notice it.

Thanks for looking into it.

bmalinowsky commented 3 years ago

First stab at this. I added a connection status notification, so at startup in the log there should be a trace output à la registered void tuwien.auto.calimero.link.KNXNetworkLinkFT12$1.connectionStatus(tuwien.auto.calimero.serial.ConnectionStatus)

If a reset is observed, it should force your device to link-layer mode again. I cannot try it with a baos device for that exact setup.

Also the NPE and WARN calimero.serial.ft12:/dev/ttyAMA0 - toggle frame count bit should be gone spamming the log.

(As usual, you need to rename .zip -> .jar)

calimero-core-2.5-M1.zip

aschamberger commented 3 years ago

I just tried the updated core jar. Was busy with other stuff the last days. Sadly it doesn't work yet. Here is the log after a restart of the server:

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Calimero server settings (required for startup) -->
<knxServer name="knx-server" friendlyName="Hass OS KNXnet/IP Server">
    <!-- KNXnet/IP search & discovery -->
    <discovery listenNetIf="all" outgoingNetIf="all" activate="true" />
    <!-- Provides the KNXnet/IP-side configuration for access to one KNX subnet -->
    <serviceContainer activate="true" routing="true" networkMonitoring="true" 
        udpPort="3671" listenNetIf="eth0">
        <knxAddress type="individual">1.1.0</knxAddress>
        <knxSubnet type="ft12-cemi" medium="tp1" knxAddress="0">/dev/ttyAMA0</knxSubnet>
        <!-- KNX group address filter applied by the server for this service container (optional) -->
        <groupAddressFilter>
        </groupAddressFilter>
        <!-- Additional KNX individual addresses assigned to client KNXnet/IP connections (optional) -->
        <additionalAddresses>
            <knxAddress type="individual">1.1.101</knxAddress>
            <knxAddress type="individual">1.1.102</knxAddress>
            <knxAddress type="individual">1.1.103</knxAddress>
            <knxAddress type="individual">1.1.104</knxAddress>
            <knxAddress type="individual">1.1.105</knxAddress>
        </additionalAddresses>
    </serviceContainer>
    <!-- Add next service container (optional) -->
</knxServer>
[main] TRACE calimero.xml - resolve /etc/server-config.xml
[main] TRACE calimero.xml - no StaX implementation found (java.lang.NoClassDefFoundError: javax/xml/stream/XMLStreamException), using internal XMLStreamReader
[main] TRACE calimero.xml - resolve jar:file:/opt/calimero/calimero-core-2.5-M1.jar!/properties.xml
[main] TRACE calimero.xml - no StaX implementation found (java.lang.NoClassDefFoundError: javax/xml/stream/XMLStreamException), using internal XMLStreamReader
[main] DEBUG calimero.device.knx-server - loading interface object server from file:///knx-server-ios.xml
[main] TRACE calimero.xml - resolve file:///knx-server-ios.xml
[main] INFO calimero.device.knx-server - could not open file:///knx-server-ios.xml, create resource on closing device (/knx-server-ios.xml (No such file or directory))
[main] INFO calimero.server.knx-server - Hass OS KNXnet/IP Server v2.5-M1
[main] INFO calimero.server.knx-server - knx-server 'Hass OS KNXnet/IP Server' - 1 service container, discovery: listen on [all] send on [all]
[main] INFO calimero.server.knx-server - /dev/ttyAMA0:
    listen on eth0 (port 3671), KNX IP routing multicast group 224.0.23.12
    ft12-cemi /dev/ttyAMA0 connection: TP1 medium, device 1.1.0
[main] DEBUG calimero.server.knx-server - setup ft12-cemi subnet '/dev/ttyAMA0'
[knx-server discovery endpoint] TRACE calimero.server.knx-server - network stack uses IPv4 addresses: unknown
[knx-server discovery endpoint] TRACE calimero.server.knx-server - found network interfaces veth3f96c6d, veth9f5e922, veth5fe8d06, veth8bb04fa, veth844f8c3, veth416498f, hassio [172.30.32.1], docker0 [172.17.0.1], eth0 [192.168.178.72], lo [127.0.0.1]
[knx-server discovery endpoint] INFO calimero.server.knx-server - knx-server discovery endpoint [hassio, docker0, eth0, lo] is up and running
[knx-server] TRACE calimero.serial - check Java library path /opt/jdk/lib/server:/opt/jdk/lib:/opt/jdk/../lib:/usr/java/packages/lib:/lib:/usr/lib
[knx-server] INFO calimero.serial - no serialcom in java.library.path: [/opt/jdk/lib/server, /opt/jdk/lib, /opt/jdk/../lib, /usr/java/packages/lib, /lib, /usr/lib]
[knx-server control endpoint /dev/ttyAMA0] TRACE calimero.server.knx-server - /dev/ttyAMA0 control endpoint bound to 192.168.178.72:3671
[knx-server routing service 224.0.23.12] DEBUG calimero.knxnetip.KNX IP routing service 224.0.23.12 - join multicast group 224.0.23.12 on eth0
[knx-server] DEBUG calimero.serial.ft12:/dev/ttyAMA0 - using rxtx library for serial port access
[knx-server routing service 224.0.23.12] INFO calimero.knxnetip.KNX IP routing service 224.0.23.12 - multicast loopback mode enabled
[knx-server routing service 224.0.23.12] INFO calimero.server.knx-server - knx-server /dev/ttyAMA0 KNX IP routing service 224.0.23.12 is up and running
[knx-server] INFO calimero.serial.ft12:/dev/ttyAMA0 - open rxtx (RXTX-2.2pre2) serial port connection for /dev/ttyAMA0
[knx-server control endpoint /dev/ttyAMA0] INFO calimero.server.knx-server - /dev/ttyAMA0 secure mgmt/tunneling connections: optional/optional
[knx-server] DEBUG calimero.serial.ft12:/dev/ttyAMA0 - setup serial port: baudrate 19200, even parity, 8 databits, 1 stopbits, flow control 0
[knx-server] TRACE calimero.serial.ft12:/dev/ttyAMA0 - send reset to BCU
[knx-server /dev/ttyAMA0 tcp service] INFO calimero.server.knx-server - knx-server /dev/ttyAMA0 tcp service (eth0 192.168.178.72:3671) is up and running
[knx-server] TRACE calimero.serial.ft12:/dev/ttyAMA0 - sending FT1.2 frame, blocking, attempt 1
[knx-server] TRACE calimero.serial.ft12:/dev/ttyAMA0 - registered void tuwien.auto.calimero.link.KNXNetworkLinkFT12$1.connectionStatus(tuwien.auto.calimero.serial.ConnectionStatus)
[knx-server] DEBUG calimero.link./dev/ttyAMA0 - hop count set to 6
[knx-server] DEBUG calimero.server.gateway.knx-server - set maximum APDU length of '/dev/ttyAMA0' to 15
[knx-server control endpoint /dev/ttyAMA0] INFO calimero.server.knx-server - knx-server /dev/ttyAMA0 control endpoint (eth0 192.168.178.72:3671) is up and running
[knx-server] WARN calimero.link./dev/ttyAMA0 - unsupported event type void tuwien.auto.calimero.server.gateway.KnxServerGateway$SubnetListener.baosService(tuwien.auto.calimero.baos.BaosService)

Here the log when I unplug+replug the KNX cable (the connection in ETS is also gone after this):

[knx-server] TRACE calimero.server.gateway.knx-server - subnet /dev/ttyAMA0: 1.1.36->2/1/23 L_Data.ind, low priority hop count 6, tpdu 00 80 0c b1: T_Group, A_Group.write
[knx-server] DEBUG calimero.server.gateway.knx-server - dispatch 1.1.36->2/1/23 to all server-side connections
[knx-server] TRACE calimero.server.knxnetip.KNX IP routing service 224.0.23.12 - add to multicast loopback frame buffer: 1.1.36->2/1/23 L_Data.ind, low priority hop count 5, tpdu 00 80 0c b1
[knx-server] TRACE calimero.server.knxnetip.KNX IP routing service 224.0.23.12 - sending cEMI frame seq 0, NonBlocking, attempt 1 (channel 0) 06 10 05 30 00 13 29 00 bc d0 11 24 11 17 03 00 80 0c b1
[knx-server] TRACE calimero.server.knxnetip.KNX IP LinkLayer 192.168.178.29:60004 - sending cEMI frame seq 0, NonBlocking, attempt 1 (channel 1) 06 10 04 20 00 17 04 01 00 00 29 00 bc d0 11 24 11 17 03 00 80 0c b1
[knx-server] TRACE calimero.server.gateway.knx-server - dispatching to KNX subnets: exclude subnet /dev/ttyAMA0
[knx-server routing service 224.0.23.12] TRACE calimero.server.knxnetip.KNX IP routing service 224.0.23.12 - discard multicast loopback cEMI frame: 1.1.36->2/1/23 L_Data.ind, low priority hop count 5, tpdu 00 80 0c b1
[knx-server /dev/ttyAMA0 tcp connection /192.168.178.29:60004] TRACE calimero.server.knx-server - received connection state request from /192.168.178.29:60004 for channel 1
[Calimero FT1.2 receiver] TRACE calimero.serial.ft12:/dev/ttyAMA0 - received reset
[Calimero FT1.2 receiver] TRACE calimero.serial.ft12:/dev/ttyAMA0 - sending FT1.2 frame, blocking, attempt 1
[Calimero FT1.2 receiver] TRACE calimero.serial.ft12:/dev/ttyAMA0 - sending FT1.2 frame, blocking, attempt 2
[Calimero FT1.2 receiver] TRACE calimero.serial.ft12:/dev/ttyAMA0 - sending FT1.2 frame, blocking, attempt 3
[Calimero FT1.2 receiver] TRACE calimero.serial.ft12:/dev/ttyAMA0 - sending FT1.2 frame, blocking, attempt 4
[Calimero FT1.2 receiver] INFO calimero.serial.ft12:/dev/ttyAMA0 - close serial port /dev/ttyAMA0 - client request
[Calimero link notifier] DEBUG calimero.server.gateway.knx-server - set maximum APDU length of '/dev/ttyAMA0' to 15
[Calimero FT1.2 receiver] DEBUG calimero.link./dev/ttyAMA0 - link closed
[Calimero link notifier] DEBUG calimero.server.knxnetip.KNX IP LinkLayer 192.168.178.29:60004 - send tunneling-feat.info connection status 00
[Calimero link notifier] INFO calimero.server.gateway.knx-server - KNX subnet link closed (client request)
[knx-server /dev/ttyAMA0 tcp connection /192.168.178.29:60004] TRACE calimero.server.knx-server - received connection state request from /192.168.178.29:60004 for channel 1
[knx-server /dev/ttyAMA0 tcp connection /192.168.178.29:60004] DEBUG calimero.server.knxnetip.KNX IP LinkLayer 192.168.178.29:60004 - close connection for channel 1 - client request
[/dev/ttyAMA0 data endpoint 192.168.178.29:60004] INFO calimero.server.knx-server - /dev/ttyAMA0 data endpoint 192.168.178.29:60004 closed
[knx-server /dev/ttyAMA0 tcp connection /192.168.178.29:60004] DEBUG calimero.server.gateway.knx-server - removed connection KNX IP LinkLayer 192.168.178.29:60004 (client request)
bmalinowsky commented 3 years ago

I see that the immediate LL mode switch introduced a stalling of the receiver (the receiver thread should not send, duh). So I put the mode switching into its own thread now.

calimero-core-2.5-M1.zip

aschamberger commented 3 years ago

Wow. That was fast :) Seems to work now :D