calimero-project / calimero-core

Core library for KNX network access and management
Other
129 stars 65 forks source link

Telegram is confirmed but not sent immediately #134

Open coteclic opened 1 month ago

coteclic commented 1 month ago

Good morning, I'm using an application developed with calimero 2.5 and I'm currently experiencing a problem that I don't have with my old application developed with calimero 1.4.6.

The KNX/IP routers are Siemens N146 for which here is the discover report: ["IP Router N146", KNX address 3.9.0, KNX medium TP1, Installation 0 - Project 0 (ID 0), KNX IP multicast address 224.0.23.12, MAC address 00:0e:8c:01:25:a5, S/N 0001:1003fba1d, Core (v1), Device Management (v2), Tunneling (v1), Routing (v1), Remote Configuration Diagnosis (v1), Object Server (v1)]

The problem occurs for IP routers that are set to transmit telegrams. Example there are 5 IP routers (R1, R2, R3, R4 and R5). The application simply connects to router R1 and reads and writes to group addresses located on lines of routers 1 to 5.

With the old version 1.4.6, the application sends the read request and when it receives the confirmation, it also receives at the same time the telegram that was retransmitted by the router. On the other hand with version 2.5, when the application receives the confirmation, the telegram that is retransmitted by the router is not received immediately.

Both apps work the same way: once confirmation is received, there is a 50 millisecond pause before a new reading is requested. With version 2.5, the more readings we do in a sequence, the longer it takes to receive the telegram retransmitted by the router and therefore the reading response lengthens, this can go up to 5 seconds or more. It's like this the IP router confirmed but the telegram is not sent immediately so I think it ends up saturating. With version 1.4.6 when a new read request is sent, we are certain that the previous one was sent.

I specify that there are a lot of telegrams sent because there is a lot of material and users. We have several tunneling connections on several IP routers at the same time in the application. Only those that are set in the manner explained above pose a problem with version 2.5. The others for which the telegram setting is "filter" and for which we only request the reading of group addresses found on the line of the router itself, this works.

This the method used with 1.4.6 :

//Connection and listener
CEMI_Connection tunnel = new CEMI_Connection(new InetSocketAddress(eibGateway, eibGatewayPort), new TunnellingConnectionType());
tunnel.addFrameListener(...);

// SEND request
tunnel.sendFrame(Request,CEMI_Connection.WAIT_FOR_CONFIRM);

// Example of logs
23:19:54,912 DEBUG EibNetController:149 - -# 7 Send read request for 5/1/72
23:19:54,929 DEBUG EibNetController:785 - -# 7 Received frame from 5.2.175:5/1/72 - (00 00)
23:19:54,929 DEBUG EibNetController:212 -  # 7 request sent for : 5/1/72 (end of sendFrame with wait for confirm true)
                                           # others logs
                                           # others logs
23:19:55,045 DEBUG EibNetController:785 - -# 7 Received frame from 5.2.45:5/1/72 (00 40) value = 34

This the method used with 2.5 :

//Connection and listener
KNXNetworkLink knl=KNXNetworkLinkIP.newTunnelingLink(local, remote, nat, TPSettings.TP1);
knl.addLinkListener(NetworkLinkListener);

// SEND request
knl.sendRequestWait(groupAddress,Priority.NORMAL,bytes);

// example of logs

waiting time for reception of the telegram after confirmation relatively quick  but not immediate (18:23:39.603 to 18:23:39.627, 24ms)

22 07 2024 18:23:39.585 DEBUG (KnxController.java:161) - Controller KNX 1 : send read request for 7/4/172
22 07 2024 18:23:39.585 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/172 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
                              # others logs
                              # others logs
                              # others logs
22 07 2024 18:23:39.603 DEBUG (KNXnetIPTunnel.java:389) - received request seq 3 (channel 72) cEMI L-Data.con 15.15.255->7/4/172
22 07 2024 18:23:39.603 DEBUG (AbstractLink.java:199) - confirmation of 7/4/172
22 07 2024 18:23:39.603 DEBUG (KnxListener.java:253) - Confirmation ok for 7/4/172
                              # others logs
                              # others logs
                              # others logs
22 07 2024 18:23:39.627 DEBUG (AbstractLink.java:194) - indication 15.15.255->7/4/172 L_Data.ind, normal priority hop count 4, tpdu 00 00
                              # others logs
                              # others logs
                              # others logs
22 07 2024 18:23:39.742 DEBUG (AbstractLink.java:194) - indication 2.8.2->7/4/172 L_Data.ind, normal priority hop count 4, tpdu 00 40
22 07 2024 18:23:39.742 DEBUG (KnxListener.java:101) - Controller KNX 1 : response for 7/4/172 , value = 0

waiting time for reception of the telegram after confirmation very slow (16:18:06.236 to 16:18:11.131, 5 seconds)

22 07 2024 16:18:06.094 DEBUG (KnxController.java:161) - Controller KNX 1 : send read request for 7/4/109
22 07 2024 16:18:06.094 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/109 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
                            # others logs
                            # others logs
                            # others logs
22 07 2024 16:18:06.235 DEBUG (KNXnetIPTunnel.java:389) - received request seq 74 (channel 69) cEMI L-Data.con 15.15.255->7/4/109
22 07 2024 16:18:06.236 DEBUG (AbstractLink.java:199) - confirmation of 7/4/109
22 07 2024 16:18:06.236 DEBUG (KnxListener.java:253) - Confirmation ok for 7/4/109
                            # others logs
                            # others logs
                            # others logs
22 07 2024 16:18:11.131 DEBUG (AbstractLink.java:194) - indication 15.15.255->7/4/109 L_Data.ind, normal priority hop count 4, tpdu 00 00
                            # others logs
                            # others logs
                            # others logs
22 07 2024 16:18:11.529 DEBUG (AbstractLink.java:194) - indication 2.6.1->7/4/109 L_Data.ind, normal priority hop count 4, tpdu 00 41
22 07 2024 16:18:11.530 DEBUG (KnxListener.java:101) - Controller KNX 1 : response for 7/4/109 , value = 1

Do you know where the problem comes from ?

Thank you !

bmalinowsky commented 1 month ago

With trace log, you should see the tunneling messages of the KNXnet/IP connection layer (i.e. one layer lower), and compare if the delay for the indication also happens there.

One reason I can think of why this is happening, is that your send logic is called from the notification thread? The notification you receive for a confirmation and indication runs in a dedicated thread. If that thread is not able to return from client code, this will delay the next notification.

coteclic commented 1 month ago

Thank you for your answer.

For the trace logs I will do it a little later because I cannot currently carry out tests on the site due to the holidays.

I don't think there is a difference in delay with KNXnet/IP connection layer and no thread blocking either because during the seconds delay I receive other telegrams on the tunnel including responses to previous read requests .

Furthermore, I do not think that the telegram is sent at the time of confirmation and that there is then a slowness in receiving the indication because during a write request this is not carried out at confirmation but only when the telegram is received with a few seconds delay. We did the test on a lighting, when we try to turn it on by sending an ON, the confirmation arrives quite quickly but the transmission telegram arrives late and the lighting also turns on late so I infer that the telegram is not sent at the time of confirmation.

I looked again at the logs that I have in debug mode of new application with 2.5 and the problem seems to come from what I thought. For the first 3 requests we can see that the retransmitted telegram arrives approximately 20ms after confirmation. From the 4th request (7/4/170), the telegram arrives after 80ms, as the application delays 50ms between requests, a new request (7/4/169) is sent before the arrival of this telegram . This is when the problem arises. We can then see that the reception time of the retransmitted telegram is longer and longer, so more and more requests are sent during this timing and so on… (almost 2 seconds for 7/4/187).

22 07 2024 16:11:10.080 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/80 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:10.104 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/80
22 07 2024 16:11:10.138 DEBUG (AbstractLink.java:194) - indication 15.15.255->7/4/80 L_Data.ind, normal priority hop count 4, tpdu 00 00

22 07 2024 16:11:10.156 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/172 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:10.190 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/172
22 07 2024 16:11:10.212 DEBUG (AbstractLink.java:194) - indication 15.15.255->7/4/172 L_Data.ind, normal priority hop count 4, tpdu 00 00

22 07 2024 16:11:10.358 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/171 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:10.378 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/171
22 07 2024 16:11:10.393 DEBUG (AbstractLink.java:194) - indication 15.15.255->7/4/171 L_Data.ind, normal priority hop count 4, tpdu 00 00

22 07 2024 16:11:10.431 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/170 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:10.465 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/170

22 07 2024 16:11:10.516 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/169 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00

22 07 2024 16:11:10.543 DEBUG (AbstractLink.java:194) - indication 15.15.255->7/4/170 L_Data.ind, normal priority hop count 4, tpdu 00 00

22 07 2024 16:11:10.607 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/169

22 07 2024 16:11:10.658 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/168 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:10.676 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/168

22 07 2024 16:11:10.754 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/187 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00

22 07 2024 16:11:10.836 DEBUG (AbstractLink.java:194) - indication 15.15.255->7/4/169 L_Data.ind, normal priority hop count 4, tpdu 00 00

22 07 2024 16:11:10.855 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/187

22 07 2024 16:11:10.919 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/186 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:10.936 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/186

22 07 2024 16:11:11.112 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/91 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:11.148 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/91

22 07 2024 16:11:11.205 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/2 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:11.264 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/2

22 07 2024 16:11:11.358 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/1 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:11.376 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/1

22 07 2024 16:11:11.433 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/188 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:11.523 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/188

22 07 2024 16:11:11.582 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/93 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:11.600 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/93

22 07 2024 16:11:11.659 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/183 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:11.676 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/183

22 07 2024 16:11:11.797 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/92 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:11.862 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/92

22 07 2024 16:11:11.924 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/182 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:11.943 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/182

22 07 2024 16:11:12.015 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->8/1/100 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:12.033 DEBUG (KnxListener.java:253) - Confirmation ok pour 8/1/100

22 07 2024 16:11:12.089 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/95 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:12.196 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/95

22 07 2024 16:11:12.247 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/185 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:12.268 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/185

22 07 2024 16:11:12.318 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/94 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00
22 07 2024 16:11:12.336 DEBUG (KnxListener.java:253) - Confirmation ok pour 7/4/94

22 07 2024 16:11:12.550 DEBUG (KNXNetworkLinkIP.java:422) - send (wait for confirmation) 0.0.0->7/4/184 L_Data.req, normal priority hop count 6 repeat, tpdu 00 00

22 07 2024 16:11:12.550 DEBUG (AbstractLink.java:194) - indication 15.15.255->7/4/187 L_Data.ind, normal priority hop count 4, tpdu 00 00

I could extend the timeout to 150ms instead of 50ms between requests to test if it works but that would be a lot of time wasted knowing that on average the time required is 20ms and that this time is only higher occasionally, and if a telegram takes more than 150ms to arrive the problem will arise again.

I really don't understand why there is confirmation while the telegram is not sent.

I also checked the logs again in the old application with 1.4.6, it also happens that the reception time of the retransmitted telegram is more than 20ms but in this case the confirmation is also delayed and it arrives at the same time.

What do you think about this ?

bmalinowsky commented 1 month ago

You are right, if other notifications are incoming at normal pace, it's not a delay in the local notification thread.

From the log I would also say that the delay is not introduced by the first router, but by the second after that.

One thing I notice (don't know if it is related): did you set a source address with your old application when sending a request, or did you also use 0.0.0? Because the received frames (.con and also .ind) contain 15.15.255, which usually means that there are uninitialized devices, or that the responding device has been reset.

coteclic commented 1 month ago

yes, the old application also uses source address 0.0.0

I also wondered about the address 15.15.255 for routers.

What is surprising is that when we do a discover, we find the physical address (example: 3.9.0, in my first message) but using tunneling the address is always 15.15.255 for every routers. However, this does the same thing in the old application and does not prevent it from functioning.

However, on other projects that we have, the physical address is really that of the router, for example 2.1.0. I think it is either a programming problem with the IP router or perhaps due to the router model…

I saw in the code of version 1.4.6 and the CEMI_Connection.java file the passage above:

if (request != null && request.getChannelid() == this.server.getChannelid() && (request.getSequenceNumber() == this.server.getReceiveSequence() || request.getSequenceNumber() == this.server.getReceiveSequence() - 1)) {
                     CEMI_Connection_Ack ack = new CEMI_Connection_Ack(this.connection.getAck(), request.getChannelid(), request.getSequenceNumber(), (short)0);
                     datapacket = new DatagramPacket(ack.toByteArray(), ack.toByteArray().length, this.server.getControlEndpoint());
                     this.socket.send(datapacket);
                     if (request.getSequenceNumber() == this.server.getReceiveSequence()) {
                        this.server.incReceiveSequence();
                        FrameEvent event;
                        if (request.getCemi().getMessageCode() != 46) {
                           event = new FrameEvent(this, request.getCemi());
                           this.dispatcher.queue(event);
                        } else if (request.getCemi().getMessageCode() == 46 && ((CEMI_L_DATA)request.getCemi()).isPositiveConfirmation()) {
                           if (!this.sendMode) {
                              this.sendState = 0;
                           } else {
                              try {
                                 this.mainLock.lock();
                                 this.wait_for_confirm.signal();
                              } finally {
                                 this.mainLock.unlock();
                              }
                           }

                           event = new FrameEvent(this, request.getCemi());
                           this.dispatcher.queue(event);
                        }
                     }
                  }

I see that there is an "ack" sent but I did not find this part in version 2.5. The code being a little more complex, maybe I didn't find it. Is there this ack in version 2.5?

bmalinowsky commented 1 month ago

The ack is here.

coteclic commented 1 month ago

Thank you.

I redid some tests with the old version, it seems that I manage to reproduce more or less the same problem, again when requests are sent before the telegram is received. I will do tests with the new version by delaying the requests a little more when possible (late August or early September) and I will post the results.