commschamp / cc.mqttsn.libs

CommsChampion Ecosystem MQTT-SN client / gateway libraries and applications
Mozilla Public License 2.0
63 stars 16 forks source link

Encapsulated package breaks the gateway #17

Closed trivialkettle closed 1 week ago

trivialkettle commented 2 weeks ago

Hi, when I send the following encapsulated package to the gateway

09 fe 00 d4 3e 27 6e 03  | ....>'n.
c3 1e 04 04 01 00 0a 64  | .......d
6f 6e 67 6c 65 2d 44 34  | ongle-D4
3a 33 45 3a 32 37 3a 36  | :3E:27:6
45 3a 30 33 3a 43 33     | E:03:C3

the gateway crashes with the message

$ cc_mqttsn_gateway_app -c ./gateway.conf
[INFO]: New UDP client connection from: 127.0.0.1:45576
cc_mqttsn_gateway_app: /xxx/cc.mqttsn.libs/gateway/app/gateway/GatewaySession.cpp:58: bool cc_mqttsn_gateway_app::GatewaySession::start(): Assertion `m_termReqCb' failed.

This is the first package my forwarder sends, so it did not connect or anything before.

Thanks for you help.

arobenko commented 2 weeks ago

Please retry with "develop" branch. At this stage the gateway application creates a connection to a broker for every Session object. When the packet arrives and not parsed yet, it is considered to be a legit client and a Session object is created for it, which initiates connection to a broker right away. When the message is parsed new "encapsulated" Session object is created and also initiates a connection to the broker. Due to the fact the the first Session object doesn't really send any data to a broker, the latter closes the connection, then the gateway application re-opens it again a bit later. As the result there is a constant loop of creating and closing one redundant connection to a broker.

I will work on making the connection to a broker being a "lazy" one, i.e. established only when there is a data to be sent to a broker.

trivialkettle commented 2 weeks ago

Hi, thanks for the fast solution. Now the forwarded client can connect to the broker.

Though another problem appeared.

After connecting I let the client idle and the gateway crashed with SIGSEGV

stdout:

[INFO]: New UDP client connection from: 127.0.0.1:56212
[INFO]: Connected client: dongle-D4:3E:27:6E:03:C3
[ERROR]: TCP/IP read error: End of file
[ERROR]: TCP/IP read error: End of file
[INFO]: Terminating session for client: 
[INFO]: Terminating session for client: dongle-D4:3E:27:6E:03:C3
[INFO]: Terminated UDP client connection from: 127.0.0.1:56212
Signal: SIGSEGV (Segmentation fault)

stack trace:

operator()<…> GatewaySession.cpp:343
operator() GatewaySession.cpp:339
main main.cpp:49

Mosquitto Broker Log:

broker-1  | 1724833057: New connection from 192.168.32.1:42778 on port 1883.
broker-1  | 1724833057: New connection from 192.168.32.1:42776 on port 1883.
broker-1  | 1724833057: New client connected from 192.168.32.1:42776 as dongle-D4:3E:27:6E:03:C3 (p2, c1, k10).
broker-1  | 1724833058: Client dongle-D4:3E:27:6E:03:C3 disconnected.
broker-1  | 1724833058: New connection from 192.168.32.1:42792 on port 1883.
broker-1  | 1724833058: New client connected from 192.168.32.1:42792 as dongle-D4:3E:27:6E:03:C3 (p2, c0, k10).
broker-1  | 1724833151: Client <unknown> has exceeded timeout, disconnecting.
broker-1  | 1724833151: Client dongle-D4:3E:27:6E:03:C3 closed its connection.

It looks like, the dongle (forwarded node) connects twice but I guess only one of them gets the PINGREQ/PINGRESP. In the log of my dongle I see the PINGREQ/PINGRESP packages.

Thanks a lot for this feature, I'm more than happy to help you resolving the remaining issues.

arobenko commented 2 weeks ago

Thanks, I appreciate your help. My ability to properly test the forwarding encapsulation support is limited. I kind of reluctant and don't have much time to implement proper forwarding application to be able to reproduce problems in my environment.

I've added some extra assert statements to the GatewaySession.cpp (on develop branch) which might help detect the problem. I won't have much time tomorrow, but on Friday I think I'll introduce the lazy broker connection I mentioned. After that it should be easier to get to the bottom of it.

trivialkettle commented 2 weeks ago

One thing that differs between "normal" node and forwarded node is the line in the gateway log

[ERROR]: TCP/IP read error: End of file

With the new changes I get the following logs:

cc_mqttsn_gateway_app -c ./gateway.conf
[INFO]: New UDP client connection from: 127.0.0.1:60551
[INFO]: Connected client: dongle-D4:3E:27:6E:03:C3
[ERROR]: TCP/IP read error: End of file
[ERROR]: TCP/IP read error: End of file
[INFO]: Terminating session for client: 
[INFO]: Terminating session for client: dongle-D4:3E:27:6E:03:C3
[INFO]: Terminated UDP client connection from: 127.0.0.1:60551
cc_mqttsn_gateway_app: /xxx/cc.mqttsn.libs/gateway/app/gateway/GatewaySession.cpp:346: cc_mqttsn_gateway_app::GatewaySession::startSession()::<lambda(cc_mqttsn_gateway::Session*)>::<lambda()>::<lambda(auto:5&)> [with auto:5 = std::unique_ptr<cc_mqttsn_gateway_app::GatewaySession>]: Assertion `sPtr' failed.

Process finished with exit code 134 (interrupted by signal 6:SIGABRT)

And mosquitto log

broker-1  | 1724840930: New connection from 192.168.32.1:58270 on port 1883.
broker-1  | 1724840930: New connection from 192.168.32.1:58256 on port 1883.
broker-1  | 1724840930: New client connected from 192.168.32.1:58256 as dongle-D4:3E:27:6E:03:C3 (p2, c1, k10).
broker-1  | 1724840930: Client dongle-D4:3E:27:6E:03:C3 disconnected.
broker-1  | 1724840930: New connection from 192.168.32.1:41516 on port 1883.
broker-1  | 1724840930: New client connected from 192.168.32.1:41516 as dongle-D4:3E:27:6E:03:C3 (p2, c0, k10).
broker-1  | 1724841023: Client <unknown> has exceeded timeout, disconnecting.
broker-1  | 1724841023: Client dongle-D4:3E:27:6E:03:C3 closed its connection.

So there is a client from port 58270, than the forwarded node connects from port 58256 and immediately disconnects and than connects again from port 41516. As we can see the dongle does not exceeds its timeout but the unknown client. I guess the one from port 58270.

arobenko commented 2 weeks ago

I think I know what the problem is, please try updated "develop" branch again.

trivialkettle commented 2 weeks ago

Did not solve it for me, my device does not get any PING/RESP/PINGREQ after the session is terminated for the dongle

Gateway log:

[INFO]: New UDP client connection from: 127.0.0.1:43269
[INFO]: Connected client: dongle-D4:3E:27:6E:03:C3
[ERROR]: TCP/IP read error: End of file
[ERROR]: TCP/IP read error: End of file
[INFO]: Terminating session for client: 
[INFO]: Terminating session for client: dongle-D4:3E:27:6E:03:C3
[INFO]: Terminated UDP client connection from: 127.0.0.1:43269
[INFO]: New UDP client connection from: 127.0.0.1:43269
[ERROR]: TCP/IP read error: End of file
[INFO]: Terminating session for client: 
[INFO]: Terminating session for client: 
[INFO]: Terminated UDP client connection from: 127.0.0.1:43269

mosquitto log:

broker-1  | 1724843537: New connection from 192.168.32.1:48172 on port 1883.
broker-1  | 1724843537: New connection from 192.168.32.1:48156 on port 1883.
broker-1  | 1724843537: New client connected from 192.168.32.1:48156 as dongle-D4:3E:27:6E:03:C3 (p2, c1, k10).
broker-1  | 1724843538: Client dongle-D4:3E:27:6E:03:C3 disconnected.
broker-1  | 1724843538: New connection from 192.168.32.1:48180 on port 1883.
broker-1  | 1724843538: New client connected from 192.168.32.1:48180 as dongle-D4:3E:27:6E:03:C3 (p2, c0, k10).
broker-1  | 1724843633: Client <unknown> has exceeded timeout, disconnecting.
broker-1  | 1724843633: Client dongle-D4:3E:27:6E:03:C3 closed its connection.
broker-1  | 1724843638: New connection from 192.168.32.1:52264 on port 1883.
broker-1  | 1724843638: New connection from 192.168.32.1:52280 on port 1883.
broker-1  | 1724843729: Client <unknown> has exceeded timeout, disconnecting.
broker-1  | 1724843729: Client <unknown> has exceeded timeout, disconnecting.
arobenko commented 2 weeks ago

I was talking about the crash / assert. I'll continue to work on it on Friday

On Wed, 28 Aug 2024, 21:24 trivialkettle, @.***> wrote:

Did not solve it for me, my device does not get any PING/RESP/PINGREQ after the session is terminated for the dongle

Gateway log:

[INFO]: New UDP client connection from: 127.0.0.1:43269 [INFO]: Connected client: dongle-D4:3E:27:6E:03:C3 [ERROR]: TCP/IP read error: End of file [ERROR]: TCP/IP read error: End of file [INFO]: Terminating session for client: [INFO]: Terminating session for client: dongle-D4:3E:27:6E:03:C3 [INFO]: Terminated UDP client connection from: 127.0.0.1:43269 [INFO]: New UDP client connection from: 127.0.0.1:43269 [ERROR]: TCP/IP read error: End of file [INFO]: Terminating session for client: [INFO]: Terminating session for client: [INFO]: Terminated UDP client connection from: 127.0.0.1:43269

mosquitto log:

broker-1 | 1724843537: New connection from 192.168.32.1:48172 on port 1883. broker-1 | 1724843537: New connection from 192.168.32.1:48156 on port 1883. broker-1 | 1724843537: New client connected from 192.168.32.1:48156 as dongle-D4:3E:27:6E:03:C3 (p2, c1, k10). broker-1 | 1724843538: Client dongle-D4:3E:27:6E:03:C3 disconnected. broker-1 | 1724843538: New connection from 192.168.32.1:48180 on port 1883. broker-1 | 1724843538: New client connected from 192.168.32.1:48180 as dongle-D4:3E:27:6E:03:C3 (p2, c0, k10). broker-1 | 1724843633: Client has exceeded timeout, disconnecting. broker-1 | 1724843633: Client dongle-D4:3E:27:6E:03:C3 closed its connection. broker-1 | 1724843638: New connection from 192.168.32.1:52264 on port 1883. broker-1 | 1724843638: New connection from 192.168.32.1:52280 on port 1883. broker-1 | 1724843729: Client has exceeded timeout, disconnecting. broker-1 | 1724843729: Client has exceeded timeout, disconnecting.

— Reply to this email directly, view it on GitHub https://github.com/commschamp/cc.mqttsn.libs/issues/17#issuecomment-2315049502, or unsubscribe https://github.com/notifications/unsubscribe-auth/AASJKGQYAU753N2D3LYZ37TZTWXNJAVCNFSM6AAAAABNGCMMP2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJVGA2DSNJQGI . You are receiving this because you commented.Message ID: <commschamp/cc. @.***>

arobenko commented 2 weeks ago

Introduced the following changes on "develop" branch:

trivialkettle commented 2 weeks ago

Hi, it seems to work so far. Thank you very much :)

But one issue still exists: The forwarded devices get disconnected from the broker after the first connection with a TCP/IP Error in gateway log and reconnect. However, I don't see this disconnect in the device log, my device connects and stays connected.

Gateway log:

[INFO]: New UDP client connection from: 127.0.0.1:34193
[INFO]: (0x5cfc4b3c8af0) New primary session
[INFO]: (0x5cfc4b3cad70) New enapsulated session
[INFO]: (0x5cfc4b3cad70) Connected client: dongle-D4:3E:27:6E:03:C3
[ERROR]: TCP/IP read error: End of file
[INFO]: (0x5cfc4b3cbd30) New enapsulated session
[INFO]: (0x5cfc4b3cbd30) Connected client: client-82:C5:FA
[ERROR]: TCP/IP read error: End of file

and mosquitto log:

broker-1  | 1725001667: New connection from 192.168.32.1:36628 on port 1883.
broker-1  | 1725001667: New client connected from 192.168.32.1:36628 as dongle-D4:3E:27:6E:03:C3 (p2, c1, k10).
broker-1  | 1725001667: Client dongle-D4:3E:27:6E:03:C3 disconnected.
broker-1  | 1725001667: New connection from 192.168.32.1:36630 on port 1883.
broker-1  | 1725001667: New client connected from 192.168.32.1:36630 as dongle-D4:3E:27:6E:03:C3 (p2, c0, k10).
broker-1  | 1725002024: New connection from 192.168.32.1:56598 on port 1883.
broker-1  | 1725002024: New client connected from 192.168.32.1:56598 as client-82:C5:FA (p2, c1, k10).
broker-1  | 1725002051: Client client-82:C5:FA disconnected.
broker-1  | 1725002051: New connection from 192.168.32.1:47428 on port 1883.
broker-1  | 1725002051: New client connected from 192.168.32.1:47428 as client-82:C5:FA (p2, c0, k10).

I let it run for some hours to see if anything breaks.

arobenko commented 2 weeks ago

The "End of file" TCP/IP error usually means that the remote end (broker) closes its TCP/IP socket (connection). When such closure happens the gateway detects it and automatically re-connects using the recorded credentials picked up by the CONNECT / CONNACK messages exchange without notifying the client. That's why your client is not aware of it. I thought about implementing gateway logic to send DISCONNECT message to the client and then let the client to preform re-connection logic. However, it can create corner case races where client sends DISCONNECT message to enter sleep mode and if right after receiving such DISCONNECT message from the gateway, the client will perceive it as a sleep mode confirmation, while for gateway the client will be disconnected, not asleep. That's why I decided against it.

There is a need to perform a thorough analysis of why the broker disconnects the client. The first step should be analyzing the MQTT v3.1.1 messages exchanged between the gateway and the broker. I guess you can use wireshark for that purpose, or you can use CommsChampion Tools for that purpose. To do so you can build the all the projects of CommsChampion Ecosystem in one go using cc.cmake project.

cd /path/to/cc.cmake
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=install -DCMAKE_PROJECT_INCLUDE=config/All.cmake
cmake --build . --parallel 16

After the build is complete use ./install/bin/cc_view.sh to run the CommsChampion Tools (see wiki for instructions of how to use). Select "TCP/IP Proxy Socket" as your socket plugin, configure it to connect to remote port 1883 while selecting other port for incoming connections. Select "MQTT v3.1.1 Protocol" as your protocol plugin. After the connection is started, configure your gateway application to connect to the proxy port you chose when configuring socket plugin of the CommsChampion Tools. Now you should be able to see the MQTT messages exchanged between the gateway and the broker with nice visual breakdown of the message fields.

Please let me know if you see any discrepancies between what you expect in those messages and when is actually been sent. Is "keep alive" configuration as expected?

arobenko commented 1 week ago

I've added an ability to emulate encapsulated session to the pub / sub client applications.

Everything seems to be working as expected, no unexpected termination from the broker. Also note, that previously I hard-coded the "radius" value in the forwarding encapsulation header to be "3". Maybe your forwarder didn't like it and did some unexpected stuff with it. Now I removed it and it's value became "0". Please retry again. If the problem persists you need to run a thorough traffic analysis on your setup and see if there are any discrepancies.

arobenko commented 1 week ago

I'm closing this ticket. If after a thorough traffic analysis you come to a conclusion that the gateway doesn't behave as expected, please open a new ticket.

trivialkettle commented 1 week ago

Hi, sorry for the delay, I got another important task. I guess I can work on this next week again.

trivialkettle commented 1 week ago

Hi, I think I found it in the wireshark log.

After connect I use mqttsn_bare_metal_client_will_update to update the will message. This results in a Disconnect Req from the gateway to the broker. Then happens some TCP magic and after that, the gateway talks over a different port to the broker. This whole process is transparent to the client, as you may know, because you wrote the mqttsn client :)

arobenko commented 1 week ago

That makes sense. The MQTT v3.1.1 doesn't have an ability to change a will on the fly, requiring disconnection and re-connection from the gateway. However, the client should set its will through the connection procedure, rather than updating it after the connection is established. It will result in a single connection to the broker without reconnection later.

trivialkettle commented 1 week ago

Due to some system design decisions, I need to subscribe some topics first, to know what my Will_Topic will be, so I need to update the will later.