eclipse / paho.mqtt.rust

paho.mqtt.rust
Other
511 stars 102 forks source link

Memory corruption after reconnect_with_callbacks #197

Closed stintel closed 9 months ago

stintel commented 1 year ago

I'm hitting free(): double free detected in tcache 2 with 0.12 after the set_connection_lost_callback, when reconnecting to a non-SSL broker. When reconnecting to an SSL broker, I'm getting a segfault. Neither of these crashes occur with 0.11.

[2023-03-04T23:36:43Z DEBUG paho_mqtt::token] Token failure! 0x7f18ac000fc0, 0x7f18cb1fabc0
[2023-03-04T23:36:43Z DEBUG paho_mqtt::token] Token failure message: "socket error"
[2023-03-04T23:36:43Z DEBUG paho_mqtt::token] Token w ID 0 completed with code: -1
[2023-03-04T23:36:43Z DEBUG paho_mqtt::token] Expecting server response for: Connect
[2023-03-04T23:36:43Z DEBUG paho_mqtt::token] Got response: ServerResponse { rsp: None, props: Properties { cprops: MQTTProperties { count: 0, max_count: 0, length: 0, array: 0x0 } }, reason_code: Success }
[2023-03-04T23:36:44Z DEBUG paho_mqtt::token] Token success! 0x7f18ac000fc0, 0x7f18cb1fad10
[2023-03-04T23:36:44Z DEBUG paho_mqtt::token] Token w ID 0 completed with code: 0
free(): double free detected in tcache 2
Aborted (core dumped)
[2023-03-04T20:29:25Z DEBUG paho_mqtt::token] Token failure! 0x7fbac00bac40, 0x7fbb66df8bc0
[2023-03-04T20:29:25Z DEBUG paho_mqtt::token] Token failure message: "socket error"
[2023-03-04T20:29:25Z DEBUG paho_mqtt::token] Token w ID 0 completed with code: -1
[2023-03-04T20:29:25Z DEBUG paho_mqtt::token] Expecting server response for: Connect
[2023-03-04T20:29:25Z DEBUG paho_mqtt::token] Got response: ServerResponse { rsp: None, props: Properties { cprops: MQTTProperties { count: 0, max_count: 0, length: 0, array: 0x0 } }, reason_code: Success }
[2023-03-04T20:29:26Z DEBUG paho_mqtt::token] Token success! 0x7fbac00bac40, 0x7fbb66df8d10
[2023-03-04T20:29:26Z DEBUG paho_mqtt::token] Token w ID 0 completed with code: 0
Segmentation fault (core dumped)

This happens with https://codeberg.org/stintel/auditd-forwarder on Gentoo, both using glibc and musl. I have downgraded paho-mqtt to 0.11 as described in https://codeberg.org/stintel/auditd-forwarder/commit/39d269646107a5b5ed326fcf46809f43e4f20268. To reproduce it, changing the dependency to 0.12.0, connecting to an MQTT broker, then restarting that MQTT broker should trigger the issue. It's hardcoded to use SSL but you could change that in https://codeberg.org/stintel/auditd-forwarder/src/branch/main/src/main.rs#L223. Example config for the app: https://codeberg.org/stintel/auditd-forwarder/src/branch/main/src/config.rs#L71-L77 - this should go in /etc/auditd-forwarder.toml. You don't need to be running auditd, you can just nc -lkU /tmp/unix.sock and use that path in the config.

fpagliughi commented 1 year ago

Ah, OK. That's something I can track down. Perhaps I will make a small example that uses the callbacks in a similar manner, and see if that recreates the problem. If not, it would still be a good example!

fpagliughi commented 1 year ago

Quick update... I was able to recreate the crash. I started with the _dynsubscribe.rs example, then trimmed it down to the basics of using callbacks for reconnecting and sure enough, upon successful reconnect, it segfaults. I'll see if I can get it fixed and hopefully released this week.

fpagliughi commented 1 year ago

Just a quick update... It seems like the C library is being thrown off by the call to re-connect coming from within the connection-lost handler.

From what I can tell, the C lib fires the "lost" callback early in the process of unwinding its connection status. By trying to make a new connection at that point seems to mess up its internal client data structure so that it erroneously calls an onFailure callback for the old connection using the Token for the new connection attempt, thus consuming the new token. When it later fails the new connection attempt, it fires the failure callback a second time using the address of the token that was just consumed, but now that address is now invalid. Which will result in either a segfault (as the token is accessed) or double free (when it is free'd a second time).

I assume this is something that the C lib should be allowing us to do - calling connect() from within a connection-lost callback. So for now, I'm assuming it's a bug and opened an issue in that repo. https://github.com/eclipse/paho.mqtt.c/issues/1341

fpagliughi commented 1 year ago

I'm waiting on Paho C, so will need to put this off to the next release, which I will put out as soon as that library is updated, or I can figure a workaround.

stintel commented 1 year ago

I'm waiting on Paho C, so will need to put this off to the next release, which I will put out as soon as that library is updated, or I can figure a workaround.

Understood. Meanwhile I can use v0.11 so it's not urgent for me.

EvilWatermelon commented 1 year ago

If I rollback to v0.11 the subscriber gives me ERROR [-16] Wrong MQTT version

fpagliughi commented 9 months ago

It looks like this is fixed with the upcoming v0.12.3 release. At least I can't recreate it in testing. If you still see it with that version, please feel free to reopen.