eclipse / paho.mqtt.rust

paho.mqtt.rust
Other
511 stars 102 forks source link

Memory corruptions when the host doesn't resolve to an IP #190

Closed JosuGZ closed 9 months ago

JosuGZ commented 1 year ago

When I try to connect to a particular hostname that doesn't resolve to an IP address, like "tcp://www.testtesttesttestanothertest.com:18986" I get memory corruption errors (double frees etc).

Here is a log before the crash:

20230220 180232.094 Automatically attempting to reconnect
20230220 180232.094 Connecting to serverURI www.testtesttesttestanothertest.com:18986 with MQTT version 4
20230220 180232.094 getaddrinfo failed for addr www.testtesttesttestanothertest.com with rc -2
20230220 180232.094 www.testtesttesttestanothertest.com is not a valid IP address
20230220 180232.094 Connect failed, more to try
20230220 180232.094 Connecting to serverURI www.testtesttesttestanothertest.com:18986 with MQTT version 3
20230220 180232.094 getaddrinfo failed for addr www.testtesttesttestanothertest.com with rc -2
20230220 180232.094 www.testtesttesttestanothertest.com is not a valid IP address
20230220 180232.094 Calling command failure for client 
[2023-02-20T17:02:32Z DEBUG paho_mqtt::token] Token failure! 0x55932e527cd0, 0x7ff0f3856bf0
[2023-02-20T17:02:32Z DEBUG paho_mqtt::token] Token w ID 0 completed with code: -1
free(): double free detected in tcache 2
Aborted (core dumped)

This happens both on 0.10 and 0.12.

fpagliughi commented 1 year ago

Wow. Crazy. The underlying C library is calling the failure function twice; once when attempting to connect w/ MQTT v3.1.1 and then again making an attempt with v 3.1.

I hadn't noticed that behavior before in all this time. I assume that's normal for it when using the MQTTVERSION_DEFAULT, which the Rust library is doing internally.

OK. That sounds like a real bug. I'll take a look.

Thanks.

stintel commented 1 year ago

I'm also hitting free(): double free detected in tcache 2 with 0.12 after the set_connection_lost_callback, when connecting to a non-SSL broker. When connecting to an SSL broker, I'm getting a segfault. Neither of these crashes occur with 0.11.

[2023-03-04T23:36:43Z DEBUG paho_mqtt::token] Token failure! 0x7f18ac000fc0, 0x7f18cb1fabc0
[2023-03-04T23:36:43Z DEBUG paho_mqtt::token] Token failure message: "socket error"
[2023-03-04T23:36:43Z DEBUG paho_mqtt::token] Token w ID 0 completed with code: -1
[2023-03-04T23:36:43Z DEBUG paho_mqtt::token] Expecting server response for: Connect
[2023-03-04T23:36:43Z DEBUG paho_mqtt::token] Got response: ServerResponse { rsp: None, props: Properties { cprops: MQTTProperties { count: 0, max_count: 0, length: 0, array: 0x0 } }, reason_code: Success }
[2023-03-04T23:36:44Z DEBUG paho_mqtt::token] Token success! 0x7f18ac000fc0, 0x7f18cb1fad10
[2023-03-04T23:36:44Z DEBUG paho_mqtt::token] Token w ID 0 completed with code: 0
free(): double free detected in tcache 2
Aborted (core dumped)
[2023-03-04T20:29:25Z DEBUG paho_mqtt::token] Token failure! 0x7fbac00bac40, 0x7fbb66df8bc0
[2023-03-04T20:29:25Z DEBUG paho_mqtt::token] Token failure message: "socket error"
[2023-03-04T20:29:25Z DEBUG paho_mqtt::token] Token w ID 0 completed with code: -1
[2023-03-04T20:29:25Z DEBUG paho_mqtt::token] Expecting server response for: Connect
[2023-03-04T20:29:25Z DEBUG paho_mqtt::token] Got response: ServerResponse { rsp: None, props: Properties { cprops: MQTTProperties { count: 0, max_count: 0, length: 0, array: 0x0 } }, reason_code: Success }
[2023-03-04T20:29:26Z DEBUG paho_mqtt::token] Token success! 0x7fbac00bac40, 0x7fbb66df8d10
[2023-03-04T20:29:26Z DEBUG paho_mqtt::token] Token w ID 0 completed with code: 0
Segmentation fault (core dumped)

Let me know if you want me to create separate bugs for this.

fpagliughi commented 1 year ago

OK. The token completion function is essentially FnOnce for each token, but it clearly looks like it is being called twice.There was a bug in the C lib long ago where the completion callback would get fired off twice, causing this exact same type of error. I'll look to see if that creeped back in to the C lib, or if I'm tying the same token to two events. Or, perhaps the C lib changed behavior and I just need to handle it differently.

fpagliughi commented 1 year ago

I've been trying, but I can't seem to recreate either of these errors.

Also, I originally thought that these were variations of the same bug, but looking closer, the first post (@JosuGZ 's) I see that the C lib is not firing the callback twice. It's trying the connection twice (once for v3.1.1, then again for v3.1). That seems silly to me; the server is not refusing the protocol... it's just not there. Why try again? But only after the 2nd attempt does it fire the failure callback.

On the second post (@stintel 's), it does appear that the library is firing the callback twice for the same token, first with a failure, then with a success. The first address in the Token failure/success message is that of the token, like 0x7f18ac000fc0 in the first example.

So... assuming these are separate problems, let's track them in different issues. @stintel , can you create a new Issue for your report, and copy/past your message into that new ticket? Then please add some more info about how you're hitting the bug (example app? your own app? etc), and what platform you're using, etc.

fpagliughi commented 1 year ago

@JosuGZ , I tried a couple of the example programs (using the code in the develop branch) with the fake URI you suggested, and I get something like this from the C logs:

19691231 190000.000 Connecting to serverURI www.testtesttesttestanothertest.com:18986 with MQTT version 4
20230309 204619.903 getaddrinfo failed for addr www.testtesttesttestanothertest.com with rc -2
20230309 204619.903 www.testtesttesttestanothertest.com is not a valid IP address
20230309 204619.903 Connect failed, more to try
20230309 204619.903 Connecting to serverURI www.testtesttesttestanothertest.com:18986 with MQTT version 3
20230309 204619.903 getaddrinfo failed for addr www.testtesttesttestanothertest.com with rc -2
20230309 204619.903 www.testtesttesttestanothertest.com is not a valid IP address
20230309 204619.903 Calling command failure for client rust_dyn_subscribe
Connection attempt failed with error code -1.

No crash. It works as expected.

But, what I'm not seeing is the first line in your output:

20230220 180232.094 Automatically attempting to reconnect

How, exactly, is it trying to reconnect to a fake URI? How was it connected to begin with? Can you explain a little more how to recreate this.

JosuGZ commented 1 year ago

These are my connect options:

  #[cfg(feature = "ssl")]
  fn get_connect_options(&self) -> ConnectOptions {
    let mut connect_options_builder = ConnectOptionsBuilder::new();

    connect_options_builder.ssl_options(self.get_ssl_options());

    let interval = Duration::new(1,0);
    connect_options_builder.automatic_reconnect(interval, interval);

    connect_options_builder.finalize()
  }

Using the debugger I was unable to trigger the error btw.

fpagliughi commented 1 year ago

OK. I'll try that.

fpagliughi commented 9 months ago

It looks like this is fixed with the upcoming v0.12.3 release. At least I can't recreate it in testing. If you still see it with that version, please feel free to reopen.