eclipse / paho.mqtt.cpp

Other
1k stars 433 forks source link

Automatic Reconnect Crashes MQTT when `client.reconnect()` is called #465

Open lizziemac opened 9 months ago

lizziemac commented 9 months ago

Issue Description

When automatic reconnection is enabled in the MQTT client, manually calling client.reconnect() after a disconnect causes the application to crash. This issue occurs specifically when the client doesn't automatically reconnect after a manual disconnect. Disabling automatic reconnection prevents this crash.

Steps to Reproduce

  1. Enable automatic reconnection in the MQTT client.
    auto connOpts = mqtt::connect_options_builder()
                    .clean_session(false)
                    .automatic_reconnect(seconds(2), seconds(30))
                    .finalize();
  2. Manually disconnect the client.
    cli->disconnect()->wait();
  3. Attempt to manually reconnect using client.reconnect().
    cli->reconnect()->wait();

Expected Behavior

The client should successfully reconnect without causing the application to crash, regardless of whether automatic reconnection is enabled or not.

Actual Behavior

The application crashes when client.reconnect() is called with automatic reconnection enabled. Backtrace from a modified multithr_pub_sub.cpp:

0x0000007ff7b1b7e8 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x0000007ff7b1b7e8 in raise () from /lib/libc.so.6
#1  0x0000007ff7b08dd4 in abort () from /lib/libc.so.6
#2  0x0000007ff7dc0cf8 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x0000007ff7dbe85c in ?? () from /usr/lib/libstdc++.so.6
#4  0x0000007ff7dbe8c0 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x0000007ff7dbebb0 in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x00000000004127c0 in mqtt::async_client::reconnect() ()
#7  0x0000000000406210 in publisher_func(std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter>) ()
#8  0x000000000040f288 in void std::__invoke_impl<void, void (*)(std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter>), std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter> >(std::__invoke_other, void (*&&)(std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter>), std::shared_ptr<mqtt::async_client>&&, std::shared_ptr<multithr_counter>&&) ()
#9  0x000000000040f174 in std::__invoke_result<void (*)(std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter>), std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter> >::type std::__invoke<void (*)(std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter>), std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter> >(void (*&&)(std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter>), std::shared_ptr<mqtt::async_client>&&, std::shared_ptr<multithr_counter>&&) ()
#10 0x000000000040f10c in void std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter>), std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter> > >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) ()
#11 0x000000000040f098 in std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter>), std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter> > >::operator()() ()
#12 0x000000000040ec58 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter>), std::shared_ptr<mqtt::async_client>, std::shared_ptr<multithr_counter> > > >::_M_run() ()
#13 0x0000007ff7de9eec in ?? () from /usr/lib/libstdc++.so.6
#14 0x0000007ff7fa5478 in ?? () from /lib/libpthread.so.0
#15 0x0000007ff7bb4c1c in ?? () from /lib/libc.so.6

Additional Context

I have a callback for when the network is disconnected, and then reconnected. If I don't force the MQTT client to disconnect, it will never update to think it is disconnected, and therefore never retry connecting. As a result, I want to make sure that I do reconnect. I was hoping to avoid re-instantiating all of my options again with a standard client->connect() but if that is the suggested/correct approach, I can do that.

fpagliughi commented 9 months ago

What version of the Paho C client are you using? Something that sounds just like this was recently fixed in the C v1.3.13 client.

fpagliughi commented 9 months ago

Oh, actually, this may be different. What platform? What compiler/version?

I can’t say that the expected behavior would be exactly what you want. Perhaps the library might ignore an explicit request to reconnect if the automatic option is turned on? I’d have to look at the implementation.

But either way, it definitely should never hard crash.

lizziemac commented 9 months ago

As of this morning, I've reproduced it on two separate OS's, both on ARM platforms. If you want to reproduce, I just modified the multithr_pub_sub.cpp example (attached with my changes) - multithr_pubsub.txt

MacOS (just tested this morning)

Linux (initial PR) - Debian Bullseye

fpagliughi commented 9 months ago

Awesome, thanks. I'll have a look. Probably over the weekend.

But in general, think what might work better for you - automatic or manual updates. It's not too tough to do it manually; many of the example apps demonstrate it. But the auto/backoff algorithm is usable in most situations.

lizziemac commented 9 months ago

Sounds good, thanks! I was having some issues with the connection_lost callback being called for manual (yet autoreconnect working) but I'll take another stab at it.

fpagliughi commented 9 months ago

I haven't figured out if the C lib fires the "connection lost" callback if it gets a clean disconnect from the server (i.e. receives a DISCONNECT packet). At first I assumed it did, then I thought it didn't. Now I'm not sure. I'm currently trying to set up the test broker to figure it out and/or check the C code to confirm.

That means you may need to set an on_disconnect handler via set_disconnected_handler() to make sure you're always being informed when the connection is lost.

I'll confirm ASAP, since I need to resolve #458.

lizziemac commented 9 months ago

Thanks! I'll look into implementing that