eclipse / paho.mqtt.rust


Reconnection problem #192

Closed JosuGZ closed 9 months ago

JosuGZ commented 1 year ago

I have a peculiar problem. I can't provide a lot of information as I am unable to reproduce it.

After several months of running (and surviving all sorts of reconnections), my program lost its connection and never reconnected. It is configured to always attempt to reconnect every second:

  #[cfg(feature = "ssl")]
  fn get_connect_options(&self) -> ConnectOptions {
    let mut connect_options_builder = ConnectOptionsBuilder::new();

    connect_options_builder.ssl_options(self.get_ssl_options());

    let interval = Duration::new(1,0);
    connect_options_builder.automatic_reconnect(interval, interval);

    connect_options_builder.finalize()
  }

The program was still running and responding to other kinds of IO (not MQTT). After restarting it, it connected again.

The only lead as to where the bug might be is the following:

Suspecting that the MQTT thread might have crashed without crashing the whole program, I checked the thread count of the program. It was 10. After restarting it, it was 9. Every device always uses 9 threads. So the only difference between the working state and the non-working state is that a new thread had been created.

I suspect that paho_mqtt is creating a new thread without destroying the old one, breaking some invariant.
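The thread-count check described above can also be done from inside the process, so the value can be logged periodically and compared when the problem recurs. The following is only a minimal sketch, assuming a Linux target where `/proc/self/status` is available; the function names `parse_thread_count` and `current_thread_count` are hypothetical helpers and are not part of paho_mqtt.

```rust
use std::fs;

/// Extracts the value of the `Threads:` field from the text of a
/// `/proc/<pid>/status` file. Returns `None` if the field is missing
/// or its value is not a number.
/// (Hypothetical helper, not part of any library.)
fn parse_thread_count(status: &str) -> Option<usize> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("Threads:"))
        .and_then(|rest| rest.trim().parse().ok())
}

/// Returns the number of OS threads in the current process, or `None`
/// if `/proc/self/status` cannot be read (for example, on a non-Linux
/// system). This counts every thread in the process, including any
/// created by native libraries.
fn current_thread_count() -> Option<usize> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_thread_count(&status)
}
```

Calling `current_thread_count()` from a periodic task and logging the result alongside the connection state would show whether the extra thread appears before or after reconnection stops.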

I will try to get more information if it happens again.

fpagliughi commented 1 year ago

Oh, that's weird... A new thread appears when it stops working?!?

JosuGZ commented 1 year ago

I have only observed that once. It might be a false lead, but I think my program always uses 9 threads.

I wrote it just in case it might ring a bell to someone familiar with the code. Other than that, I don't know why it stops reconnecting.

fpagliughi commented 1 year ago

The thread creation and usage is all in the C library, with which I am only slightly familiar. If a handler thread crashed and disappeared, I would understand that, especially since the callback threads flow through this library and application code. So there are a number of places where things can go wrong. But I'm not sure where a new thread would appear. It's what I call an "interesting" bug.

I'm going to start working on a small bug-fix update for this library and will look to see if I can spot anything like this.

xiaozefeng commented 1 year ago

I had the same problem: the connection disappeared after a while.

I used this example: https://github.com/eclipse/paho.mqtt.rust/blob/master/examples/async_subscribe_v5.rs

Version: paho-mqtt = "0.12.0"; runtime: tokio

fpagliughi commented 9 months ago

I think this is fixed with the upcoming v0.12.3 release. If you still see it with that version, please feel free to reopen.