eclipse / mosquitto

Eclipse Mosquitto - An open source MQTT broker
https://mosquitto.org

Blocking bridge-connection issue #1530

Open bwilms opened 4 years ago

bwilms commented 4 years ago

We encountered an issue with the blocking TCP connection used for bridge connections. If the remote broker (bridge host) is not available (host offline), the TCP connect() call waits for its timeout and the entire broker hangs for that time (about 20 seconds). Because mosquitto tries to reconnect immediately after the failed connect, the broker is no longer usable, especially if more than one bridge connection is defined.

Attached you'll find a sample setup to reproduce.

  1. Start two brokers with the attached configs:
     `mosquitto -c broker1.conf` starts a broker on localhost:8001
     `mosquitto -c broker2.conf` starts a broker on localhost:8002

  2. As long as the 3rd host of the bridge-configuration is not available you will encounter hang-ups of around 20s.

  3. If you remove the 3rd bridge, or point it to a host that is available (it does not need to have a broker running), everything runs smoothly.

In our opinion, bridge connections should be switched back to non-blocking, but the broker must then wait for the connect to either finish or fail before using the connection. Another solution would be to run the bridge connections in a separate thread so that they do not stall the main thread.
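To illustrate the first suggestion, here is a minimal sketch of a non-blocking connect that waits a bounded time for the attempt to finish or fail. It is written in Python for brevity and is not mosquitto's code; the function name `connect_nonblocking` and the 2-second default are assumptions made for this example.

```python
import errno
import select
import socket


def connect_nonblocking(ip, port, timeout=2.0):
    """Attempt a TCP connect without blocking longer than `timeout` seconds.

    `ip` should be a numeric address: resolving a host name would still
    block. Returns a connected (non-blocking) socket, or None on failure
    or timeout.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)

    # connect_ex returns an error code instead of raising. On a
    # non-blocking socket it normally reports "in progress" right away.
    rc = sock.connect_ex((ip, port))
    if rc == 0:
        return sock
    if rc not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        return None

    # Wait until the socket is writable (connect finished) or flagged as
    # failed. Windows reports failed connects in the third set.
    _, writable, failed = select.select([], [sock], [sock], timeout)
    if not writable and not failed:
        sock.close()  # timed out: the remote host did not answer in time
        return None

    # SO_ERROR tells whether the finished attempt succeeded (0) or failed.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        return None
    return sock
```

In an event-driven broker the socket would be registered with the existing poll loop instead of being waited on in a dedicated `select` call, so other clients keep being served while the connect is pending.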

Tested and reproduced on Windows with versions 1.6.5 and 1.6.7. Version 1.6.4 doesn't show this behaviour, as up to that version it used non-blocking connections, but it has the problem mentioned in #478.

sampleBridgeConfigs.zip

michaeliu commented 4 years ago

Hi,

I am not sure if this is the root cause: the bridge connection will be blocked by the DNS lookup and/or connection establishment.

If that is it, please help review my latest PR #1588. Building with WITH_ADNS support may fix it.
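As a rough illustration of why an asynchronous DNS lookup helps, the sketch below moves the blocking `getaddrinfo` call to a worker thread so the caller can keep running and pick up the result later. This is a Python analogy only and does not reproduce how WITH_ADNS is implemented; `resolve_async` is a hypothetical helper name.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# A single worker is enough to keep lookups off the main thread.
_resolver = ThreadPoolExecutor(max_workers=1)


def resolve_async(host, port):
    """Start a DNS lookup in the background and return a Future.

    The caller can poll `future.done()` from its main loop and call
    `future.result()` once the lookup has finished.
    """
    return _resolver.submit(
        socket.getaddrinfo, host, port, type=socket.SOCK_STREAM
    )
```

A main loop would call `resolve_async(...)` once, keep serving clients, and only start the connect when `future.done()` returns True.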

natcl commented 3 years ago

Hello, any updates on this issue? We're seeing the same behaviour. Thanks!

REBAUD commented 3 years ago

Hi,

I'm also encountering the same behavior on Windows with versions 1.6.9 and 1.6.12.

Architecture : I have

Example of the conf file used:

port 1885
log_type all

connection Bridge_3
address 172.31.1.214:1883
keepalive_interval 5
restart_timeout 1
notifications true 

topic # both 0 

Issue: The connection to the distant broker can be lost (and this should have no impact on local communications), but I encounter a blocking issue:

=> When the connection to the distant broker is lost and the local broker tries to reconnect:

This behavior is a real blocker for my use case, because I need a working local broker at all times, even when the distant broker is not available.

It looks like a threading issue, as if the bridge connection runs in the main thread and so affects local behavior.
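The idea of keeping bridge connects off the main thread can be sketched as follows. This is a minimal Python illustration, not mosquitto's design; the names `bridge_connector` and `start_bridge_connector` and the timeout values are assumptions for the example.

```python
import queue
import socket
import threading


def bridge_connector(address, results, stop, connect_timeout=5.0, retry_delay=1.0):
    """Worker: retry a blocking connect until it succeeds or `stop` is set."""
    while not stop.is_set():
        try:
            sock = socket.create_connection(address, timeout=connect_timeout)
        except OSError:
            # Remote broker unreachable: pause, then retry. Event.wait
            # returns early if shutdown was requested in the meantime.
            stop.wait(retry_delay)
            continue
        results.put(sock)  # hand the connected socket to the main thread
        return


def start_bridge_connector(address, **kwargs):
    """Start the worker and return (results queue, stop event, thread)."""
    results = queue.Queue()
    stop = threading.Event()
    thread = threading.Thread(
        target=bridge_connector,
        args=(address, results, stop),
        kwargs=kwargs,
        daemon=True,
    )
    thread.start()
    return results, stop, thread
```

The main loop keeps serving local clients and checks `results.get_nowait()` on each iteration; only the worker ever waits on the slow connect.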

Are you aware of this issue, and is there a way to handle it, or a fix in progress?

=> The behavior with version 1.6.4 is different, but still not acceptable, because freezes also happen (shorter ones).

Have a good day.

logfile.txt

lesmoutonssauvages commented 3 years ago

Hi, We are seeing the same too. Thanks!

natcl commented 3 years ago

Hello, any news on this issue? Thanks!

natcl commented 2 years ago

Does anyone know if this has been addressed in newer mosquitto releases?

viulian commented 1 year ago

I am running mosquitto 2.0.15 on my router and it has a bridge to one of the VMs that can do the heavy lifting. However, when the router is rebooting, mosquitto fails to connect - it just sends CONNECT and never retries:

2023-05-05T22:07:38: Bridge netgear-r7800.biggie sending CONNECT
2023-05-05T22:07:39: No will message specified.

Restarting it makes it work again:

2023-05-05T22:12:41: Bridge netgear-r7800.biggie sending CONNECT
2023-05-05T22:12:41: Received CONNACK on connection local.netgear-r7800.biggie.
2023-05-05T22:12:41: Bridge local.netgear-r7800.biggie sending UNSUBSCRIBE (Mid: 2, Topic: #)
2023-05-05T22:12:41: Received PUBACK from local.netgear-r7800.biggie (Mid: 1, RC:0)
2023-05-05T22:12:41: Received UNSUBACK from local.netgear-r7800.biggie

It is set to connect after 10 seconds and keep trying after 30 seconds, but it doesn't seem to. Although it does accept messages from local clients, it doesn't try to reconnect if it didn't get an answer to its CONNECT.

I have added some logging before mosquitto is started: the VM is reachable (it responds to ping), so this doesn't seem to be a connectivity problem. However, after adding a sleep 10 just before starting it, it seems more likely to connect, so it may be a connectivity issue after all.
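One way to test the connectivity theory without a fixed sleep is to poll the remote broker's TCP port before starting mosquitto. A successful ping does not prove that the port accepts connections yet. The sketch below is a hypothetical startup helper in Python; `wait_for_tcp` and its default timings are assumptions, not part of mosquitto.

```python
import socket
import time


def wait_for_tcp(host, port, deadline_s=30.0, interval_s=1.0, connect_timeout_s=2.0):
    """Return True once a TCP connect to (host, port) succeeds.

    Returns False if no attempt succeeded before `deadline_s` elapsed.
    """
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            # The connection is closed again right away; only reachability
            # of the port is being checked.
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

A startup script could call this with the bridge address and port and launch the broker only after it returns True. If the bridge then connects reliably, that points to a race with network readiness at boot.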