eclipse-mosquitto / mosquitto

Eclipse Mosquitto - An open source MQTT broker
https://mosquitto.org
Other
9.1k stars 2.4k forks source link

No way to do non-blocking connect when using username/password auth #1816

Open detly opened 4 years ago

detly commented 4 years ago

(I'm using Mosquitto 1.6.12 from the PPA on Ubuntu 20.04 and Libuv 1.34.2.)

I'm trying to integrate the Mosquitto client API with the Libuv event loop. For the most part it's gone fine - I can get the socket with mosquitto_socket(), create a Libuv poll handle from it and get single threaded event driven code. Even the most minimal example I could write is a bit lengthy, so I've put it up on Gitlab.

The outline of what I'm doing is:

Before starting the loop:

I then start the loop and:

This all works fine, and is what's in the Gitlab project. Here's what my logging shows when I run it:

15:56:36 INFO ../main_poll_readwrite.c:188: Started MQTT bridge
15:56:36 DEBUG ../common.h:55: Client 101010101010101 sending CONNECT
15:56:36 INFO ../main_poll_readwrite.c:213: Running libuv loop
15:56:36 TRACE ../main_poll_readwrite.c:163: Handling timer
15:56:36 TRACE ../main_poll_readwrite.c:180: T: No write needed
15:56:36 TRACE ../main_poll_readwrite.c:108: Poll on mosquitto socket
15:56:36 TRACE ../main_poll_readwrite.c:123: Handling read on mosquitto socket
15:56:36 TRACE ../main_poll_readwrite.c:147: P: Waiting for write
15:56:36 TRACE ../main_poll_readwrite.c:108: Poll on mosquitto socket
15:56:36 TRACE ../main_poll_readwrite.c:134: Handling write on mosquitto socket
15:56:36 TRACE ../main_poll_readwrite.c:152: P: No write needed
15:56:36 TRACE ../main_poll_readwrite.c:108: Poll on mosquitto socket
15:56:36 TRACE ../main_poll_readwrite.c:123: Handling read on mosquitto socket
15:56:36 TRACE ../main_poll_readwrite.c:152: P: No write needed
15:56:36 TRACE ../main_poll_readwrite.c:108: Poll on mosquitto socket
15:56:36 TRACE ../main_poll_readwrite.c:123: Handling read on mosquitto socket
15:56:36 DEBUG ../common.h:55: Client 101010101010101 received CONNACK (0)
15:56:36 DEBUG ../common.h:19: Mosquitto connect callback called: 0
15:56:36 TRACE ../main_poll_readwrite.c:152: P: No write needed
15:56:38 TRACE ../main_poll_readwrite.c:163: Handling timer
15:56:38 TRACE ../main_poll_readwrite.c:180: T: No write needed
15:56:40 TRACE ../main_poll_readwrite.c:163: Handling timer
15:56:40 TRACE ../main_poll_readwrite.c:180: T: No write needed
15:56:42 TRACE ../main_poll_readwrite.c:163: Handling timer
15:56:42 TRACE ../main_poll_readwrite.c:180: T: No write needed

The only issue is that the mosquitto_connect() call is blocking. In theory, it can return before the connection is actually established, and usually does — I don't see the CONNACK until the first poll (write?) callback. That's fine. But if I use eg. netem under Linux to silently drop packets, the connect call seems to block indefinitely.

The docs say not to use mosquitto_connect_async() unless I'm using mosquitto_loop_start(), but then I found a comment on

1791 which says that it's fine. So I tried switching to that, but immediately encountered a problem where

My logs now look like:

16:11:02 INFO ../main_poll_readwrite.c:188: Started MQTT bridge
16:11:02 DEBUG ../common.h:55: Client 101010101010101 sending CONNECT
16:11:02 INFO ../main_poll_readwrite.c:213: Running libuv loop
16:11:02 TRACE ../main_poll_readwrite.c:163: Handling timer
16:11:02 TRACE ../main_poll_readwrite.c:175: T: Waiting for write
16:11:02 TRACE ../main_poll_readwrite.c:108: Poll on mosquitto socket
16:11:02 TRACE ../main_poll_readwrite.c:134: Handling write on mosquitto socket
16:11:02 TRACE ../main_poll_readwrite.c:147: P: Waiting for write
[repeats thousands of times]
16:11:11 TRACE ../main_poll_readwrite.c:108: Poll on mosquitto socket
16:11:11 DEBUG ../common.h:27: Mosquitto disconnect callback called: 19
16:11:11 ERROR ../main_poll_readwrite.c:118: Error servicing mosquitto loop: 19
16:11:11 TRACE ../main_poll_readwrite.c:134: Handling write on mosquitto socket
16:11:11 DEBUG ../common.h:27: Mosquitto disconnect callback called: 4
16:11:11 ERROR ../main_poll_readwrite.c:139: Error writing to mosquitto socket: 4

Now, my server uses TLS on port 8883 with a self-signed certificate (due to no domain name) and username/password authentication. I tried to test this issue on test.mosquitto.org but (a) I couldn't get TLS to work (maybe an expired certificate?) and (b) it doesn't allow for username/password auth. Testing with no encryption on 1883, I couldn't reproduce the issue, so I think it has something to do with one of those extra factors.

(I originally posted this on the mailing list but I'm more certain now that this is at the very least an enhancement request for a non-blocking connect() API and should go here.)

ralight commented 4 years ago

I tried using your main_poll_readwrite.c code, thanks for providing that.

For the mosquitto config I used:

listener 8883
cafile ../test/ssl/all-ca.crt
certfile ../test/ssl/server.crt
keyfile ../test/ssl/server.key
password_file pw

Those certificates are in the mosquitto repository. I configured your tool to point to all-ca.crt, localhost, and set the username and password. That worked fine.

I then tweaked main_poll_readwrite.c to use connect_async instead, and reran it with the same results:

1816

I tried generating a self-signed certificate instead, and that worked as well.

detly commented 4 years ago

Ah thanks. I'll try it with a local server and your config there and see if I can reproduce it. For the Mosquitto client were you using a release or master?

ralight commented 4 years ago

I used 1.6.12 from the Ubuntu PPA. The broker was the fixes branch I think, but really there's no difference to that and what's in 1.6.12.

detly commented 4 years ago

On Tue, 8 Sep 2020 at 08:22, Roger Light notifications@github.com wrote:

I used 1.6.12 from the Ubuntu PPA. The broker was the fixes branch I think, but really there's no difference to that and what's in 1.6.12.

I finally got around to testing it, and it works perfectly fine (a) over localhost and (b) over my LAN. It's only our remote host that triggers this behaviour. I'll debug it with our server folk, but do you know what version of LibUV you used for this?

I'm struggling to think of how any behaviour from a remote server could make Linux or LibUV think a socket is constantly in need of being written to, but it doesn't look like it's Mosquitto, so sorry for the noise!

Cheers, Jason

detly commented 4 years ago

On Thu, 17 Sep 2020 at 12:20, Jason Heeris jason.heeris@gmail.com wrote:

I finally got around to testing it, and it works perfectly fine (a) over localhost and (b) over my LAN.

15 seconds after sending this I managed to reproduce it on LAN! But it took a few tries, and might be intermittent or timing related. Also it used an older version of Mosquitto (1.4.15 from the Ubuntu 18.04.1 repos), but our server isn't actually using Mosquitto at all so I don't think it matters (or maybe it does, but would point to a similar issue in our own setup?).

This is what I see on the server (192.168.1.100):

$ mosquitto -c server_conf 1600309396: mosquitto version 1.4.15 (build date Tue, 18 Jun 2019 11:42:22 -0300) starting 1600309396: Config loaded from server_conf. 1600309396: Opening ipv4 listen socket on port 8883. 1600309396: Opening ipv6 listen socket on port 8883. 1600309396: Warning: Address family not supported by protocol 1600309398: New connection from 192.168.1.231 on port 8883. 1600309401: Socket error on client , disconnecting. 1600309409: New connection from 192.168.1.231 on port 8883. 1600309411: Socket error on client , disconnecting. 1600309416: New connection from 192.168.1.231 on port 8883. 1600309416: New client connected from 192.168.1.231 as 101010101010101 (c1, k10, u'user'). 1600309425: Socket error on client 101010101010101, disconnecting. 1600309426: New connection from 192.168.1.231 on port 8883. 1600309426: New client connected from 192.168.1.231 as 101010101010101 (c1, k10, u'user'). 1600309428: Socket error on client 101010101010101, disconnecting. 1600309429: New connection from 192.168.1.231 on port 8883. 1600309429: New client connected from 192.168.1.231 as 101010101010101 (c1, k10, u'user'). 1600309430: Socket error on client 101010101010101, disconnecting. ^C1600309433: mosquitto version 1.4.15 terminating

Note those first three connections: socket error on client . That's when, on the client side, I get that constant state of mosquitto_wants_write() == true and the socket being writeable. It seems to happen after the disconnect. Then it just... comes good. No changes, it just happens. I also saw it go the other way, in another session: good, good, good, good, error, error, good...

Anyway, if you have the time, please let me know if you can get this to happen when there's an extra hop between your client and server.

Cheers, Jason

detly commented 3 years ago

Before I try to reproduce/debug this in v2, I want to check a couple of things:

detly commented 3 years ago

Confirmed this is still an issue on 2.0.10.

calvin2021y commented 2 years ago

bring libuv & libevent support will be a good start.