eclipse / mosquitto

Eclipse Mosquitto - An open source MQTT broker
https://mosquitto.org
Other
8.92k stars 2.37k forks source link

Crash during mosquitto_kick_client_by_clientid call #3080

Open D-r-P-3-p-p-3-r opened 2 months ago

D-r-P-3-p-p-3-r commented 2 months ago

We're experiencing crashes when calling _mosquitto_kick_client_byclientid under Mosquitto 2.14. Because of issue #3056 we're not using a newer version of Mosquitto (which probably does not matter).

Scenario: We are running a self-written Mosquitto Plug-In. This plugin keeps a white list of devices that are allowed to connect to the broker. The white list content is set by an application via an REST API. If a device is removed from the white list and it is currently connected to the broker we kick it via _mosquitto_kick_client_byclientid . This happens in a thread that is started by the plugin (the thread that handles all the plug-in‘s logic). Occasionally during the calls to _mosquitto_kick_client_byclientid the broker crashes.

I suspect it happens when concurrently with the _mosquitto_kick_client_byclientid call an ACL check is performed in the plugin via Mosquitto. Before entering the plug-in thread we're extracting information from the event_data (e.g. via mosquitto_client_id(), mosquitto_client_username(), mosquitto_client_certificate()) in Mosquittos own thread.

Is it not ok to call _mosquitto_kick_client_byclientid from a thread that is not Mosquitto's?

Or is the problem that _mosquitto_kick_client_byclientid and mosquittoclient* are called at the same time from different threads?

Or are there other rules I am not aware of?

Unlike the libmosquitto documentation which states the following

libmosquitto provides thread-safe operation, with the exception of mosquitto_lib_init which is not thread safe.

the plug-in documentation does not talk about thread safety at all.

dkelmatic commented 1 month ago

Same experience here. We authenticate via OAuth-plugin and at some point I cannot refresh the token anymore, which is fine. I`d just kick the client then and he can login again receiving a new token. We have a client that sends a burst of messages with a big payload (its pretty much a small csv-file) in an asynchronous manner. When sending messages and the token fails for whatever reason, I try to kick the client. As many messages are being sent, I see the log message that client xx will be kicked like 20 times until the crash (segfault of mosquitto) happens.

The only Idea I had so far was to slow down the messages what didnt have an effect. @D-r-P-3-p-p-3-r could you add some more detail in what circumstances the problem occurs? Does it also occur in the circumstance that a client is kicked and tries to reconnect over and over agin?

D-r-P-3-p-p-3-r commented 1 month ago

@dkelmatic: I think the problem occurs when we kick a client and at the very same time another client accesses the broker and thus some code is executed in the auth plug-in. My assumption is that another mosquitto-API is called and that it is not safe to do this at the same time mosquitto_kick_client_by_clientid is called. This matches to the description of your problem. A lot of access to the broker and then a kick. This increases the chance that the kick happens at the very same time as another operation in the broker.

Daedaluz commented 1 month ago

I think there is a tick event you can register to that runs in the brokers context. you could probably use that to poll a list of things to do e.g kick clients.

I don't remember if this is available in 2.0.14 though.

D-r-P-3-p-p-3-r commented 1 month ago

I think there is a tick event you can register to that runs in the brokers context. you could probably use that to poll a list of things to do e.g kick clients.

I don't remember if this is available in 2.0.14 though.

That sounds like a very good idea. I will try this approach. Thanks!

dkelmatic commented 1 month ago

Hey! I checked out the tick event. Looks good. What I'm not so sure about is how to implement that. If I create a global list of clients wo should be kicked on the next tick, how can I be sure to not access the list from multiple threads? It would mean to use semaphores, but how am I going to do that, if I dont wat to compile mosquitto myself?

Daedaluz commented 1 month ago

Why would you need to recompile mosquitto for that?

Just trying to follow your train of thought here, on how you arrive to this.

D-r-P-3-p-p-3-r commented 1 month ago

I can confirm that "tick" is available.

We use and build our own Mosquitto plugin. Thus, it is no problem to have a thread-safe mechanism that handles the access to the list. I implemented it using C++ sync mechanisms. Testing is still pending. I assume it's similiar for everyone else using _mosquitto_kick_client_byclientid. Where would you use that function if not in plugin code?

Why would you need to recompile mosquitto for that?

IMO you don't. You compile your plugin. See above.

dkelmatic commented 1 month ago

Maybe I'm thinking wrong, but when I use a list for the clients to be kicked on the next tick, I would fill the list from within the same thread I try to kick the clients in my current implementation. I would then kick the clients in my tick event, that has been triggered by the mosquitto main loop thread. When adding/deleting the entries from the list, I would probably have to make sure that the two threads are not accessing the list in the same time to avoid another race condition.

Wouldn't I have to set such a semaphore up within the mqoauitto main code then?

D-r-P-3-p-p-3-r commented 1 month ago

The tick event can be handled in a plugin. You can register a handler function there. No need to modify Mosquitto itself, but you extend it with a plugin.

See https://mosquitto.org/api/files/mosquitto_plugin-h.html

dkelmatic commented 1 month ago

Yes, I understand the the plugin interface. I'm using an oauth plugin. My idea would be to create a (global) list and if AuthAclCheck() fails with an invalid token, I'd add the client ID to the list. In the Tick callback, I would then read the list, kick the client and delete the clientID from the list. But can I be sure that there is no other thread that tries to write another clientID to the kick-list from another thread that runs the AuthAclCheck() function in that moment for another client?

Daedaluz commented 1 month ago

I'm not convinced mosquitto runs ACL checks in parallel with multiple threads, so maybe you don't need the mutex or semaphore.

However, just for thought, why can't the semaphore be next to the list within the plugin?

dkelmatic commented 1 month ago

Hey @Daedaluz , thanks for the inspiration. I will check what happens without any precautions and if it makes problems, I might be able to use some getter and setter functions that also handle the semaphore. For some reason I thought I would have to place the semaphore initialization within mosquitto, but you're right, it might work initializing it within the plugin.

I will report back whn I have results.

dkelmatic commented 1 month ago

Hello everyone, after updating my plugin to the V5 api, I created a linked list with the clientIDs to be kicked with mosquitto_malloc and assigned the userdata pointer to the head.

In the acl event I add clients to the list (if they are not already in there) with help of the received userdata pointer and in the tick event I kick the users.

I didn't find a safe way to reproduce my crashes with the old solution, but until now I didn't experience any crash in the testing environment with the new solution. Fingers crossed 🤞

Did you solve the issue for your plugin? @D-r-P-3-p-p-3-r ?

D-r-P-3-p-p-3-r commented 2 weeks ago

@dkelmatic Yes. The issue for my problem was solved by moving the kick call to the the tick handler.

Thus, this is a good/the solution.

It'd be great if this information would be added to the documentation. It makes totally sense that you have to do it this way. But it is not quite obvious IMO.