espressif / esp-mqtt

ESP32 mqtt component
Apache License 2.0
591 stars 254 forks source link

esp_mqtt_dispatch_custom_event: esp_event_post_to sometimes times out (IDFGH-8576) #240

Closed someburner closed 1 year ago

someburner commented 1 year ago

I'm trying to get the new esp_mqtt_dispatch_custom_event feature to work to be able to interact with esp-mqtt safely and asynchronously, but I noticed sometimes I get a timeout err returned from esp_mqtt_dispatch_custom_event.

Is there a reason the ticksToWait param of esp_event_post_to is set to 0?

In my case I'm sending the event fairly quickly after getting an MQTT_EVENT_CONNECTED event. I send my own event to a queue in another thread which then calls the esp_mqtt_dispatch_custom_event.

If ticksToWait must be 0 then it's going to be really nasty. Although it appears esp_transport write doesn't block the same way it did the last time I was visiting this issue.

david-cermak commented 1 year ago

Would it help if you increase MQTT_EVENT_QUEUE_SIZE ? (Sorry this hasn't been made a config option yet)

Is there a reason the ticksToWait param of esp_event_post_to is set to 0?

I guess we can add it to the function parameter lists if needed, this wasn't referenced from any IDFs v5.x, so we could still adjust the API. I think the idea was to avoid potential deadlocks (as if the queue is full, we'd have to wait for mqtt task that might have a dependency on this user's api)

someburner commented 1 year ago

my MQTT_EVENT_QUEUE_SIZE is already at 10. In this case I'm dispatching 2 user events that perform subscribe calls once they're run by event loop. Maybe I really am hitting 10, I'll have to add some debugs, but seems like the timeout error might not be due to the queue being full..

I think the idea was to avoid potential deadlocks (as if the queue is full, we'd have to wait for mqtt task that might have a dependency on this user's api)

I figured that was why but wanted to check. In my case I'm using esp_event_post_to for all the exposed esp-mqtt APIs, allowing me to disable API locks, which would avoid deadlock. I'll try to see if it's really due to the queue being full. If not it might be good for that parameter to be overridden.

Mainly I wanted to see if there are any pitfalls in this approach (besides the obvious of making the user event take a long time or perform other blocking actions). E.g. If I sent a subscribe while there was already a disconnect in the queue that was not yet processed. In that case subscribe would still return an error in the user event callback, and I always re-subscribe after a connected event, but maybe you can chime in on other pitfalls?

So far it has been working smoothly and am actually really happy with the outbox implementation as is- it works really well. I have a router with netem and using tc to introduce delay, reordering, packet corruption, etc, and my qos1 packets go through perfectly.

david-cermak commented 1 year ago

This API was added to IDF in https://github.com/espressif/esp-idf/commit/4ec9d4bad60e0173c844a17c78988517f21b4cba so we're not able to update it (at least not in the IDF's internal mqtt component) and add the timeout parameter.

Closing as I don't think there's any other issue, or a pitfall in this approach.