espressif / esp-mqtt

ESP32 mqtt component
Apache License 2.0
591 stars 254 forks source link

Improve/fix handling of read errors (IDFGH-7937) #232

Closed egnor closed 3 months ago

egnor commented 1 year ago

(Edited)

This change simplifies and should correct socket error handling in the MQTT implementation for all ESP-IDF and MQTT versions, giving improved performance and correctness.

Explanation:

Previously, an error (return value <= 0) from esp_transport_read() from any locus was followed by a call to esp_mqtt_handle_transport_read_error(), whose semantics are unclear but would return 0 for closed connection OR timeout, and return -1 (and log an error) for any other error type. The calling sites would generally return error for -1, and return success (but still abandon the operation) for 0.

This was often not quite right.

Change:

Revise the error handling to make esp_mqtt_handle_transport_read_error() unconditionally describe the error and call esp_mqtt_client_dispatch_transport_error(). The one place where timeouts should be accepted (the first-byte read) wraps that call accordingly. Comments are added and error messages improved.

david-cermak commented 1 year ago

Hi @egnor

We have already branched-off of the v4.x support, just to keep the code simple(-er), with less #ifdef's.

There's a branch idf_v4.x for IDF<=v4.4 that would still accept all fixes and important updates.

egnor commented 1 year ago

@david-cermak Thank you for the quick reply. On further investigation I now think this is an improvement (bug fix) even for ESP-IDF v5. I've updated the PR description accordingly. Can you take another look and let me know if this is something that makes sense to you?

david-cermak commented 1 year ago

Sorry for the delay and thanks for updating the PR. I agree that in certain places we can (and should!) treat the timeouts as errors.

Also please update the docs of the internal function, e.g.

/*
 * Returns:
- *     -1 in case of failure
+ *     -1 in case of failure or timeout while one message reading is in progress
 *      0 if no message has been received
 *      1 if a message has been received and placed to client->mqtt_state:
 *           message length:  client->mqtt_state.message_length
 *           message content: client->mqtt_state.in_buffer
 */
static int mqtt_message_receive(esp_mqtt_client_handle_t client, int read_poll_timeout_ms)
egnor commented 1 year ago

Sorry about the long hiatus! I hope you can remember what this is all about. I've updated the PR to be against current head, and did my best to address your requests; see what you think?

david-cermak commented 3 months ago

@egnor @ESP-YJM Thanks for taking the time and effort in posting the changes and reviewing. There were indeed few issues with the mqtt client, but the changes are very much different to those you proposed, so I fixed it in a separate commit. You can check ddde502 for more details.

There were two issues: 1) Treat EOF as an error, since the connection is closed (FIN) from the server side. If we didn't we would try to read (in the next iteration) from the same socket that has been already closed and get an error.

2) Treat timeouts in the middle of MQTT message reading as errors (if timeouted for the second time and didn't read a byte)

The point 2) is a bit complicated, since the we know about the problem only in the second iteration.