espressif / esp-modbus

ESP-Modbus - the officially supported library for the Modbus protocol (serial RS485 + TCP over WiFi or Ethernet).
Apache License 2.0
106 stars, 49 forks

ESP_ERR_INVALID_RESPONSE after a few days of operation (IDFGH-13723) #74

Open Silvesterrr opened 2 weeks ago

Silvesterrr commented 2 weeks ago


Issue or Suggestion Description

Hello, I switched from esp-modbus 1.0.7 to 1.0.15. Now, after a few days of operation, I get the error below:

E (872259047) MB_PORT_COMMON: 872259042655:Frame send error = 5
E (872259047) MB_CONTROLLER_MASTER: mbc_master_send_request(97): Master send request failure error=(0x108) (ESP_ERR_INVALID_RESPONSE).

The device polls the slave devices repeatedly, every 600-1000 ms. Once the error has occurred, it is reported on every attempt to read any slave.

Note that after a reboot everything works normally. Even if ESP_ERR_INVALID_RESPONSE occurs, shouldn't the master recover on its own?
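Until the root cause is found, one application-level workaround is to detect that the master has stopped working and restart the Modbus stack instead of rebooting the whole device. The sketch below is a hypothetical watchdog that is not part of esp-modbus: it counts consecutive failed transactions across all slaves and reports when a restart is due. With the esp-modbus v1.x master API, such a restart would presumably be mbc_master_destroy() followed by the usual init/setup/start sequence; treat that as an assumption to verify against the library version in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-level watchdog (not an esp-modbus API). */
typedef struct {
    uint32_t consecutive_failures; /* failed transactions since the last success */
    uint32_t threshold;            /* failures in a row that trigger a restart   */
} mb_watchdog_t;

void mb_watchdog_init(mb_watchdog_t *wd, uint32_t threshold)
{
    assert(wd != NULL && threshold > 0);
    wd->consecutive_failures = 0;
    wd->threshold = threshold;
}

/* Feed the result of one transaction (any slave). Returns true when the
 * caller should tear down and re-create the Modbus master. */
bool mb_watchdog_feed(mb_watchdog_t *wd, bool success)
{
    assert(wd != NULL);
    if (success) {
        wd->consecutive_failures = 0; /* any good answer proves the bus works */
        return false;
    }
    wd->consecutive_failures++;
    if (wd->consecutive_failures >= wd->threshold) {
        wd->consecutive_failures = 0; /* start counting again after the restart */
        return true;
    }
    return false;
}
```

Because a single successful read resets the counter, the threshold must be larger than the number of slave IDs that are expected never to answer; otherwise the watchdog would fire on every polling loop.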

And what does error = 5 mean? It corresponds to MB_EIO. Is ESP_ERR_INVALID_RESPONSE the result of MB_EIO, or the other way around?

Maybe I should implement vMBMasterRxFlush() as discussed in another issue?
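The idea behind an RX flush is that bytes left in the receive buffer from an earlier, late response are discarded before the next request, so they cannot be parsed as part of the new answer. The exact behavior of vMBMasterRxFlush() is not shown in this thread, so the sketch below models the idea with a hypothetical toy buffer rather than the real driver (ESP-IDF's UART driver does provide uart_flush_input() for discarding received data).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_CAP 256 /* hypothetical buffer size */

/* Toy model of a UART receive buffer; not the ESP-IDF driver. */
typedef struct {
    uint8_t data[RX_CAP];
    size_t len;
} rx_buf_t;

/* Bytes arriving from the bus; anything beyond the capacity is dropped. */
void rx_push(rx_buf_t *rx, const uint8_t *bytes, size_t n)
{
    assert(rx != NULL);
    if (n == 0) return;
    assert(bytes != NULL);
    size_t room = RX_CAP - rx->len;
    if (n > room) n = room;
    memcpy(rx->data + rx->len, bytes, n);
    rx->len += n;
}

/* Discard everything received so far. */
void rx_flush(rx_buf_t *rx)
{
    assert(rx != NULL);
    rx->len = 0;
}

/* Begin a new transaction and return how many stale bytes the response
 * parser would see in front of the real answer. */
size_t begin_transaction(rx_buf_t *rx, bool flush_first)
{
    if (flush_first) rx_flush(rx);
    return rx->len;
}
```

A flush only removes bytes that are already buffered when the transaction starts. Data that arrives after the new request has begun (which is what the maintainer later describes for line 6185 of the log) would still reach the parser.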

I'm using a Modbus RTU master together with a Modbus TCP slave.

Here is my sdkconfig: sdkconfig.txt

alisitsyn commented 2 weeks ago

Hello @Silvesterrr,

I need more information to identify the cause of the issue. Could you capture a larger portion of the log, with the log level set to Debug in the Kconfig menu, and send it to me? It should include the last log messages around the moment the error occurs. I also need to check how the master and slave are used in your application.

(ESP_ERR_INVALID_RESPONSE) means that the RTU master sent a request to the slave and got an incorrect response, or the response was fragmented because the slave was still responding to a previous transaction while a new one was already in progress. This may be due to a slave response time that has grown longer than the time between transactions, and to an unsuitable value of the master's slave response timeout option (500 ms). The log can clarify this.
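The timing argument above can be made concrete with a simple model: a response that has not fully arrived within the master's timeout is late, and if it is still on the wire when the master starts its next request, it corrupts that transaction. The sketch below is a hypothetical classifier under stated assumptions: all times are measured from the end of the request, gap_ms is an assumed idle time before the next request, and it does not claim to reproduce the exact timeout semantics inside esp-modbus.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    RESP_IN_TIME,       /* whole response received before the timeout           */
    RESP_LATE_IN_GAP,   /* late, but finished before the next request starts    */
    RESP_OVERLAPS_NEXT  /* late and still arriving when the next request starts */
} resp_fate_t;

/* All values in milliseconds, measured from the end of the request.
 * delay_ms:   when the slave starts answering
 * frame_ms:   how long the response frame occupies the bus
 * timeout_ms: master's slave response timeout (500 ms in this issue)
 * gap_ms:     assumed idle time before the master's next request */
resp_fate_t classify_response(uint32_t delay_ms, uint32_t frame_ms,
                              uint32_t timeout_ms, uint32_t gap_ms)
{
    assert(timeout_ms > 0);
    uint32_t end = delay_ms + frame_ms;
    if (end <= timeout_ms) return RESP_IN_TIME;
    if (end <= timeout_ms + gap_ms) return RESP_LATE_IN_GAP;
    return RESP_OVERLAPS_NEXT;
}
```

In this model, once a slave's delay creeps past the timeout and the gap between transactions is short, its late answers land inside the next transaction, which matches the fragmented-response case described above.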

Silvesterrr commented 1 week ago

Sure, I was waiting for the problem to occur. It worked for ~23 h and then failed again. Here is a portion of the log with debug verbosity: centrala_2_18_09_2024.txt. The device worked perfectly fine until line 6185, where it received some sort of data and failed?
Generally, the slaves should not send data of that length, so that is weird.
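For context on why an unexpected burst is rejected: a Modbus RTU master can check a response against its request before accepting it. For Read Holding Registers (function 0x03) the response is [address][0x03][byte count = 2*N][data][CRC lo][CRC hi], i.e. exactly 5 + 2*N bytes, protected by CRC-16/MODBUS. The sketch below illustrates those framing rules; it is not the esp-modbus validation code, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF. */
uint16_t mb_crc16(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            else crc >>= 1;
        }
    }
    return crc;
}

/* Validate a function 0x03 response for a request of reg_count registers. */
bool mb_fc03_response_ok(const uint8_t *frame, size_t len,
                         uint8_t expected_addr, uint16_t reg_count)
{
    assert(frame != NULL || len == 0);
    size_t expected_len = 5u + 2u * (size_t)reg_count;
    if (len != expected_len) return false;        /* e.g. an unexpectedly long burst */
    if (frame[0] != expected_addr) return false;  /* answer from another slave       */
    if (frame[1] != 0x03) return false;           /* exception or wrong function     */
    if (frame[2] != 2u * reg_count) return false; /* byte count mismatch             */
    uint16_t crc = mb_crc16(frame, len - 2);      /* CRC is sent low byte first      */
    return frame[len - 2] == (uint8_t)(crc & 0xFF) &&
           frame[len - 1] == (uint8_t)(crc >> 8);
}
```

A frame that fails any of these checks is the kind of "incorrect response" that ends the transaction with an error instead of returning data.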

From that moment on, the device cannot send any more frames. I verified this with an external RS485 dongle and can confirm that the device does not send any frames after that point.

To clarify the behavior: there are 8 other devices on the RS485 network. My device tries to read IDs 1-16 in a loop. It reads IDs 1-8 successfully and fails to read IDs 9-16, which is the expected behavior of my code.
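With 8 of the 16 polled IDs absent, each absent ID costs one full slave response timeout per loop. The rough arithmetic below assumes a hypothetical 50 ms round trip for a present slave and an optional application pause after each request; only the 500 ms timeout and the 8 + 8 split come from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Rough duration of one polling loop, in milliseconds.
 * present:    slaves that answer (IDs 1-8 here)
 * absent:     IDs that never answer (9-16), each costing a full timeout
 * answer_ms:  assumed request + response time for a present slave
 * timeout_ms: slave response timeout (500 ms in this issue)
 * gap_ms:     assumed pause the application inserts after each request */
uint32_t poll_cycle_ms(uint32_t present, uint32_t absent, uint32_t answer_ms,
                       uint32_t timeout_ms, uint32_t gap_ms)
{
    assert(timeout_ms > 0);
    return present * (answer_ms + gap_ms) + absent * (timeout_ms + gap_ms);
}
```

Under these assumptions, about 4 s of every 4.4 s loop is spent waiting for IDs that do not exist, so every request to IDs 9-16 ends in a timeout by design.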

alisitsyn commented 1 week ago

Quoting your comment: "The device worked perfectly fine until line: 6185 where it received some sort of data and failed? Generally slaves should not send any data that is that length. So that is weird."

From the log I can see that the slave response time is set to 500 ms in your project. Between timestamps 81748191 and 81749091, the slaves at some point stop responding properly: the master sends a request, but the slaves respond only right after the slave response time has expired. The master then tries to send the next request but cannot, because the slaves are still sending their delayed responses.

At line 6185 the situation is very similar, but you receive a burst of unexpected data after the timeout. This data arrives right after the start of a new master transaction; otherwise the UART buffer would have been cleared. It looks as if all your slaves became active and tried to respond at the same time, or as if another master on the RS485 bus is active. I can only guess what happened on the bus to cause the collisions and make the receiver get 66 bytes right after the start of the transaction request. So I suppose something is happening on your bus segment, and this may have caused the issue. This needs further inspection of the code. Could you check the v2 implementation of RTU under the same conditions?
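To put the 66 received bytes into perspective, it helps to convert them into bus time. Modbus RTU transmits 11 bits per character (start, 8 data, parity or a second stop bit, stop), and frames are separated by a silence of 3.5 character times, fixed at 1.75 ms above 19200 baud by the Modbus serial line specification. The baud rate used in this issue is not stated, so the sketch below shows two assumed rates.

```c
#include <assert.h>
#include <stdint.h>

/* Modbus RTU: 1 start + 8 data + 1 parity (or 2nd stop) + 1 stop bit. */
#define RTU_BITS_PER_CHAR 11u

/* Time on the wire for n characters, in microseconds (rounded up). */
uint32_t rtu_chars_us(uint32_t n_chars, uint32_t baud)
{
    assert(baud > 0);
    uint64_t bits = (uint64_t)n_chars * RTU_BITS_PER_CHAR;
    return (uint32_t)((bits * 1000000u + baud - 1) / baud);
}

/* Inter-frame silence t3.5 in microseconds (rounded up). */
uint32_t rtu_t35_us(uint32_t baud)
{
    assert(baud > 0);
    if (baud > 19200u) return 1750u;                 /* fixed value above 19200 baud */
    uint64_t half_bits = 7u * RTU_BITS_PER_CHAR;     /* 3.5 chars = 7 half-characters */
    return (uint32_t)((half_bits * 1000000u + 2u * baud - 1) / (2u * baud));
}
```

At an assumed 9600 baud, 66 bytes occupy the bus for about 75.6 ms (about 6.3 ms at 115200 baud). That is much longer than a short single-register read response, which is consistent with the suspicion of several senders or another master being active on the bus.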

Please also try disabling CONFIG_FMB_TIMER_PORT_ENABLED (set CONFIG_FMB_TIMER_PORT_ENABLED=n in Kconfig); this may help. I need to reproduce this issue to confirm and fix it, and I will do so once I have time. Please let me know the results of the checks described above. Thanks.