eMBMasterPoll(371): Receive buffer initialization fail (IDFGH-9193)

espressif / esp-modbus

ESP-Modbus - the officially suppported library for Modbus protocol (serial RS485 + TCP over WiFi or Ethernet).

Apache License 2.0

85 stars 46 forks source link

eMBMasterPoll(371): Receive buffer initialization fail (IDFGH-9193) #18

Closed ZanellaSimone closed 1 year ago

ZanellaSimone commented 1 year ago

IDE: Visual Studio Code with ESP-IDF (v5.0) and ESP-MODBUS library (v1.0.7) Chip used: ESP32S3 Operating System: Windows Port master configuration: UART2, rtu mode, 19200bps, even parity

Problem Description

With 1.0.7 version of the esp-modbus library I have the problem that if some spurious message is received from the serial line during initialization phase of my application, the communication is interrupted and never resumed. I have the specific problem into function eMBMasterPoll with the checks MB_PORT_CHECK(ucMBRcvFrame, MB_EILLSTATE, "Receive buffer initialization fail.") and MB_PORT_CHECK(ucMBSendFrame, MB_EILLSTATE, "Send buffer initialization fail."). It seems that if the serial receives a message that contains something that is not consistent with what the serial expects, the communication is interrupted and is not restored. if I comment these two checks I see that the library works correctly again. Does anyone have an idea as to how to fix it?

eespi commented 1 year ago

Hi, same issue here with version 1.08. The reason (according to my findings) is that the eMBMasterPoll will block forever wating for MB_EVENT_POLL_MASK event in case of a receive error. This is what i've been tracking so far: In function eMBMasterPoll:

else if ( MB_PORT_CHECK_EVENT( eEvent, EV_MASTER_FRAME_RECEIVED ) ) {
                if (xTransactionIsActive) {
                    eStatus = peMBMasterFrameReceiveCur( &ucRcvAddress, &ucMBRcvFrame, &usLength);
                    MB_PORT_CHECK(ucMBRcvFrame, MB_EILLSTATE, "Receive buffer initialization fail.");
                    MB_PORT_CHECK(ucMBSendFrame, MB_EILLSTATE, "Send buffer initialization fail.");

The function peMBMasterFrameReceiveCur is binded to eMBMasterRTUReceive as I'm using RTU Master, which can return MB_EIO in case something bad happens (like length or crc check fail).

eMBErrorCode
eMBMasterRTUReceive( UCHAR * pucRcvAddress, UCHAR ** pucFrame, USHORT * pusLength )
{
    eMBErrorCode    eStatus = MB_ENOERR;
    UCHAR          *pucMBRTUFrame = ( UCHAR* ) ucMasterRTURcvBuf;
    USHORT          usFrameLength = usMasterRcvBufferPos;

    if( xMBMasterPortSerialGetResponse( &pucMBRTUFrame, &usFrameLength ) == FALSE )
    {
        return MB_EIO;<------------------------ HERE
    }

    ENTER_CRITICAL_SECTION(  );
    assert( usFrameLength < MB_SER_PDU_SIZE_MAX );
    assert( pucMBRTUFrame );

    /* Length and CRC check */
    if( ( usFrameLength >= MB_RTU_SER_PDU_SIZE_MIN )
        && ( usMBCRC16( ( UCHAR * ) pucMBRTUFrame, usFrameLength ) == 0 ) )
    {
        /* Save the address field. All frames are passed to the upper layer
         * and the decision if a frame is used is done there.
         */
        *pucRcvAddress = pucMBRTUFrame[MB_SER_PDU_ADDR_OFF];

        /* Total length of Modbus-PDU is Modbus-Serial-Line-PDU minus
         * size of address field and CRC checksum.
         */
        *pusLength = ( USHORT )( usFrameLength - MB_SER_PDU_PDU_OFF - MB_SER_PDU_SIZE_CRC );

        /* Return the start of the Modbus PDU to the caller. */
        *pucFrame = ( UCHAR * ) & pucMBRTUFrame[MB_SER_PDU_PDU_OFF];
    }
    else
    {
        eStatus = MB_EIO; <------------------------ HERE
    }

    EXIT_CRITICAL_SECTION(  );
    return eStatus;
}

If this is the case, ucMBRcvFrameis NULL and MB_PORT_CHECKwill fail causing eMBMasterPoll to return without generating any event with xMBMasterPortEventPost.

When this happens, the next invocation of eMBMasterPoll will stuck forever in xMBMasterPortEventGet

eMBErrorCode
eMBMasterPoll( void )
{
    static UCHAR    *ucMBSendFrame = NULL;
    static UCHAR    *ucMBRcvFrame = NULL;
    static UCHAR    ucRcvAddress;
    static UCHAR    ucFunctionCode;
    static USHORT   usLength;
    static eMBException eException;
    static BOOL     xTransactionIsActive = FALSE;
    int             i;
    int             j;
    eMBErrorCode    eStatus = MB_ENOERR;
    eMBMasterEventType    eEvent;
    eMBMasterErrorEventType errorType;

    /* Check if the protocol stack is ready. */
    if( eMBState != STATE_ENABLED )
    {
        return MB_EILLSTATE;
    }

    /* Check if there is a event available. If not return control to caller.
     * Otherwise we will handle the event. */
    if ( xMBMasterPortEventGet( &eEvent ) == TRUE ) <------------------------ HERE GETS STUCK FOREVER
    {

because the function xEventGroupWaitBitsinside xMBMasterPortEventGetwill wait forever as no event is getting received.

BOOL
xMBMasterPortEventGet( eMBMasterEventType* eEvent )
{
    EventBits_t uxBits;
    BOOL    xEventHappened = FALSE;
    uxBits = xEventGroupWaitBits( xEventGroupMasterHdl, // The event group being tested. 
                                    MB_EVENT_POLL_MASK, // The bits within the event group to wait for.
                                    pdTRUE,             // Masked bits should be cleared before returning.
                                    pdFALSE,            // Don't wait for both bits, either bit will do.
                                    portMAX_DELAY);     // Wait forever for either bit to be set. <------------STUCK FOREVER

This happens to me everytime I get some bad data in modbus. I believe that there are some events missing when that happens, but didn't have time to try some patch. Any idea?

alisitsyn commented 1 year ago

Hi,

Thank you for the issue. As per Modbus spec only the master can initiate the transaction and, slaves are listening and responding as required. The issue usually happens when the master gets incorrect data during the initialization phase. As an example, if there are other devices in the network that can initiate transactions. Once the first correct frame is received, the master can communicate normally. However, the master should still be able to communicate once the initialization is completed, even if incorrect data is coming from the communication channel.

This happens to me every time I get some bad data in modbus. I believe that there are some events missing.

The master is able to ignore the first frame received in the buffer if the initialization phase is not completed correctly. After  completing this phase, the FSM goes to EV_MASTER_READY state and is then able to communicate properly. This behavior is disabled intentionally as workaround for other issues. When it is enabled, the master send is waiting while the new data comes to the buffer during initialization stage and can not send data and waiting for the completion of the initialization.

As a temporary solution, I propose to remove the line MB_PORT_CHECK(ucMBRcvFrame, MB_EILLSTATE, "Receive buffer initialization fail."); to be able to communicate in spite of errors. I will test possible variants and will update you ASAP.

eespi commented 1 year ago

Hi,

thanks for reply. For my case I can confirm that the problem happens at the initialization phase, as my slave device supports two protocols (proprietary and modbus) and until the slave has not locked the desired master protocol, it will send some sync packets of the proprietary protocol on the serial port at specific time intervals. When this happens at initialization stage, the modbus library will enter the waiting stage and will stay there forever. In the meanwhile I will remove the line as suggested, and implement a timestamp based parachute mechanism to restart modbus in case this happens.

Thanks for the support

alisitsyn commented 1 year ago

@ZanellaSimone, @eespi,

Please check the current patch below to fix the issue:

fix_stuck_in_init_on_incorrect_data_input.patch.log

@eespi,

This is up to you to use this approach which could work but it is not recommended.

ZanellaSimone commented 1 year ago

Hi @alisitsyn,

thanks for the patch! It works very good! I have new issue into function eMBMasterPoll. I send requests to read single holding register, but when I receive the response the following error occures: E (10345) MB_PORT_COMMON: Drop incorrect frame, receive_func(4) != send_func(3)

This type of error occurs when the library function esp_restart() is invoked in my application to perform a soft reset after the expiration of a timeout. Before invoking this soft reset the communication works correctly.

This is what i've been tracking into eMBMasterPoll function:

    if ( ( eStatus == MB_ENOERR ) && ( ( ucRcvAddress == ucMBMasterGetDestAddress() )
                                    || ( ucRcvAddress == MB_TCP_PSEUDO_ADDRESS) ) ) {
        if ( ( ucMBRcvFrame[MB_PDU_FUNC_OFF]  & ~MB_FUNC_ERROR ) == ( ucMBSendFrame[MB_PDU_FUNC_OFF] ) ) {  <------------------ HERE GETS ERROR ON MODBUS FUNCTION INTO RESPONSE FRAME
            ESP_LOGD(MB_PORT_TAG, "%s: Packet data received successfully (%u).", __func__, eStatus);
            ESP_LOG_BUFFER_HEX_LEVEL("POLL receive buffer", (void*)ucMBRcvFrame, (uint16_t)usLength, ESP_LOG_DEBUG);

            ( void ) xMBMasterPortEventPost( EV_MASTER_EXECUTE );
        } else {
            ESP_LOGE( MB_PORT_TAG, "Drop incorrect frame, receive_func(%u) != send_func(%u)",   
                            ucMBRcvFrame[MB_PDU_FUNC_OFF], ucMBSendFrame[MB_PDU_FUNC_OFF]);
            vMBMasterSetErrorType(EV_ERROR_RECEIVE_DATA);
            ( void ) xMBMasterPortEventPost( EV_MASTER_ERROR_PROCESS );
        }
    } else {
        vMBMasterSetErrorType(EV_ERROR_RECEIVE_DATA);
        ( void ) xMBMasterPortEventPost( EV_MASTER_ERROR_PROCESS );
        ESP_LOGD( MB_PORT_TAG, "%s: Packet data receive failed (addr=%u)(%u).",
                               __func__, ucRcvAddress, eStatus);
    }

This happens to me everytime and to return to having a working communication I have to perform a power cycle of my system. Any idea?

alisitsyn commented 1 year ago

Hi @ZanellaSimone,

Are you using the same UART for Modbus and update your application? I can not reproduce the exact situation as you have because do not have enough information about your environment and can just guess and use some expectation. The questions are here .

In order to solve the issue I need your help to follow the steps below:

Please rename the managed_components folder in you root project directory to components. This will allow to apply patch to component downloaded by component manager.
Apply the workaround in the patch below to the components/espressif__esp-modbus folder.
Rebuild you project idf.py fullclean, idf.py build.
Try to reproduce the issue again.
Report the results here.

The results will help to continue with the issue.

Thanks. fix_message_processing_after_tout.patch.log

alisitsyn commented 1 year ago

Hi @eespi,

Could you help to test the workaround mentioned above? Thank you.

ZanellaSimone commented 1 year ago

Hello @alisitsyn, I'm using UART2 as master for modbus communication using the following configuration parameters:

UART2
rtu mode
19200bps
even parity

I applied the patch you provided following the steps above. I'm currently using VsCode with the platformio extension. I applied the equivalent of the instructions you told me referring to the expressif-idf.

Unfortunately I have to tell you that even making the changes you indicated to me does not solve the problem. When the timeout is reached, the soft reset is called and at the next restart of the application the modbus communication does not resume. Below I attach the debug of my application after the soft restart.

Debug Logs

debug.txt

alisitsyn commented 1 year ago

Hello @ZanellaSimone,

Is this possible to get whole project from you including the patched esp-modbus folder? I am trying to reproduce the issue but unsuccessfully. Even when I call the esp_restart() after some communication timeout, the modbus can restart and work normally after this. Your log looks like the component folder was not patched completely (the log itself seems incomplete). The communication log on the bus can also be very useful to find the reason for this issue.

The E (10345) MB_PORT_COMMON: Drop incorrect frame, receive_func(4) != send_func(3) message means that the master sends the request command 3- read multiple holding registers, but receives the correct response from slave with the command = 4 (read input registers). The command 4 can be related to your last request before restart if you have the input registers configured in your master data dictionary (incorrect reset of UART periph). This is why I also was asking if you have other active devices on the bus (could be the reason as well).

Update: Please check the value of CONFIG_FMB_PORT_TASK_PRIO and set it to < 18. This can be a reason why the modbus stack is not initialized correctly after restart.

Thank you.

ZanellaSimone commented 1 year ago

Hi @alisitsyn, I share you the source code of my project.

Development setup

IDE: Visual Studio Code with Platformio extension (v2.5.5) Chip used: ESP32 IDF version: framework-espidf @ 3.40200.210118 (4.2.0) Compiler version: toolchain-xtensa32 @ 2.80400.210211 (8.4.0) Operating System: Windows

Instruction to compile project

Download and install Visual Studio Code. Install into Visual Studio Code the platformio extension (v.2.5.5) Open my project (you find it into "Attacched files" section). I am attaching below the folder of the esp-modbus library (v.1.0.8). This must be replaced with the following path to ensure a setup identical to the one I used : C:\Users\USERNAME\.platformio\packages\framework-espidf\components This was downloaded from the component repository (https://components.espressif.com/components/espressif/esp-modbus). The two patches you indicated in this issue have been applied to the downloaded files. With platformio buttons in the "State bar" of Visual Studio Code clean and compile project.

Attached files

Below I attach the source code of my project and a debug log where the problem is highlighted. Into my project folder you can find the sdkconfig file too.

master_modbus_comm.zip debug_log.txt esp-modbus.zip

alisitsyn commented 1 year ago

Hi @ZanellaSimone ,

I checked your project with logs and have following results and recommendations:

As per your last log the modbus initialization works just fine after call to esp_restart() does not cause any issues as in your previous log and no any issues with peripheral. The esp_modbus latest patches applied correctly (skip tcp code).

I performed check and testing of your project with certified modbus slave emulator. The changes are: changed UART pin numbers, code uses RS485 interface with RTS line instead pure UART, use esp-idf v5.0. Results: a) The code as is able to read registers without any problems during several hours. b) No errors at all. c) The stack is not effected by spurious messages on the communication line and can recover from them (tested additionally). d) tried to initiate crc errors in slave response and timeouts for some requests. The master is able to recover and read data of next parameters correctly. I can not reproduce the errors similar to your log but seems it is possible in some conditions. The error like below

[MODBUS]: get_modbus_value() cid: 7, Input data 4(I4) = 77
[MODBUS]: modbusTask() Error update runtime data!
W (6454) MB_PORT_COMMON: eMBMasterPoll:EV_MASTER_FRAME_SENT
W (6474) MB_PORT_COMMON: eMBMasterPoll:EV_MASTER_FRAME_RECEIVED
E (6474) MB_PORT_COMMON: Drop incorrect frame, receive_func(4) != send_func(3) <<<<< <<<<<<<<the response timeout triggered but the slave sends the response later and master ignores this triggering an error.
E (6484) MB_CONTROLLER_MASTER: mbc_master_get_parameter(76): Master get parameter failure, error=(0x108) (ESP_ERR_INVALID_RESPONSE).
[MODBUS]: get_modbus_value() Update failed for cid: 0, Hold data 1(H1) = 0, ESP_ERR_INVALID_RESPONSE

corresponds the situation discussed here and master can recover from the errors using the recommendations there. The exception is REQ2 = cmd4, REQ3= cmd3. As per log after the errors the stack is able to read following parameters. The errors happen and the reason is usually related to slave response timeout or communication failures on the bus.

The recommendations to recover from failures:

Set CONFIG_FMB_TIMER_USE_ISR_DISPATCH_METHOD=y, CONFIG_FMB_TIMER_PORT_ENABLED=n in kconfig menu.
Try to read the data as the structure not just one individual register this allow to avoid register shifting issue which is possible due to protocol specifics.
Try to set the slave CONFIG_FMB_MASTER_TIMEOUT_MS_RESPOND=xxx to the maximum slave response time measured.
Once the error ESP_ERR_INVALID_RESPONSE or ESP_ERR_INVALID_TIMEOUT call the MBMasterRxFlush(); then try to repeat transaction again to repair from failure. For reliable reading this recover cycle should be performed for each read/write using mbc_master_get_parameter()/mbc_master_set_parameter() API. See the example from here.

Unfortunately, the current code is still reading the register by register, do not perform retries on fail for the same parameter. Please try to follow the recommendations above and here to solve the issues. See the communication logs attached below. test_master-slave_comm_rs485.zip

ZanellaSimone commented 1 year ago

Hello @alisitsyn, I followed all your instructions for configuring modbus communication into sdkconfig file. I then also took into consideration the example you highlighted for me. Looking at the code of your example, however, I see that you are using only holding type registers. To be in the same situation as you, I also replicated in my code, the fact of reading are registers of this type (holding registers). If I only use holding type registers I can confirm that the problem does not verify. Instead the problemE (10345) MB_PORT_COMMON: Drop incorrect frame, receive_func(4) != send_func(3) occurs to me when I also add readings input type register. Have you tried this setup? I think it is enough to add at the end of the two registers that you already read new input type registers and the problem occurs when restarting after a soft reset. Can you try this configuration?

alisitsyn commented 1 year ago

Hello @ZanellaSimone ,

Thank you for pointing me to this aspect. I tried this configuration and intentionally implemented it in the example. You are right about the first error line when the slave enabled again after restart the E (567) MB_PORT_COMMON: Drop incorrect frame, receive_func(3) != send_func(4). I can not debug and explain why it says about receive command == 3 for now. Let us keep this question open for now because I need to complete some other things first. The most important thing is that once the slave is enabled again after esp_restart(), the master can repair from any error (can confirm in my results). I tried even stress conditions sending modbus incorrect frames with correct CRC at any time and also performed some other scenarios. In all these cases the master can communicate normally after error and even can repair from error doing retries in the same read/write cycle.

I will check this intentionally later but I think it is not major issue for now due to described above. Would you agree?

ZanellaSimone commented 1 year ago

Hi @alisitsyn, I was analyzing in more detail what is the problem in my application. Unfortunately, unlike you, I don't have control over the modbus communication slave and therefore I can only analyze what happens on the master. After the esp_restart() I followed what you advised me, i.e. add some retry on the function used to read a modbus register (I saw this in the example you shared with me). What I see is that when my application is restarted, communication resumes and remains active. When the error occurs, discard the frame and try again. The real problem would seem to be that communication "goes crazy". I see that the value read for example on the first register (with the help of retry) is then appended to the next register which should instead have a different value. This means that I find all the values shifted downwards. Do you have any idea how i can stop the master requests and totally clean up the uart before applying a soft reset? This would allow me to resume communication after reboot in a clean situation without having anything dangling from before.

alisitsyn commented 1 year ago

Hi @ZanellaSimone ,

I did not see your respond on time.

I was analyzing in more detail what is the problem in my application. Unfortunately, unlike you, I don't have control over the modbus communication slave and therefore I can only analyze what happens on the master.

I understand this and try to help but I don't have your slave to check it. The communication on my side even in worst conditions can be recovered. The way I propose is to use the reference project and store the serial line communication log with timestamps to file. This will allow to analyze what happens with master. Please also try to read/write the same registers using the modbus poll tool. This would also help to save the communication log.

I see that the value read for example on the first register (with the help of retry) is then appended to the next register which should instead have a different value. This means that I find all the values shifted downwards.

Could you use the esp-idf project I posted you to check the communication instead of your project? I was trying to adopt it for your slave so, the changes will be very simple. Please also check if the patches posted above applied to the esp-modbus component folder. We need the same (reference project) to make sure we use the same code of the app and component.

Do you have any idea how i can stop the master requests and totally clean up the uart before applying a soft reset? This would allow me to resume communication after reboot in a clean situation without having anything dangling from before.

You can insert the delay of at least one second. Reset the buffer using the flush function. Call the mbc_master_destroy(), delay the time then Initialize the modbus communication again. Check the communication again and store your log.

I see that the value read for example on the first register (with the help of retry) is then appended to the next register which should instead have a different value. This means that I find all the values shifted downwards.

Could you check if you read several registers in one request instead of one by one? This should not happen in this case.

eespi commented 1 year ago

Hello @alisitsyn,

I think that it could be something related to ESP-IDF version, I have been tracking down an issue i was having to UART TX FIFO not flushing after esp_restart(). In my case I was using esp-idf 4.2.0, after patching to 4.2.4 the problem got fixed. What I believe is that there is a modbus request that is kept in the txfifo, so after an esp_restart the esp-modbus FSM goes out of sync when the first request is sent and a reply is received. It looks like @ZanellaSimone is using the same version I had, so I suggest he tries to move to a fixed version and try again. Maybe this doesn't solve his problem, but could help

alisitsyn commented 1 year ago

Hello @eespi,

Thank you for your investigation. I think this can be a reason for this issue. It looks like the DPORT_UART_MEM_RST fix is required.

Could you let me know the your exact commit id of the esp-idf 4.2.0 with the failure?

eespi commented 1 year ago

Hello @alisitsyn,

it was Platformio, version 3.0.0 which points to idf 4.2.0. By looking at the source code, the fix was not there. Eventually moved to an official esp-idf sdk 4.2.4

Regards

alisitsyn commented 1 year ago

Hello @ZanellaSimone ,

Could you check the issue is fixed in your code with the updated PlatformIO that corresponds to esp-idf v4.2.4? The esp-modbus v1.0.10 has been updated to fix all the issues that have been mentioned by you. Thanks.

ZanellaSimone commented 1 year ago

Hello @alisitsyn, Thanks for the answer. I moved to an official esp-idf sdk 4.2.4 and check that everything works correctly in my application.

alisitsyn commented 1 year ago

Hello @ZanellaSimone,

The issue above fixed in v1.0.11. If it is possible on your side, please check the fix. I am going to close this issue. Feel free to reopen it if it still happens in your project. Thanks.