alejoseb / Modbus-STM32-HAL-FreeRTOS

Modbus TCP and RTU, Master and Slave for STM32 using Cube HAL and FreeRTOS
GNU Lesser General Public License v2.1
539 stars 183 forks source link

1 Master and Multiple Slave Error #17

Closed emrelio closed 3 years ago

emrelio commented 3 years ago

Hi! First of all, thank you for this library! It is working very well as Modbus Master.

1 Master and 1 Slave is working as expected.

However when I try to query from multiple slave devices, randomly one of the slaves is crashing.

After some time, one of these devices is not responding.

When I try to communicate with the Modbus Slave Desktop app, it is working as expected. I also tried different Modbus Slave Library(eziya/STM32_HAL_FREEMODBUS_RTU) and your Master library, it is working as expected too.

I have changed the variable names for easy to understand. And added some code pieces for my application. But when I try to use your original library, the problem is still occurring.

Here is the Master Init

  ModbusMaster.Mb_Type = MASTER_RTU;
  ModbusMaster.port =  &huart2;
  ModbusMaster.Mb_ID = 0;
  ModbusMaster.EN_Port = GPIOA;
  ModbusMaster.EN_Pin = GPIO_PIN_1;
  ModbusMaster.LED_Port = GPIOA;
  ModbusMaster.LED_Pin = GPIO_PIN_15;
  ModbusMaster.MB_Regs = slave1_r;
  ModbusMaster.Mb_Reg_Size = sizeof(slave1_r)/sizeof(slave1_r[0]);
  ModbusMaster.Mb_Timer_Period = 500;

  ModbusInit(&ModbusMaster);
  ModbusStart(&ModbusMaster);

Telegram assignments

mb_master_function[0].Mb_M_Slave_ID = 10;                               
  mb_master_function[0].Mb_M_Function = MB_FC_READ_REGISTERS;                               
  mb_master_function[0].Mb_M_Start_Address = 0x00;              
  mb_master_function[0].Mb_M_Reg_Size = 10;                             
  mb_master_function[0].Mb_M_Regs = slave1_r;   

  mb_master_function[1].Mb_M_Slave_ID = 10;                             
  mb_master_function[1].Mb_M_Function = MB_FC_WRITE_MULTIPLE_REGISTERS;                             
  mb_master_function[1].Mb_M_Start_Address = 0x0A;              
  mb_master_function[1].Mb_M_Reg_Size = 10;                             
  mb_master_function[1].Mb_M_Regs = slave1_w;
  /*********************************************************************************************************************************/
  mb_master_function[2].Mb_M_Slave_ID = 11;                                 
  mb_master_function[2].Mb_M_Function = MB_FC_READ_REGISTERS;                                   
  mb_master_function[2].Mb_M_Start_Address = 0x00;                  
  mb_master_function[2].Mb_M_Reg_Size = 10;                                 
  mb_master_function[2].Mb_M_Regs = slave2_r;   

  mb_master_function[3].Mb_M_Slave_ID = 11;                                 
  mb_master_function[3].Mb_M_Function = MB_FC_WRITE_MULTIPLE_REGISTERS;                                 
  mb_master_function[3].Mb_M_Start_Address = 0x0A;                  
  mb_master_function[3].Mb_M_Reg_Size = 10;                                 
  mb_master_function[3].Mb_M_Regs = slave2_w;   
  /*********************************************************************************************************************************/
  mb_master_function[4].Mb_M_Slave_ID = 12;                                 
  mb_master_function[4].Mb_M_Function = MB_FC_READ_REGISTERS;                                   
  mb_master_function[4].Mb_M_Start_Address = 0x00;                  
  mb_master_function[4].Mb_M_Reg_Size = 10;                                 
  mb_master_function[4].Mb_M_Regs = slave3_r;   

  mb_master_function[5].Mb_M_Slave_ID = 12;                                 
  mb_master_function[5].Mb_M_Function = MB_FC_WRITE_MULTIPLE_REGISTERS;                                 
  mb_master_function[5].Mb_M_Start_Address = 0x0A;                  
  mb_master_function[5].Mb_M_Reg_Size = 10;                                 
  mb_master_function[5].Mb_M_Regs = slave3_w;

Master Task

error_codes[0] = Modbus_Master_Query(&ModbusMaster, mb_master_function[0], &error_counter[0]);

 osDelay(50);

 Modbus_Master_Query(&ModbusMaster, mb_master_function[1], 0);

 osDelay(250);

 /***********************************************************************************************************************/

 error_codes[1] = Modbus_Master_Query(&ModbusMaster, mb_master_function[2], &error_counter[1]);

 osDelay(50);

 Modbus_Master_Query(&ModbusMaster, mb_master_function[3], 0);

 osDelay(250);

 /***********************************************************************************************************************/

 error_codes[2] = Modbus_Master_Query(&ModbusMaster, mb_master_function[4], &error_counter[2]);

 osDelay(50);

 Modbus_Master_Query(&ModbusMaster, mb_master_function[5], 0);

 osDelay(250);

Thank you!

alejoseb commented 3 years ago

Hi, I can only provide support to the standard library. Besides, I am suspecting there is a deadlock, something similar was reported by other users. I am doing some testing, I will let you know if I find something.

It would be great if you use the standard library to validate a possible solution, that will benefit everyone.

emrelio commented 3 years ago

Yes, I'm debugging standard library (last version with tcp support) right now. Thank you for the support.

emrelio commented 3 years ago

I don't know if it is a valid solution but, RingClear() and resetting buffer size seems to work. Received buffer array shifting after some time. And it causing to stop communication due to the wrong slave ID or false CRC code(When buffer shifted).

By changing

if ( modH->u8Buffer[ID] !=  modH->u8id)   //for Modbus TCP this is not validated, user should modify accordingly if needed
{
  #if ENABLE_TCP ==1
    if(modH-> xTypeHW == TCP_HW)
    {
      netconn_close(modH->newconn);
      netconn_delete(modH->newconn);
    }
  #endif
  continue;
}

to

if ( modH->u8Buffer[ID] !=  modH->u8id)   //for Modbus TCP this is not validated, user should modify accordingly if needed
{
  #if ENABLE_TCP ==1
    if(modH-> xTypeHW == TCP_HW)
    {
      netconn_close(modH->newconn);
      netconn_delete(modH->newconn);
    }
  #endif

  RingClear(&modH->xBufferRX);
  modH->u8BufferSize = 0;
  continue;
}

this.

alejoseb commented 3 years ago

Hi, I did some testing and reached a similar conclusion, there is an issue with the ring buffer, since modbus is not crashing but the data is always shifted.

This means the ring buffer is corrupted, probably because of a race condition, i.e. a character arrives while reading the buffer, or there is a bug in the ring buffer code.

Your solution is reasonable and can be used as a work around, but a long term solution should be either controlling the interrupt of the serial port, or using thread safe RTOS primitives.

alejoseb commented 3 years ago

Hi @emrelio, Please check my latest commit d107d93882290f4a772ff63c981e8942605758d4 I added the interrupt control for the serial port, and tested the solution with 3 slaves without any issue for several hours.

You can test the latest code and close the issue if everything works as expected.

emrelio commented 3 years ago

Thank you but I can't test right now due to quarantine. I will test as soon as I can.

schweigstill commented 3 years ago

I also noticed that the Modbus RTU slave stops responding after some hours of operation. The project is based on a STM32F091VBT8. We are using all eight U(S)ARTS for different purposes; UART3 is used for Modbus. The code is based on commit 7e43b15ed66865855d089108a9e0786ad5eaa581 of 30.02.2021.

During my investigations I found the following code in sendTxBuffer():

HAL_UART_Transmit_IT(modH->port, modH->au8Buffer, modH->u8BufferSize); ulTaskNotifyTake(pdTRUE, portMAX_DELAY); //wait notification from TXE interrupt

The execution is continued regardless of the reason which caused ulTaskNotifyTake() to return. Also (a) spurious character(s) which is received in the short time between ulTaskNotifyTake() in StartTaskModbusSlave() and ulTaskNotifyTake() in sendTxBuffer() may cause this. I'm not completely sure if this will just lead to a short glich or if this can also cause major problem like the unresponsiveness mentioned before.

No other task seems to be affected by the Modbus problem, i.e. everything is working nicely and I can't see any strange things in my debugger (Lauterbach TRACE32).

alejoseb commented 3 years ago

Hi, You must use the latest commit, since I solved a race condition in the ring buffer that was corrupting its structurest. The slave will stop answering after the ring buffer is corrupted.

The commit that you are using has a confirmed bug.

Besides, to avoid spurious interrupt in the Modbus slave, you can first start the kernel and then call the ModbusStart function. This will assure that the Modbus slave will not trigger interrupts unless the Modbus slave can handle them as part of the normal execution.

emrelio commented 3 years ago

I've used your latest commit and tested with 1 Master - 3 Slave config. It's been 5 hours and there is no error!

Thank you for the support. I can close the issue now.