Open hitecSmartHome opened 2 weeks ago
onReceive()
will be called only when:
1- The UART FIFO is full.
2- When RX timeout occurs.
From the comments in https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/HardwareSerial.h
// onReceive will setup a callback that will be called whenever an UART interruption occurs (UART_INTR_RXFIFO_FULL or UART_INTR_RXFIFO_TOUT)
// UART_INTR_RXFIFO_FULL interrupt triggers at UART_FULL_THRESH_DEFAULT bytes received (defined as 120 bytes by default in IDF)
// UART_INTR_RXFIFO_TOUT interrupt triggers at UART_TOUT_THRESH_DEFAULT symbols passed without any reception (defined as 10 symbols by default in IDF)
// onlyOnTimeout parameter will define how onReceive will behave:
// Default: true -- The callback will only be called when RX Timeout happens.
// Whole stream of bytes will be ready for being read on the callback function at once.
// This option may lead to Rx Overflow depending on the Rx Buffer Size and number of bytes received in the streaming
// false -- The callback will be called when FIFO reaches 120 bytes and also on RX Timeout.
// The stream of incommig bytes will be "split" into blocks of 120 bytes on each callback.
// This option avoid any sort of Rx Overflow, but leaves the UART packet reassembling work to the Application.
void onReceive(OnReceiveCb function, bool onlyOnTimeout = false);
What is the value of PACKET_TRIGGER_ONLY_ON_TIMEOUT
?
Could it be a memory allocation issue? Please provide a minimum sketch that can be used to confirm the issue and to debug it.
PACKET_TRIGGER_ONLY_ON_TIMEOUT
is true in this case. The fifo can not be full since I got messages only when asked and no message is bigger than the fifo
These are standard modbus messages
OK. I can't relate UART interrupts to flash accessing / httpd task.
From your description, it sounds like this is an Arduino as IDF component project with specific IDF sdkconfig
settings that may be very particular to this project.
Could you please provide a minimum sketch that can be built using Arduino Core standard sdkconfig
that demonstrates such issue?
What is the value of MAX_RX_BUFFER_SIZE
, MAX_MBUS_DATA_LENGTH
and MBUS_BAUD
?
OK. I can't relate UART interrupts to flash accessing / httpd task. From your description, it sounds like this is an Arduino as IDF component project with specific IDF
sdkconfig
settings that may be very particular to this project.Could you please provide a minimum sketch that can be built using Arduino Core standard
sdkconfig
that demonstrates such issue?
Could it be an issue with flash access / httpd tasks?
Arduino setup()/loop()
will run under the lowest FreeRTOS priority.
By other hand onReceive()
related task will run under priority ARDUINO_SERIAL_EVENT_TASK_PRIORITY
.
This can be set as a global #define
for the whole project.
#ifndef ARDUINO_SERIAL_EVENT_TASK_PRIORITY
#define ARDUINO_SERIAL_EVENT_TASK_PRIORITY (configMAX_PRIORITIES - 1)
#endif
You can also change the ESP32-WROVER CPU that is running the UART event Loop with ARDUINO_SERIAL_EVENT_TASK_RUNNING_CORE
#ifndef ARDUINO_SERIAL_EVENT_TASK_RUNNING_CORE
#define ARDUINO_SERIAL_EVENT_TASK_RUNNING_CORE -1
#endif
Or change the Task Stack size with ARDUINO_SERIAL_EVENT_TASK_STACK_SIZE
#ifndef ARDUINO_SERIAL_EVENT_TASK_STACK_SIZE
#define ARDUINO_SERIAL_EVENT_TASK_STACK_SIZE 2048
#endif
About stack size... onReceive()
callback will be called from a separated UART Event Task.
If the callback function overflows the Task stack, it can cause a similar issue.
#define MBUS_BAUD 115200
#define PACKET_TRIGGER_ONLY_ON_TIMEOUT true
#define MBUS_RX_TIMEOUT 1
#define MAX_MBUS_DATA_LENGTH 256
#define MAX_RX_BUFFER_SIZE 80
This is from my sdkconfig.
CONFIG_ARDUINO_SERIAL_EVENT_RUN_NO_AFFINITY=y
CONFIG_ARDUINO_SERIAL_EVENT_TASK_RUNNING_CORE=-1
CONFIG_ARDUINO_SERIAL_EVENT_TASK_STACK_SIZE=4048
CONFIG_ARDUINO_SERIAL_EVENT_TASK_PRIORITY=24
I can increase the event stack size for testing and also the rx buffer size.
Could you please provide a minimum sketch that can be built using Arduino Core standard sdkconfig that demonstrates such issue?
I will create one for sure because I want to get to the bottom of this.
Here is my whole sdkconfig: https://pastebin.com/w4wPRxDK
define MAX_RX_BUFFER_SIZE 80
Serial1.setRxBufferSize(MAX_RX_BUFFER_SIZE);
It can't be lower than 129. Please enable DEBUG level at least to INFO in order to check error and warning messages.
From HardwareSerial Source Code:
// minimum total RX Buffer size is the UART FIFO space (128 bytes for most SoC) + 1. IDF imposition.
size_t HardwareSerial::setRxBufferSize(size_t new_size) {
if (_uart) {
log_e("RX Buffer can't be resized when Serial is already running. Set it before calling begin().");
return 0;
}
if (new_size <= SOC_UART_FIFO_LEN) {
log_w("RX Buffer set to minimum value: %d.", SOC_UART_FIFO_LEN + 1); // ESP32, S2, S3 and C3 means higher than 128
new_size = SOC_UART_FIFO_LEN + 1;
}
_rxBufferSize = new_size;
return _rxBufferSize;
}
Well, I see that it changes the value back.
Firstly
MAX_RX_BUFFER_SIZE
to 256
CONFIG_ARDUINO_SERIAL_EVENT_TASK_STACK_SIZE
to 8000
DEBUG
I couldn't see shit so decreased debug back to info. I could reproduce it without flash calls.
I (117321) System: Getting task info
E (117329) Modbus: CRC error in response packet: 0x00
E (117329) Modbus: Invalid packet. Can't process it.
Raw Packet: 00
E (117334) HardwareHandler: Packet error code: 1
E (117339) Modbus: CRC error in response packet: 0x14
E (117345) Modbus: Invalid packet. Can't process it.
Raw Packet: 14 00 00 ed 85
E (117353) HardwareHandler: Packet error code: 1
The packet also has missing bytes and it is splitted. The task priority is default 19 and it is amongst the biggest priority tasks the system has.
uart_event_task is priority 24 wifi is 23 esp_timer is 22 sys_evt is 20 and arduino_events is 19
I can increase the arduino_events task further hovewer. ( I want to try these things before I go write a reproduction example because it will probably involve http requests. )
For some strange reason, in the menuconfig under Arduino configuration the Serial Event task priority shows up as 24 but when I do a check with uxTaskGetSystemState(taskStatusArray, uxArraySize, &ulTotalRunTime);
it reports the arduino_events
priority as 19. Strange...
Does Arduino Serial Event task
the same as arduino_events
?
Does
Arduino Serial Event task
the same asarduino_events
?
No. It is a separated task and settings: https://github.com/espressif/arduino-esp32/blob/0d5d50eb4186cb5ef0b7c108f87fae1cb688b84f/cores/esp32/HardwareSerial.cpp#L14-L24
So the ARDUINO_SERIAL_EVENT_TASK
name is uart_event_task
.
In this case, the uart_event_task
priority is 24 for me...
What does arduino_events
task do? Like OTA and things like that?
What does arduino_events task do? Like OTA and things like that?
It is a general Event Loop that catches most of internal Arduino Core events. Most Arduino Event Callback are based on it.
@hitecSmartHome - I have marked this issue as awaiting response
because we need a minimum sketch that demonstrates the issue in order to move forward.
Yeah, sorry Iam thinking and testing what could cause this. At first I tought that it is general flash access but it is triggered on other http requests too. I also put a lot of critical things into iram like flash functions and rtos functions as well as uart interrupts. I can definitely trigger it with http requests but iam not sure about what operation does it exactly. Sometimes it is easily triggered with some request other times the same request does nothing even with multiple tries.
We can keep this issue open for longer. Take your time. Let us know about your progress. Thanks.
Thank you very much
There is definietly something to do with the rx buffer size. If I read a lot of bytes from the slaves, let's say 32x2 bytes + header and crc so like max 80 bytes it seems to overflow something. I can see the packet second half as a separate pacage and it blows my data validator since the CRC is wrong for both half.
It doesn't matter how much I increase the RX_BUFFER_SIZE
.
I have tried to call uart_flush_input(UART_NUM_1);
on every packet so it should empty the fifo or reset the ring buffer so it can start over after every packet but it did not help.
Can this be? I added 1ms wait after my callback called but before I read the serial data and it seems to work reliably. I can read huge amounts of data and I also uncommented uart_flush_input
Serial1.onReceive(
std::bind(&Modbus::handlePacket, this),
PACKET_TRIGGER_ONLY_ON_TIMEOUT
);
void Modbus::handlePacket() {
delayMicroseconds(1000); // Added 1ms wait
int available = Serial1.available();
uint8_t rawPacket[available] = {0};
int readBytes = Serial1.readBytes(rawPacket, available);
//uart_flush_input(UART_NUM_1);
if (!isPacketValid(rawPacket, readBytes)) {
return;
}
}
Further testing required...
The data packet is sent in a single flow? What if the slave sends a "first half" of data, delays a bit and then sends the remaining "second half". It would generate an "RX Timeout" after the first half, hence it would call onReceive for this "first half".
What if the RX Timeout is set to the maximum allowed by the protocol? It could force the "onReceive callback" to be called only after the system is sure that no more "extra" data could arrive.
#define MBUS_RX_TIMEOUT 1
is used in Serial1.setRxTimeout(MBUS_RX_TIMEOUT);
This means that onReceive callback
occurs after 1 symbol timeout.
This is about 10bits (8 data bits+ 1 stopbit + 1 parity bit) on 115200 bps => 87 microseconds (short time).
Make it 1 millisecond at 115200 ==> #define MBUS_RX_TIMEOUT 12
12 symbols of 10 bits at 115200 ==> (1/115200)*10*12 = 0,00104166 seconds --> about 1ms.
I can't make MBUS_RX_TIMEOUT
more than 1 because the idf instantly prints out an error and sets it back to 1.
But I will try it again.
E (7572) uart: tout_thresh = 12 > maximum value = 1
What if the slave sends a "first half" of data, delays a bit and then sends the remaining "second half".
No. We measured with an oscilloscope and the slaves are sending the packets at once with consistent timings. We also in controll of the slaves firmware and we are sure that they are good. They use simple uart interrupts and hardware timers to send out the data.
Also sometimes there is a 0x00
at the first byte of the data which we do not see with an oscilloscope. That 0x00 sometimes there even with this 1ms delay after an onRecive callback.
00 0e 03 01 00 00 1c 45
should be 0e 03 01 00 00 1c 45
( address, func code, reg num, data first, data second, crc, crc ) registers are 16 bit and sending two 8 bit data. But that 0x00 is mistery still.
If I have this 1ms delay the rest of the data is good, I don't see any error just sometimes this 0x00 byte. uart_flush_input(UART_NUM_1);
is no help again in this situation
It is a little bit modified modbus because the slaves are not waiting for the standard response timeout but sending the response back as soon as they can in order to speed up the communication. Our system can handle 254 expander modules and there are some type of slaves which can only handle 115200 baud so we need to cut timings everywhere we can. Sometimes I see 0x00 alone without any indication about other data.
E (18958) Modbus: CRC error in response packet: 0x00
CRC error raw packet: : 00
E (18958) Modbus: Invalid packet. Can't process it.
MODBUS INVALID PACKET: 00
E (18965) HardwareHandler: Packet error code: 1
This particular packet was a single byte and no previous or next packet received in half. Interesting...
Otherwise with 1ms delay the system seems stable but slow compared to without 1ms delay. The communication is continous and we can't afford to wait on anything. The last slave should send their data as fast as it can so we don't lose packets. There can be critical data for the system to work reliable. There are short and long button push timings which can be missed if there are lot of slaves on the bus
I think I have to switch to raw uart interrupt handling and rewrite my implementation because either arduino Serial1 is acting crazy or this whole event driven approach laggs behind.
Currently the system has only the default tasks and the uart task has the highest priority in the system. I think I can't fine tune this further. It just doesn't work properly
I got one again
E (601441) Modbus: CRC error in response packet: 0x00
CRC error raw packet: : 00 0a 03 01 00 00 ed 85
E (601442) Modbus: Invalid packet. Can't process it.
MODBUS INVALID PACKET: 00 0a 03 01 00 00 ed 85
E (601450) HardwareHandler: Packet error code: 1
The first 0x00
can't be the last packet last byte since the last byte must be a crc and if it was 0x00
it would catch it with a CRC error. So I don't know where that 0x00 coming from..
If it would have been this
0a 03 01 00 00 ed
BRAKE00 0a 03 01 00 00 ed 85
The system would have reported two packet errors in this case with CRC error but it was just one so this 0x00 is a magic byte coming from deep of the uart implementation or somewhere :D
It is a little bit modified modbus because the slaves are not waiting for the standard response timeout but sending the response back as soon as they can in order to speed up the communication. Our system can handle 254 expander modules and there are some type of slaves which can only handle 115200 baud so we need to cut timings everywhere we can. Sometimes I see 0x00 alone without any indication about other data.
E (18958) Modbus: CRC error in response packet: 0x00 CRC error raw packet: : 00 E (18958) Modbus: Invalid packet. Can't process it. MODBUS INVALID PACKET: 00 E (18965) HardwareHandler: Packet error code: 1
This particular packet was a single byte and no previous or next packet received in half. Interesting...
Could it be the DE/RE pin is pulled at the wrong moment? On ESP32's you can enable collision detection for RS485. For this you need to only operate the DE pin and have the /RE pin pulled to GND all the time. Have to check the souce to see how it is called, but I think the function to enable this was almost literally called RS485CollisionDetection.
We have an rs485 chip and one controll pin. Serial is configured to use this pin Serial1.setPins(-1, -1, -1, MBUS_RTS);
the MBUS_RTS pin is the transmit pin which is raised to high on transmit to indicate it to the rs485 chip
So DE and /RE are not separated on the RS485 chip?
Maybe you could check with a scope to see if the timings of switching this are a bit too tight?
It is pulled high with a 2.2kohm
So DE and /RE are separated on the chip.
Anyway, it should be possible to see any 'collision' on the scope.
Will investigate... i dont know why my engineer pulled it to high since the pdf also says that it should be pulled to gnd
Anyway, it should be possible to see any 'collision' on the scope.
The problem is that these magic 0 bytes is so rare and it does not easily caught
Will investigate... i dont know why my engineer pulled it to high since the pdf also says that it should be pulled to gnd
Well it is the 'standard' way of connecting DE and /RE together. However this makes it impossible to use 'collision detection', which is a feature the ESP32-xx have but not all devices have.
We have an esp32 wrover - e rev 3 in theory
ESP32-classic (as I tend to call them) also has RS485 'collision detection' available as a feature. I have used it on a few of my own boards too.
Do you think this 0 byte is a collision? How? When the esp sent out all data, it pulls the pin low and waiting for response. What data collides?
Also this collision does not answers to other things like truncated packets. Sometimes I have a packet in half and no extra or missing bytes. I have observed that sometimes only the last byte is truncated only. The second half of the CRC. Will read about this collision detection feature, thank you very much.
It is just a theory as adding a delay does seem to solve your issue. So maybe the /RE pin is not held low long enough to capture all the data. With RS485 you have the added complexity of the DE/RE signal which essentially blocks any traffic in a specific direction.
I assume you let the ESP handle the DE/RE ? I don't know if this is done in hardware or in software using interrupts (could be in the SDK so you don't see it) But this could explain why underload there might be bytes missing, especially at the beginning or end of a packet.
/**
* @brief UART mode selection
*/
typedef enum {
UART_MODE_UART = 0x00, /*!< mode: regular UART mode*/
UART_MODE_RS485_HALF_DUPLEX = 0x01, /*!< mode: half duplex RS485 UART mode control by RTS pin */
UART_MODE_IRDA = 0x02, /*!< mode: IRDA UART mode*/
UART_MODE_RS485_COLLISION_DETECT = 0x03, /*!< mode: RS485 collision detection UART mode (used for test purposes)*/
UART_MODE_RS485_APP_CTRL = 0x04, /*!< mode: application control RS485 UART mode (used for test purposes)*/
} uart_mode_t;
Well ..... (used for test purposes)
I assume you let the ESP handle the DE/RE ?
Yes, I let the esp do it. That was the logical way since I also can't see the buffers
Do you think this 0 byte is a collision? How? When the esp sent out all data, it pulls the pin low and waiting for response. What data collides?
It is not really about data colliding, but more like the /RE pin not being pulled down quickly enough to read the incomming data.
Remember this is just my theory....
Board
esp32 wrover
Device Description
psram, eth, flash etc..
Hardware Configuration
psram, eth, flash etc..
Version
latest master (checkout manually)
IDE Name
PlatformIO
Operating System
Windows 10
Flash frequency
80
PSRAM enabled
yes
Upload speed
115200
Description
I often get truncated packets on uart when accessing either the flash with LittleFS wrapper or when making an http call from client to esp32. The way I test this is that I get an interrupt from
Serial1.onReceive()
with0x00
data in it when I save something to flash or when making an http request. http server uses the IDF server implementation as well as the arduino Serial wrapper.I expect the
onReceive
function to call my cb only when the set timeout is triggered. It isUART_SYMBOL_TIMEOUT 1
in this case. The problem is triggered every time when I write to flash. I can access menuconfig so I made sure that the UART functions and variables are inIRAM
and interrupts too.Sketch
Debug Message
Other Steps to Reproduce
Give repeated modbus packets to your esp, wait for them and access the flash meanwhile.
I have checked existing issues, online documentation and the Troubleshooting Guide