eModbus / eModbus

Modbus library for RTU, ASCII and TCP protocols. Primarily developed on and for ESP32 MCUs.
https://emodbus.github.io
MIT License
413 stars 137 forks source link

E2 - CRC check error #186

Closed OekoSolveMG closed 2 years ago

OekoSolveMG commented 2 years ago

Describe the bug: When trying to read holding registers I get a E2 - CRC check error relatively often. When enabling #define LOG_LEVEL LOG_LEVEL_DEBUG there are no E2 - CRC check error anymore.

Modbus Blocks:

Block: Min: Max: Size:
000 000 030 031
001 041 057 017

Expected behavior: There shouldn't be any E2 - CRC check error.

Ouput (LOG_LEVEL_DEBUG):

[D] Sent packet: @3FFB264C/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[D] 200229| ModbusClientRTU.cpp  [ 198] handleConnection: Request sent.
[D] Raw buffer received: @3FFB2A6C/67:
  | 0000: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0010: 79 00 D6 00 6D 00 01 00  00 01 C8 00 01 00 02 00  |y...m...........|
  | 0020: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0040: 01 55 5C                                          |.U\             |
[D] Received packet: @3FFB2CCC/65:
  | 0000: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0010: 79 00 D6 00 6D 00 01 00  00 01 C8 00 01 00 02 00  |y...m...........|
  | 0020: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0040: 01                                                |.               |
[D] 200356| ModbusClientRTU.cpp  [ 204] handleConnection: Data response (65 bytes) received.
[D] 200364| ModbusClientRTU.cpp  [ 225] handleConnection: Response generated.
[D] 200371| ModbusClientRTU.cpp  [ 193] handleConnection: Pulled request from queue
[D] Sent packet: @3FFB2604/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[D] 200391| ModbusClientRTU.cpp  [ 198] handleConnection: Request sent.
[D] Raw buffer received: @3FFB2A6C/39:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 53 C5  |....4..U....\.S.|
  | 0020: B8 00 16 06 DA 3F C9                              |.....?.         |
[D] Received packet: @3FFB2604/37:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 53 C5  |....4..U....\.S.|
  | 0020: B8 00 16 06 DA                                    |.....           |
[D] 200475| ModbusClientRTU.cpp  [ 204] handleConnection: Data response (37 bytes) received.
[D] 200483| ModbusClientRTU.cpp  [ 225] handleConnection: Response generated.
[D] 201190| ModbusClientRTU.cpp  [ 126] addRequestM: request for 01/03
[D] 201191| ModbusClientRTU.cpp  [ 175] addToQueue: RC=01
[D] 201191| ModbusClientRTU.cpp  [ 137] addRequestM: RC=00
[D] 201191| ModbusClientRTU.cpp  [ 193] handleConnection: Pulled request from queue
[D] 201196| ModbusClientRTU.cpp  [ 126] addRequestM: request for 01/03
[D] 201209| ModbusClientRTU.cpp  [ 175] addToQueue: RC=01
[D] 201214| ModbusClientRTU.cpp  [ 137] addRequestM: RC=00

Handle RTS method:

void handle_rts(bool toggle_level) {
  if (!w5500_initalized) {
    digitalWrite(RTS_PIN, toggle_level);
  }
}

Additional info:

I already tried to add the crc to my messages beforehand via. addCRC(), the problem seesm to lie in the response tough so that didn't solve any issues. Additionaly it seems that playing around with delays didn't really help as well, because E2 - CRC check error are less frequent but they still appear seemingly randomly.

I have also tried to play around with the Modbus interval timeout, but it also didn't really help in solving the issue. I also seem to get a response everytime so the request itself seems to be fine and it seems to be received on the other side as well.


Is there something I am missing with the CRC16 check or do I need to add something like a delay() etc? Because I really can't fathom what exactly the problem could be.

Miq1 commented 2 years ago

This definitely is a bus timing issue.

Adding the LOG_LEVEL_DEBUG will spend more time when printing out log lines, what may relieve the else critical timing.

The E2 error is indicating that an incomplete message is considered complete, but mostly will be followed by another error for the remainder of a message then treated as a new one. Did you see such a subsequent error?

addCRC() will not help at all, it simply will add another two bytes before the message is sent, the library then will add another CRC, including that you added with the call as payload bytes.

If you like, you could change the line

HEXDUMP_D("Raw buffer received", buffer, bufferPtr);

(around line 310 in RTUutils.cpp) into

HEXDUMP_N("Raw buffer received", buffer, bufferPtr);

Note the "N"!

Then the received buffer will be logged in any case before processing, even if the LOG_Level is at LOG_LEVEL_ERROR or above.

May be we will see more clearly what is happening.

OekoSolveMG commented 2 years ago

It seems that sometimes multiple responses are combined into one which causes the CRC16 check to fail. More specifically @3FFB2898/195: seems to be exactly where that happened.

It seems to combine the response @3FFB2898/67: with it's CRC? perhaps 4C 9F and them some leading data which I don't know where it comes from 00 00 00 08 00 50 0F 71 0C A8 00 01 00 01 00 64 00 01 A8 5E until then the @3FFB2898/39 response and again the @3FFB2898/67: is added.

So it seems to combine @3FFB2898/67: twice with @3FFB2898/39 wrapping it and some data in between I don't understand.

Log output:

[N] Raw buffer received: @3FFB2898/195:
  | 0000: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0010: 79 00 CF 00 64 00 01 00  3E 00 60 00 00 00 00 00  |y...d...>.`.....|
  | 0020: 00 00 00 00 00 02 79 00  CF 00 64 00 01 00 00 01  |......y...d.....|
  | 0030: AD 00 01 00 00 00 E2 00  00 02 B9 00 00 0B D6 27  |...............'|
  | 0040: 28 02 79 4C 9F 00 00 00  08 00 50 0F 71 0C A8 00  |(.yL......P.q...|
  | 0050: 01 00 01 00 64 00 01 A8  5E 01 03 22 00 00 00 00  |....d...^.."....|
  | 0060: 00 00 00 00 00 00 00 00  00 00 00 00 EB 34 00 00  |.............4..|
  | 0070: 55 14 00 00 96 5C 00 54  AC 10 00 16 06 DA 21 E8  |U....\.T......!.|
  | 0080: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0090: 79 00 CF 00 64 00 01 00  00 01 AE 00 01 00 00 00  |y...d...........|
  | 00A0: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 00B0: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 00C0: 01 B3 EA                                          |...             |
[MB] Read error: E2 - CRC check error
[N] Raw buffer received: @3FFB2898/39:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 AC  |....4..U....\.T.|
  | 0020: 11 00 16 06 DA 1C 28                              |......(         |
[N] Raw buffer received: @3FFB2898/67:
  | 0000: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0010: 79 00 CF 00 5F FF FE 00  00 01 AD 00 01 00 00 00  |y..._...........|
  | 0020: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0040: 01 A4 41                                          |..A             |
[N] Raw buffer received: @3FFB2898/39:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 AC  |....4..U....\.T.|
  | 0020: 25 00 16 06 DA AD EC                              |%......         |
[N] Raw buffer received: @3FFB2898/67:
  | 0000: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0010: 79 00 CF 00 5F FF FE 00  00 01 AD 00 01 00 00 00  |y..._...........|
  | 0020: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0040: 01 A4 41                                          |..A             |
[N] Raw buffer received: @3FFB2898/39:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 AC  |....4..U....\.T.|
  | 0020: 26 00 16 06 DA E9 EC                              |&......         |
[N] Raw buffer received: @3FFB2898/67:
  | 0000: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0010: 79 00 CF 00 5F FF FE 00  00 01 AD 00 01 00 00 00  |y..._...........|
  | 0020: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0040: 01 A4 41                                          |..A             |

Error log dismantled:

[N] Raw buffer received: @3FFB2898/195:

Data from @3FFB2898/67 with perhaps crc:

  | 0000: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0010: 79 00 CF 00 64 00 01 00  3E 00 60 00 00 00 00 00  |y...d...>.`.....|
  | 0020: 00 00 00 00 00 02 79 00  CF 00 64 00 01 00 00 01  |......y...d.....|
  | 0030: AD 00 01 00 00 00 E2 00  00 02 B9 00 00 0B D6 27  |...............'|
  | 0040: 28 02 79 4C 9F

Partial data from @3FFB2898/67 with CRC (See the block starting from line 00B0):

                                      00 00 00  08 00 50 0F 71 0C A8 00  |(.yL......P.q...|
  | 0050: 01 00 01 00 64 00 01 A8  5E

Data from @3FFB2898/39 with CRC:

                                                           01 03 22 00 00 00 00  |....d...^.."....|
  | 0060: 00 00 00 00 00 00 00 00  00 00 00 00 EB 34 00 00  |.............4..|
  | 0070: 55 14 00 00 96 5C 00 54  AC 10 00 16 06 DA 21 E8  |U....\.T......!.|

Data from @3FFB2898/67 with CRC again:

  | 0080: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0090: 79 00 CF 00 64 00 01 00  00 01 AE 00 01 00 00 00  |y...d...........|
  | 00A0: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 00B0: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 00C0: 01 B3 EA

Additional info

Sometimes the log messages that produces is shorter as well.

[N] Raw buffer received: @3FFB2898/167:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 01 00 03  22 00 00 00 00 00 00 00  |....4...".......|
  | 0020: 00 00 00 00 00 00 00 00  00 EB 34 00 00 55 14 00  |..........4..U..|
  | 0030: 00 96 5C 00 54 B2 7C 00  16 06 DA B2 5F 01 03 3E  |..\.T.|....._..>|
  | 0040: 00 60 00 00 00 00 00 00  00 00 00 00 02 79 00 CE  |.`...........y..|
  | 0050: 00 66 00 02 00 00 01 A9  00 01 00 00 00 E2 00 00  |.f..............|
  | 0060: 02 B9 00 00 0B D6 27 28  02 79 4C 9F 00 00 00 08  |......'(.yL.....|
  | 0070: 00 50 0F 71 0C A8 00 01  00 01 00 64 00 01 87 A1  |.P.q.......d....|
  | 0080: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0090: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 B2  |....4..U....\.T.|
  | 00A0: 7D 00 16 06 DA 8F 9F                              |}......         |
[MB] Read error: E2 - CRC check error

It seems to start with @3FFB2898/39 without CRC instead and than append @3FFB2898/39 again but partially starting at 00 00 EB 34 again without CRC and then @3FFB2898/67 again

I even sometimes get this collosus.

[N] Raw buffer received: @3FFB2B7C/323:
  | 0000: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0010: 79 00 CE 00 68 00 03 01  3E 00 60 00 00 00 00 00  |y...h...>.`.....|
  | 0020: 00 00 00 00 00 02 79 00  CE 00 68 00 01 00 00 01  |......y...h.....|
  | 0030: A9 00 01 00 00 00 E2 00  00 02 B9 00 00 0B D6 27  |...............'|
  | 0040: 28 02 79 4C 9F 00 00 00  08 00 50 0F 71 0C A8 00  |(.yL......P.q...|
  | 0050: 01 00 01 00 64 00 01 2A  C4 01 03 22 00 00 00 00  |....d..*..."....|
  | 0060: 00 00 00 00 00 00 00 00  00 00 00 00 EB 34 00 00  |.............4..|
  | 0070: 55 14 00 00 96 5C 00 54  B3 37 00 16 06 DA 17 80  |U....\.T.7......|
  | 0080: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0090: 79 00 CE 00 68 00 01 00  00 01 A9 00 01 00 00 00  |y...h...........|
  | 00A0: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 01 9F  |.........'(.yL..|
  | 00B0: A9 00 01 00 00 00 E2 00  00 02 B9 00 00 0B D6 27  |...............'|
  | 00C0: 28 02 79 4C 9F 00 00 00  08 00 50 0F 71 0C A8 00  |(.yL......P.q...|
  | 00D0: 01 00 01 00 64 00 01 2A  C4 01 03 22 00 00 00 00  |....d..*..."....|
  | 00E0: 00 00 00 00 00 00 00 00  00 00 00 00 EB 34 00 00  |.............4..|
  | 00F0: 55 14 00 00 96 5C 00 54  B3 37 00 16 06 DA 17 80  |U....\.T.7......|
  | 0100: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0110: 79 00 CE 00 68 00 01 00  00 01 A9 00 01 00 00 00  |y...h...........|
  | 0120: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0130: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0140: 01 2A C4                                          |.*.             |
[MB] Read error: E2 - CRC check error

But the first one seems to be the one that appears most often.

Miq1 commented 2 years ago

Hmmm. Interesting indeed. The send-a-request-and-then-listen-for-response logic is strictly sequential in the code, no further request shall be sent before the previous has been answered or at least has timed out. So the mixed responses must live somewhere in between. Can you identify data in the responses that may stem from previous requests? One idea I had is serial buffers not yet being ready again. Wich arduino-esp32 core are you using - 2.0.1 or a pre-2.0 version? The serial handling was rewritten completely in the newer core.

Miq1 commented 2 years ago

As a side note: do you happen to speak German? Your nick starting with "Öko" suggests it 😉

OekoSolveMG commented 2 years ago

I do indeed speak german and the esp32 core version I am currently using can be seen below in the plattformio framwork output:

Package framework-arduinoespressif32
------------------------------------
Type: framework
Requirements: ~3.10006.0
Installed: Yes
Version: 3.10006.210326
Original version: 1.0.6
Description: Arduino Wiring-based Framework for Espressif ESP32 microcontrollers

Can you identify data in the responses that may stem from previous requests? One idea I had is serial buffers not yet being ready again.

There seem only to be other respones mixed in atelast from what I could trace back in the logoutput and from dismantling the big block which contained a lot of data, but I did not see data that could represent a request.

Would it be possible to print only thoose out as well with replacing a HEXDUMP_Dto a HEXDUMP_N somewhere? Because then I could check just to make sure, but I am pretty sure that the big block only seems to be responses mangled together.

Miq1 commented 2 years ago

Na, dann könnten wir uns genausogut auf Deutsch unterhalten... 😁

Of course there is a line that will dump out the sent requests. Around line 216, in RTUutils.cpp again, you will find

HEXDUMP_D("Sent packet", data, len);

Here, replacing the "D" with "N" will do the same as in the receiving path.

OekoSolveMG commented 2 years ago

Also die Ausgabe der Anfragen sehen eigentlich auch gut aus. Daher wäre noch die Frage ob die Version wirklich ein Problem machen könnte, also halt dass Arduino espressif32 noch auf 1.0.6 ist anstatt auf 2.0.2.

Ich werde mal ein Testprojekt nehmen und dies dort mit der neusten Arduino Version testen und schreibe dann ob dort die sporadischen CRC Fehler immer noch auftauchen oder verschwunden sind.

Log output requests:

[N] Sent packet: @3FFB265C/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[N] Sent packet: @3FFB25CC/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Sent packet: @3FFB265C/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[N] Sent packet: @3FFB25CC/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Sent packet: @3FFB5730/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[MB] Read error: E2 - CRC check error
Miq1 commented 2 years ago

Umgekehrt: die neue Version könnte Probleme machen - mit der alten sollte es laufen.

Kannst Du mal beide HEXDUMP_N drin lassen, so dass man die zeitliche Abfolge Request/Response sehen kann?

OekoSolveMG commented 2 years ago

Die neueste Version der Library macht zudem auch keinen Unterschied ich habe trotzdem noch die gleichen CRC Fehler.

Log output request/receive combined:

[N] Sent packet: @3FFB26A0/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[N] Raw buffer received: @3FFB26A0/67:
  | 0000: 01 03 3E 00 71 02 7B 00  04 00 ED 00 00 00 00 02  |..>.q.{.........|
  | 0010: 79 00 E3 00 67 00 00 00  00 02 01 00 01 00 01 01  |y...g...........|
  | 0020: 86 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0040: 03 2F 8F                                          |./.             |
[N] Sent packet: @3FFB2610/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Raw buffer received: @3FFB26A0/39:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 F2  |....4..U....\.T.|
  | 0020: 4C 00 16 10 0F 33 64                              |L....3d         |
[N] Sent packet: @3FFB26A0/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[N] Raw buffer received: @3FFB26A0/195:
  | 0000: 01 03 3E 00 71 02 7A 00  04 01 3C 00 00 00 00 02  |..>.q.z...<.....|
  | 0010: 79 00 E4 00 67 00 00 00  00 01 FF 00 01 00 01 01  |y...g...........|
  | 0020: 84 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 B9 00 00 00 0B D6 27  |....P.q........'|
  | 0040: 28 02 79 4C 9F 00 00 00  08 00 50 0F 71 0C A8 00  |(.yL......P.q...|
  | 0050: 01 00 01 00 64 00 03 2F  8F 01 03 22 00 00 00 00  |....d../..."....|
  | 0060: 00 00 00 00 00 00 00 00  00 00 00 00 EB 34 00 00  |.............4..|
  | 0070: 55 14 00 00 96 5C 00 54  F2 4C 00 16 10 0F 33 64  |U....\.T.L....3d|
  | 0080: 01 03 3E 00 71 02 7A 00  04 01 3C 00 00 00 00 02  |..>.q.z...<.....|
  | 0090: 79 00 E4 00 67 00 00 00  00 01 FF 00 01 00 01 01  |y...g...........|
  | 00A0: 84 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 00B0: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 00C0: 03 2B 0E                                          |.+.             |
[MB] Read error: E2 - CRC check error
[N] Sent packet: @3FFB2610/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Raw buffer received: @3FFB26A0/39:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 F2  |....4..U....\.T.|
  | 0020: 4D 00 16 10 10 4F 6C                              |M....Ol         |
[N] Sent packet: @3FFB26A0/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[N] Raw buffer received: @3FFB26A0/67:
  | 0000: 01 03 3E 00 71 02 7B 00  04 01 3C 00 00 00 00 02  |..>.q.{...<.....|
  | 0010: 79 00 E3 00 67 00 00 00  00 02 01 00 01 00 01 01  |y...g...........|
  | 0020: 84 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0040: 03 F7 CE                                          |...             |
[N] Sent packet: @3FFB2610/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Raw buffer received: @3FFB26A0/39:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 F2  |....4..U....\.T.|
  | 0020: 4E 00 16 10 11 CA AC                              |N......         |
[N] Sent packet: @3FFB25F8/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Raw buffer received: @3FFB28BC/39:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 F3  |....4..U....\.T.|
  | 0020: 18 00 16 10 DB 03 26                              |......&         |
[N] Sent packet: @3FFB26A0/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[N] Raw buffer received: @3FFB28BC/67:
  | 0000: 01 03 3E 00 71 02 7A 00  04 00 ED 00 00 00 00 02  |..>.q.z.........|
  | 0010: 79 00 E4 00 66 FF FE 00  00 02 06 00 01 00 01 01  |y...f...........|
  | 0020: 82 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0040: 03 B3 AE                                          |...             |
[N] Sent packet: @3FFB25F8/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Raw buffer received: @3FFB28BC/167:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB A0 9F 01 03  22 00 00 00 00 00 00 00  |........".......|
  | 0020: 00 00 00 00 00 00 00 00  00 EB 34 00 00 55 14 00  |..........4..U..|
  | 0030: 00 96 5C 00 54 F3 18 00  16 10 DB 03 26 01 03 3E  |..\.T.......&..>|
  | 0040: 00 71 02 7A 00 04 00 ED  00 00 00 00 02 79 00 E4  |.q.z.........y..|
  | 0050: 00 66 FF FE 00 00 02 06  00 01 00 01 01 82 00 00  |.f..............|
  | 0060: 02 B9 00 00 0B D6 27 28  02 79 4C 9F 00 00 00 08  |......'(.yL.....|
  | 0070: 00 50 0F 71 0C A8 00 01  00 01 00 64 00 03 B3 AE  |.P.q.......d....|
  | 0080: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0090: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 F3  |....4..U....\.T.|
  | 00A0: 19 00 16 10 DC 7F 24                              |.....␡$         |
[MB] Read error: E2 - CRC check error
[N] Sent packet: @3FFB26A0/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[N] Raw buffer received: @3FFB28BC/67:
  | 0000: 01 03 3E 00 71 02 7B 00  02 01 3D 00 00 00 00 02  |..>.q.{...=.....|
  | 0010: 79 00 E6 00 68 FF FF 00  00 02 06 00 01 00 01 01  |y...h...........|
  | 0020: 84 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0040: 03 50 A0                                          |.P.             |
[N] Sent packet: @3FFB25F8/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Raw buffer received: @3FFB28BC/39:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 F3  |....4..U....\.T.|
  | 0020: 93 00 16 11 56 66 CC                              |....Vf.         |
[N] Sent packet: @3FFB26A0/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[N] Raw buffer received: @3FFB2AC0/67:
  | 0000: 01 03 3E 00 71 02 7A 00  04 01 3C 00 00 00 00 02  |..>.q.z...<.....|
  | 0010: 79 00 E5 00 68 00 01 00  00 02 06 00 01 00 01 01  |y...h...........|
  | 0020: 84 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0040: 03 49 D4                                          |.I.             |
[N] Sent packet: @3FFB2610/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Raw buffer received: @3FFB26A0/167:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 50 A0 01 03  22 00 00 00 00 00 00 00  |....P...".......|
  | 0020: 00 00 00 00 00 00 00 00  00 EB 34 00 00 55 14 00  |..........4..U..|
  | 0030: 00 96 5C 00 54 F3 93 00  16 11 56 66 CC 01 03 3E  |..\.T.....Vf...>|
  | 0040: 00 71 02 7A 00 04 01 3C  00 00 00 00 02 79 00 E5  |.q.z...<.....y..|
  | 0050: 00 68 00 01 00 00 02 06  00 01 00 01 01 84 00 00  |.h..............|
  | 0060: 02 B9 00 00 0B D6 27 28  02 79 4C 9F 00 00 00 08  |......'(.yL.....|
  | 0070: 00 50 0F 71 0C A8 00 01  00 01 00 64 00 03 49 D4  |.P.q.......d..I.|
  | 0080: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0090: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 54 F3  |....4..U....\.T.|
  | 00A0: 94 00 16 11 57 12 CC                              |....W..         |
[MB] Read error: E2 - CRC check error
Miq1 commented 2 years ago

(Ich bleibe beim Englischen, andere könnten auch interessiert sein)

WTF? This line:

[MB] Read error: E2 - CRC check error

should not be possible! The MB is no valid logging level but an artifact, the LOG_E macro is calling internally with an "E" here only. If you did not code your own error response putting it in like can be seen, this is pointing to some memory violation happening.

Would you mind posting your sketch here?

OekoSolveMG commented 2 years ago

Ah no that's what I am printing when a Modbus error occures so no worries there.

void ModbusMaster::log_modbus_error(const Modbus::Error& error) const {
    ModbusError modbus_error(error);
    Serial.printf(PSTR("[MB] Read error: %02X - %s\n"), (int)modbus_error, (const char*)modbus_error);
}
Miq1 commented 2 years ago

Phew. Better, although we did not catch the bug yet.

I am currently checking the erroneous responses for a common pattern that may hint us to the root cause.

OekoSolveMG commented 2 years ago

Thanks the help is highly appreciated, do you think playing around with the end of message timeout might help?

Miq1 commented 2 years ago

Nope, there is something else going on.

Please see the reformatting and commenting in your previous log:

[N] Sent packet: @3FFB26A0/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[N] Raw buffer received: @3FFB26A0/67:
  | 0000: 01 03 3E
          00 71 02 7B 00 04 00 ED 00 00 00 00 02 79 00 E3 
          00 67 00 00 00 00 02 01 00 01 00 01 01 86 00 00 
          02 B9 00 00 0B D6 27 28 02 79 4C 9F 00 00 00 08 
          00 50 0F 71 0C A8 00 01 00 01 00 64 00 03 
          CRC 2F 8F                                          |./.             |
[N] Sent packet: @3FFB2610/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Raw buffer received: @3FFB26A0/39:
  | 0000: 01 03 22 
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
          EB 34 00 00 55 14 00 00 96 5C 00 54 F2 4C 00 16 
          10 0F 
          CRC 33 64                              |L....3d         |
[N] Sent packet: @3FFB26A0/6:
  | 0000: 01 03 00 00 00 1F                                 |......          |
[N] Raw buffer received: @3FFB26A0/195:
  | 0000: 01 03 3E 
          00 71 02 7A 00 04 01 3C 00 00 00 00 02 79 00 E4 
          00 67 00 00 00 00 01 FF 00 01 00 01 01 84 00 00 
          02 B9 00 00 0B D6 27 28 02 79 4C 9F 00 00 00 08 
          00 50 0F 71 0C A8 
 --> regular pattern broken before response ends. It looks like the response is received
     again now with an offset:
             B9 00 00 
 --> an additional 00 byte?
          00 
 --> and the response commences:
                      0B D6 27 28 02 79 4C 9F 00 00 00 08 
          00 50 0F 71 0C A8 00 01 00 01 00 64 00 03 
          CRC 2F 8F 
 --> another response, for the second request again:
          01 03 22
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
          EB 34 00 00 55 14 00 00 96 5C 00 54 F2 4C 00 16 
          10 0F 
          CRC 33 64
 --> and a third response:
  | 0080: 01 03 3E 
          00 71 02 7A 00 04 01 3C 00 00 00 00 02 79 00 E4 
          00 67 00 00 00 00 01 FF 00 01 00 01 01 84 00 00 
          02 B9 00 00 0B D6 27 28 02 79 4C 9F 00 00 00 08 
          00 50 0F 71 0C A8 00 01 00 01 00 64 00 03 
          CRC 2B 0E                                          |.+.             

As you can see, the second and third response baked together with the failing first are exact copies of those you had received before. That may come from the data not having changed since, but why should the server send some unsolicited response? Correction: the third has a different CRC!

Again it looks like a buffer has been overrun. The receive FIFO is normally emptied before starting to receive.

OekoSolveMG commented 2 years ago

What type is the server you are requesting?

I don't exactly know what you are refering too, with type of server. But the device I am using is simply a modbus end device with a baud rate of 9600 and the default configuration with no parity and 1 stop bit.

Would the error pattern change if you requested it less frequently?

The current modbus request rate is every second, but I've decreases it to 5 seconds to test it and it still caused the crc check fails.

Does the pattern show up with other types of servers as well?

The error seems not be independent on the device as the LogicAnalyzer seems to see the correct data and not fail the crc check.

Do you have further tasks running that may interfere with the Serial traffic?

In the test project where the error occurs, there is of course the main loop task as well as another task that just gets called in a 1 second intervall (Just a call that requests 2 blocks to be read from the Modbus). But I don't write anything to the Serial2 for Modbus communication, besides what this library writes to it.

A second ESP32 acting as a server could give some insights, because you could log the received requests there as well.

image

When looking at the requests and responses with a LogicAnalyzer I can see that the response that is mangled and creates a crc, is valid in the LogicAnalyzer and is read as it should as 1 request and not as 3 requests combined.

Additional info: I'm also currently running the modbus task with no specific core, but that changing that didn't seem to have an influence either.

modbus.begin(tskNO_AFFINITY);

Would it be helpful if I send the complete test project or how I setup the eModbus library / pins?

Miq1 commented 2 years ago

Excellent. So we know that the bus is working okay, the problem must lie inside the sketch or library.

I don't exactly know what you are refering too, with type of server. But the device I am using is simply a modbus end device with a baud rate of 9600 and the default configuration with no parity and 1 stop bit.

I was asking for brand etc. of the server device. But this seems to be regardless now.

So let me recapitulate the facts we know:

All correct? If so, some more questions to narrow down on the error:

Miq1 commented 2 years ago

I closely examined the receive() code again. There is only one way to fill the buffer with the data we see: by serial.read(). There must be something interfering with the core's serial buffer or even the FIFO queue to bring in the additional data.

OekoSolveMG commented 2 years ago

I'm currently requesting both blocks that I want to read more or less immediately one after another. Could this be causing an issue?

Because it seems like I atleast until now don't get a CRC if I add a delay(5); between the 2 addRequests.

ModbusBlock is baiscally just a ModbusMessage with an additional array of the registers we actually wanted to read and their sizes.

for (ModbusBlock& block : m_modbus_blocks) {
    ModbusMessage& request_message = block.get_message();
    const uint32_t& token = get_new_token();
    Serial.printf(PSTR("[MB] Request started at %lu\n"), millis());
    const Modbus::Error error = m_modubs_client.addRequest(request_message, token);
    delay(5); // <--- Added seesm to fix the error so no more crc check fails are generated
    if (error != Modbus::Error::SUCCESS) {
      log_modbus_error(error);
      continue;
    }
    enqueue_request(block, token);
}
Miq1 commented 2 years ago

I'm currently requesting both blocks that I want to read more or less immediately one after another. Could this be causing an issue?

Because it seems like I atleast until now don't get a CRC if I add a delay(5); between the 2 addRequests.

No, should not matter at all. The requests are put on a queue and the RTU worker task will not pull another request before the callback for the response resulting from the last has finished.

What does the enqueue_request(block, token); do? The addRequest() call will put your request on the queue already...

BTW: const Modbus::Error error = m_modubs_client.addRequest(request_message, token);: "modubs"???

OekoSolveMG commented 2 years ago

That's just puts the block we just requested into my own map so I can later get back which block was originally requested from the token. So I can get back which registers were registered and what exactly their size is.

void ModbusMaster::enqueue_request(ModbusBlock& block, const uint32_t& token) {
    if (is_registered(token)) {
        Serial.println(PSTR("[MB] Fatal error: Attempted to use already registered token."));
        return;
    }
    m_message_id++;
    m_modbus_messages.emplace(token, &block);
}

Nvm the error appeared with the delay between requests as well now, perhaps I just got lucky when I let it run last time.

OekoSolveMG commented 2 years ago

I've recorded a while of sending and receiving responses and the CRC check error seems to be relatively irregular. I previously tried with reading more smaller blocks as well but the CRC still failed more or less the same amount of time, but I'll try to measure it tomorrow to see if there might be a difference.

[MB] Request started at 42757
[MB] Request started at 42757
[MB] Read error: E2 - CRC check error

// Difference 92 seconds

[MB] Request started at 134757
[MB] Request started at 134757
[MB] Read error: E2 - CRC check error

// Difference 37 seconds

[MB] Request started at 171757
[MB] Request started at 171757
[MB] Read error: E2 - CRC check error

// Difference 81 seconds

[MB] Request started at 252757
[MB] Request started at 252757
[MB] Read error: E2 - CRC check error

// Difference 7 seconds

[MB] Request started at 259757
[MB] Request started at 259757
[MB] Read error: E2 - CRC check error

// Difference 41 seconds

[MB] Request started at 300757
[MB] Request started at 300757
[MB] Read error: E2 - CRC check error

// Difference 65 seconds

[MB] Request started at 365757
[MB] Request started at 365757
[MB] Read error: E2 - CRC check error

// Difference 56 seconds

[MB] Request started at 421757
[MB] Request started at 421757
[MB] Read error: E2 - CRC check error

// Difference 115 seconds

[MB] Request started at 536757
[MB] Request started at 536757
[MB] Read error: E2 - CRC check error
Miq1 commented 2 years ago

Rats. 😠

This makes it even more difficult. If it was a software fault, one would expect it to happen in regular intervals. With your measurements it is looking more like an intermittent hardware issue.

OekoSolveMG commented 2 years ago
  • Is your setup on a PCB in a housing already or still on breadboard?

Sadly it is already in a housing but I could try to get a PCB that isn't in a housing yet to try, one on a breadboard probably won't be possible tough.

Additonaly the weird thing is that I atleast never seemed to have thoose issues with an old Modbus Library I was using. Programming wise it had its fare shair of flaws (no settable timeout, only blocking) etc. but it seemed to work https://github.com/4-20ma/ModbusMaster.

That's additionally why I am confused that it previously seemed to work but not anymore. (Perhaps the old library also just ignored thoose issues and silently failed, not sure about that tough).

  • Something around it that may induce voltage peaks, electromagnetic fields or such?

Definetly my use case is communicating with a device that produces high voltage meaning there are definetly electromagnetic fields, but that never caused issues with the communication previously. That's also the main reason I am using ModbusRTU in the first place, so that I can remove spikes etc. with the inverted B and A signal.

Additionaly I am currently not producing any high voltage from the device that is connected for testing purposes and the CRC check errors still appeared.

Besides that there isn't anything that should produce enourmous amounts of electromagnetic fields and the voltage is at a fixed 24V.

  • Where did you connect the LA - directly on the TX/RX pins on the MCU board or somewhere else?

I receive the signal that filter voltage spikes with two ferit beads one on each A and B respectively. Which then go to a UART-RS485 converter which then goes onto the RX or TX pin depeding on the enable pin.

  • Are you doing something in your sketch based on the data you received? Calculate addresses or such?

No it's basically just the data from the device like temperature it records and how long it has been running etc. I then take that info and multiply it with a predetermined value this is to make sure that I don't need to communicate giant values but can shorten them like V to kV etc.

But I am not really calculating anything else with thoose values and on the example sketch I am currently using I am not doing anything with thoose values just checking if they are received successfully.

OekoSolveMG commented 2 years ago

Would it perhaps help If i tried to add some extra flush calls to try to cause the crc not to fail.

OekoSolveMG commented 2 years ago

It seems like enabling LOG_LEVEL_DEBUG causes errors but this time it seems to extract the last message into it's own message instead of causing an error for the giant block like previously.

[D] Raw buffer received: @3FFB27C8/195:
  | 0000: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0010: 79 00 E9 00 67 00 00 00  00 02 17 00 01 00 00 00  |y...g...........|
  | 0020: E2 00 00 02 B9 00 00 E9  0B 00 67 00 00 00 00 02  |..........g.....|
  | 0030: 17 00 01 00 00 00 E2 00  00 02 B9 00 00 0B D6 27  |...............'|
  | 0040: 28 02 79 4C 9F 00 00 00  08 00 50 0F 71 0C A8 00  |(.yL......P.q...|
  | 0050: 01 00 01 00 64 00 01 E8  BC 01 03 22 00 00 00 00  |....d......"....|
  | 0060: 00 00 00 00 00 00 00 00  00 00 00 00 EB 34 00 00  |.............4..|
  | 0070: 55 14 00 00 96 5C 00 55  F5 99 00 17 08 E4 34 4E  |U....\.U......4N|
  | 0080: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0090: 79 00 E9 00 67 00 00 00  00 02 17 00 01 00 00 00  |y...g...........|
  | 00A0: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 00B0: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 00C0: 01 E8 BC                                          |...             |
[D] Received packet: @3FFBDB4C/1:
  | 0000: E2                                                |.               |
[D] 214955| ModbusClientRTU.cpp  [ 204] handleConnection: Error response (1 bytes) received.
[D] 214963| ModbusClientRTU.cpp  [ 225] handleConnection: Response generated.
[MB] Read error: E2 - CRC check error

[D] Raw buffer received: @3FFB27C8/323:
  | 0000: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0010: 79 00 E9 00 66 00 00 00  00 02 15 00 01 00 00 00  |y...f...........|
  | 0020: E2 00 00 02 B9 79 00 00  E9 00 66 00 00 00 00 02  |.....y....f.....|
  | 0030: 15 00 01 00 00 00 E2 00  00 02 B9 00 00 0B D6 27  |...............'|
  | 0040: 28 02 79 4C 9F 00 00 00  08 00 50 0F 71 0C A8 00  |(.yL......P.q...|
  | 0050: 01 00 01 00 64 00 01 6A  F4 01 03 22 00 00 00 00  |....d..j..."....|
  | 0060: 00 00 00 00 00 00 00 00  00 00 00 00 EB 34 00 00  |.............4..|
  | 0070: 55 14 00 00 96 5C 00 55  F5 C6 00 17 08 E4 A0 43  |U....\.U.......C|
  | 0080: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0090: 79 00 E9 00 66 00 00 00  00 02 15 00 01 00 00 00  |y...f...........|
  | 00A0: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 00B0: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 D6 27  |....P.q........'|
  | 00C0: 28 02 79 4C 9F 00 00 00  08 00 50 0F 71 0C A8 00  |(.yL......P.q...|
  | 00D0: 01 00 01 00 64 00 01 6A  F4 01 03 22 00 00 00 00  |....d..j..."....|
  | 00E0: 00 00 00 00 00 00 00 00  00 00 00 00 EB 34 00 00  |.............4..|
  | 00F0: 55 14 00 00 96 5C 00 55  F5 C6 00 17 08 E4 A0 43  |U....\.U.......C|
  | 0100: 01 03 3E 00 60 00 00 00  00 00 00 00 00 00 00 02  |..>.`...........|
  | 0110: 79 00 E9 00 66 00 00 00  00 02 15 00 01 00 00 00  |y...f...........|
  | 0120: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0130: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 0140: 01 6A F4                                          |.j.             |
[D] Received packet: @3FFBDB4C/1:
  | 0000: E2                                                |.               |
[D] 260010| ModbusClientRTU.cpp  [ 204] handleConnection: Error response (1 bytes) received.
[D] 260018| ModbusClientRTU.cpp  [ 225] handleConnection: Response generated.
[MB] Read error: E2 - CRC check error
OekoSolveMG commented 2 years ago

Would it be helpful if I send the complete test project or how I setup the eModbus library / pins? Perhaps how I initalized the library might be faulty as well?

Miq1 commented 2 years ago

Good morning and welcome back at work! 🤣

To start from the end: yes, seeing the code might help. If it is proprietary, you can mail it to miq1 at gmx dot de, I promise not to disclose it to others. I think it is okay to post snippets to talk about details?

The errors do show up with the debug log level as well - that in fact is good news! and if that is enforcing the giant error packets, we have a hint again. You may try with LOG_LEVEL_VERBOSE to see if that will again increase the sizes. The logging does nothing to the normal handling of data and errors, but just is sprinkeling in some more output.

If you had your LA at the A/B lines on the RS485 side, that may be a different picture than is seen on the UART side. If there is any chance to check the UART lines directly, I would strongly advise doing so.

I take from your post you are using a RS485 adapter with DE/RE pins? Then our experiences here have shown that these are highly sensitive on bus termination and biasing issues. It may be worthwhile checking that as well.

OekoSolveMG commented 2 years ago

To start from the end: yes, seeing the code might help. If it is proprietary, you can mail it to miq1 at gmx dot de, I promise not to disclose it to others. I think it is okay to post snippets to talk about details?

I'll make a public git repository and include the test project which has the errors as well and share the link as soon as I've done that. The main problem is that I am using a M5 Stack for the communication which already handles some of the initalization. So I don't know how easy it would be to directly use that example project with only an ESP32.

The errors do show up with the debug log level as well - that in fact is good news! and if that is enforcing the giant error packets, we have a hint again. You may try with LOG_LEVEL_VERBOSE to see if that will again increase the sizes. The logging does nothing to the normal handling of data and errors, but just is sprinkeling in some more output.

I don't think the blocks become bigger, the only difference is that we get additonal log output for generated responses that only contain a few byte and they seem to become more with a higher LOG_LEVEL.

CRC Check error output with LOG_LEVEL_VERBOSE

[N] Raw buffer received: @3FFB27B0/249:
  | 0000: 01 03 74 00 60 00 00 00  00 00 00 00 00 00 00 02  |..t.`...........|
  | 0010: 79 00 DF 00 69 FF FF 00  00 01 ED 00 01 00 00 00  |y...i...........|
  | 0020: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 0030: 00 00 08 00 50 0F 71 0C  A8 00 01 00 0F 71 0C A8  |....P.q......q..|
  | 0040: 00 01 00 01 00 64 00 01  00 00 00 00 00 00 00 00  |.....d..........|
  | 0050: 00 00 00 00 00 00 00 0A  00 00 00 00 00 00 00 00  |................|
  | 0060: 00 00 00 00 00 00 00 00  00 00 00 00 EB 34 00 00  |.............4..|
  | 0070: 55 14 00 00 96 5C 00 56  00 0D 00 17 08 E4 B4 38  |U....\.V.......8|
  | 0080: 01 03 74 00 60 00 00 00  00 00 00 00 00 00 00 02  |..t.`...........|
  | 0090: 79 00 DF 00 69 FF FF 00  00 01 ED 00 01 00 00 00  |y...i...........|
  | 00A0: E2 00 00 02 B9 00 00 0B  D6 27 28 02 79 4C 9F 00  |.........'(.yL..|
  | 00B0: 00 00 08 00 50 0F 71 0C  A8 00 01 00 01 00 64 00  |....P.q.......d.|
  | 00C0: 01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  | 00D0: 0A 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  | 00E0: 00 00 00 00 00 EB 34 00  00 55 14 00 00 96 5C 00  |......4..U....\.|
  | 00F0: 56 00 0E 00 17 08 E4 86  2A                       |V.......*       |
[D] Received packet: @3FFBDB4C/1:
  | 0000: E2                                                |.               |
[D] 33132| ModbusClientRTU.cpp  [ 204] handleConnection: Error response (1 bytes) received.
[V] Data: @3FFBDB4C/1:
  | 0000: E2                                                |.               |
[D] 33149| ModbusClientRTU.cpp  [ 225] handleConnection: Response generated.
[V] Response packet: @3FFBDB4C/3:
  | 0000: 01 83 E2                                          |...             |
[MB] Read error: E2 - CRC check error

If you had your LA at the A/B lines on the RS485 side, that may be a different picture than is seen on the UART side. If there is any chance to check the UART lines directly, I would strongly advise doing so.

The LogicAnalyzer is directly connected on the UART side (logic level), therefore the signals which are handed to the ESP32 are correct (CRC bytes are valid for the response and request blocks compared to the ones calculated by the LogicAnalyzer).

I take from your post you are using a RS485 adapter with DE/RE pins? Then our experiences here have shown that these are highly sensitive on bus termination and biasing issues. It may be worthwhile checking that as well.

I've added some overlapping time with a delay after DE/RE to ensure we always have some spare time before an actual request, but the CRC check still occur.

OekoSolveMG commented 2 years ago

Here is the link for the test project and thanks a lot for all your help. https://github.com/OekoSolveMG/M5Stack_Sandbox

The ModbusMaster just wraps around this Modbus Library so I can just register addresses and automatically build blocks from them which are then requested. Instead of requesting each address with its own request.

Miq1 commented 2 years ago

Excellent, I will have a look at it today as soon as I am finished with some other tasks.

Miq1 commented 2 years ago

I shortly browsed your code - here are some things that sprang to my mind. Nothing obvious, I am afraid...

OekoSolveMG commented 2 years ago

Please note that you are running here in the context of the worker task, not your own main task. This may (must not) affect the behaviour of the worker. The stack space of the worker is also limited (4K IIRC); if too much local storage is claimed, the worker task may run into problems.

That's good to know. I'll try to remove the whole code and just call the callback directly to see if that makes the CRC errors go aways and then I'll try to readd features step by step perhaps that will help to get to the true core of the problem.

Plus it might be a good idea to add a millis call and print out the difference to see how long it needs. I'll probably do that as well, if removing the whole handling code actually solves the issue.

  • The level of indirection with the processing of a response is amazingly high. Who will be using the dedicated calls like handle_read_hold_reg() etc.?

Can you specify what you mean with that because as long as you use the FunctionCode::READ_HOLDING_REGISTER this method will be used in the end.

OekoSolveMG commented 2 years ago

I completly removed more or less every single call as seen below same for the handle_data method and the crc check error still seems to occur.

void ModbusMaster::handle_error(Modbus::Error error, uint32_t token) {
    m_received_callback(error, 0U, 0U);
    log_modbus_error(error);
    /*
    if (!is_registered(token)) {
        Serial.println(PSTR("[MB] Fatal error: Gotten token has not been registered."));
        return;
    }
    IModbusRequest* ptr_request = m_modbus_messages.at(token);
    if (ptr_request == nullptr) {
        Serial.println(PSTR("[MB] Fatal error: Registered IModbusRequest was NULL."));
        return;
    }
    evaluate_error(error, ptr_request);
    dequeue_request(token);
    */
}
Miq1 commented 2 years ago

Are those remaining calls threadsafe? Are they using a substantial number of nested calls themselves, with large and/or many parameters? I am still afraid this may bust the stack of the worker.

Get me right: there is no verdict to use the callbacks as you do, but I never imagined there would be lots of processing while the callbacks are running. All I normally do is to check and store the response data into global structures, set a flag and be gone.

Regarding the "level of indirection" remark: one will

The construction

uint16_t val = m_default_value;
response.get(index, val);

has some room for improvement as well. The default value will never be used, as the get() will return a value in any case. Second, instead of calculating the index yourself over and over, you could use

index = response.get(index, val);

instead, since the get() will return the updated index in respect of the retrieved data type.

If I may suggest another approach: take the block identifier as part of the token for the request. In handle_data(), for read requests extract the block ID from the token and simply get all the values and store those according to the block. Write requests will either end in handle_error() or will return the beginning of the request as a confirmation. No need to check that again, just note the write has been done.

Your beers, of course! 😉

OekoSolveMG commented 2 years ago

Are those remaining calls threadsafe? Are they using a substantial number of nested calls themselves, with large and/or many parameters? I am still afraid this may bust the stack of the worker.

I've removed even the callback call and now I only print to the console. That should definetly be threadsafe or atleast I hope so and the handle_data / handle_error is directly called from the eModbus library when a ModbusMessage response was received.

But as previously even with thoose changes I still get CRC check errors.

Handle data method:

void ModbusMaster::handle_data(ModbusMessage response, uint32_t token) {
    Serial.println("Success");
    /*
    if (!is_registered(token)) {
        Serial.println(PSTR("[MB] Fatal error: Gotten token has not been registered."));
        return;
    }
    IModbusRequest* ptr_request = m_modbus_messages.at(token);
    if (ptr_request == nullptr) {
        Serial.println(PSTR("[MB] Fatal error: Registered IModbusRequest was NULL."));
        return;
    }
    evaluate_function_code(ptr_request, response);
    dequeue_request(token);
    */
}
  • There you will pull the original request to compare the function codes - this has been done by the library before as well

I didn't know that was done as well so that's good to know thanks.

  • In that function you will call m_received_callback() with every single value/mb_reg combination. I suppose this finally is storing the data to the destination and will print out a message once it fails?

No the method actually printing the error isn't the callback, but the library itself

void ModbusMaster::handle_error(Modbus::Error error, uint32_t token) {
    if (!is_registered(token)) {
        Serial.println(PSTR("[MB] Fatal error: Gotten token has not been registered."));
        return;
    }
    IModbusRequest* ptr_request = m_modbus_messages.at(token);
    if (ptr_request == nullptr) {
        Serial.println(PSTR("[MB] Fatal error: Registered IModbusRequest was NULL."));
        return;
    }
    log_modbus_error(error); // <-------------- Basically just a call to Serial.printf with ModbusError.
    evaluate_error(error, ptr_request);
    dequeue_request(token);
}

But response.get_error() will always return the same error code, regardless how many values you are retrieving. And since you have a handle_error() handler, the responses arriving in handle_data()never will have an error...

True getting the error as well is a little bit overkill and not really needed. I can just replace it with Modbus::Error::SUCCESS

Has some room for improvement as well. The default value will never be used, as the get() will return a value in any case.

Once again the default_value is not really needed, but I prefer to be descriptive / put default parameters even if they get changed anyway, because theoretically there might be a future version where the get() method keeps the value the same if we exceeded the message length.

Second, instead of calculating the index yourself over and over, you could use index = response.get(index,val);

I knew this method existet and previously it worked like that the problem is tough I am not actually calling the callback for every single address we read but instead the addresses we registered. Because I request blocks.

For example if i request addresses: 1, 2, 3, 4, 5, 9, 10, 11, 12, 13, 15, 17, 19, 21

Then I build these blocks with a max address difference of 2 then I will get 2 blocks from 1-5 and from 9-21.

When I then arrive and call the callback I want it to be called for .... 13, 15, 17, 19, 21 and not 13, 14, 15, 16, .... etc.

That's why I have to calcualte the index because it doesn't loop through the data vector but instead it loops through the addresses we requested and calculates the point in the data vector they should be in.

If I may suggest another approach: take the block identifier as part of the token for the request. In handle_data(), for read requests extract the block ID from the token and simply get all the values and store those according to the block. Write requests will either end in handle_error() or will return the beginning of the request as a confirmation. No need to check that again, just note the write has been done.

I'm not sure if I completly understand so you mean just caching the results of handleData and then getting them instead of calling a callback for each address right?

Your beers, of course! 😉

Thanks a lot for the tips always appreciated :D.

Miq1 commented 2 years ago

The blocks are meant to reduce the number of requests necessary? I see. So in effect the worker task will also have to cover all the application callbacks,,, I still would request one large block - as long as it will fit in 250 bytes -, step through the response register-wise and skip those that were not requested. With your example that would be just one request 1-21 and a loop over 21 steps to sort out the desired. Only then I would set a mutex flag "New data arrived" somewhere and return and let the main task do whatever with that. But that is another path to Rome.

Back to topic: you could try increasing the worker's stack size in ModbusClientRTU.cpp. There is a line

xTaskCreatePinnedToCore((TaskFunction_t)&handleConnection, taskName, 4096, this, 6, &worker, coreID >= 0 ? coreID : NULL);

where you can change the 4096 to something appropriate - 16384 or such. Will be interesting to learn if that changes things.

OekoSolveMG commented 2 years ago

Yeah tried to increase the stack size to 16384, but the crc check error still occurs and the test project currently still does nothing besides print success when I could read the blocks or print the error if one occured.

Miq1 commented 2 years ago

Argh. I have no clue where that beast may be hidden.

Again:

Concluding all is pointing towards a memory problem, where in the end the serial buffer gets corrupted.

Since this is going on silently (as seen from the MCU), we seem to see a buffer overrun or address corruption within the stack or heap.

Can you print out the free heap in handle_data() and handle_error(), please? Unfortunately that is the only data we get at runtime, the stack is not to be measured AFAIK.

OekoSolveMG commented 2 years ago

Added to handle_data() and handle_error()

ESP_LOGI("MAIN", "free heap: %d", esp_get_free_heap_size());
ESP_LOGI("MAIN", "internal free heap: %d", esp_get_free_internal_heap_size());
ESP_LOGI("MAIN", "minimum free heap: %d", esp_get_minimum_free_heap_size());

I'll let it run wild for a while and see what the results are.

OekoSolveMG commented 2 years ago

I still get the error but it doesn't really seem like there isn't enough heap left. I'll try to play around with delays before and after toggling DE/RE perhaps that will help.

[I][ModbusMaster.cpp:113] handle_data(): free heap: 267560
[I][ModbusMaster.cpp:114] handle_data(): internal free heap: 267560
[I][ModbusMaster.cpp:115] handle_data(): minimum free heap: 259360
[N] Sent packet: @3FFB8C68/6:
  | 0000: 01 03 00 29 00 11                                 |...)..          |
[N] Raw buffer received: @3FFB8F60/167:
  | 0000: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0010: 00 00 00 EB 34 00 00 22  55 00 00 00 00 00 00 00  |....4.."U.......|
  | 0020: 00 00 00 00 00 00 00 00  00 EB 34 00 00 55 14 00  |..........4..U..|
  | 0030: 00 96 5C 00 57 43 09 00  17 39 E3 99 97 01 03 3E  |..\.WC...9.....>|
  | 0040: 00 60 00 00 00 00 00 00  00 00 00 00 02 79 00 D3  |.`...........y..|
  | 0050: 00 69 00 01 00 00 01 BD  00 01 00 00 00 E2 00 00  |.i..............|
  | 0060: 02 B9 00 00 0B D6 27 28  02 79 4C 9F 00 00 00 08  |......'(.yL.....|
  | 0070: 00 50 0F 71 0C A8 00 01  00 01 00 64 00 01 76 5C  |.P.q.......d..v\|
  | 0080: 01 03 22 00 00 00 00 00  00 00 00 00 00 00 00 00  |..".............|
  | 0090: 00 00 00 EB 34 00 00 55  14 00 00 96 5C 00 57 43  |....4..U....\.WC|
  | 00A0: 0A 00 17 39 E3 DD 97                              |...9...         |
[MB] Read error: E2 - CRC check error
[I][ModbusMaster.cpp:133] handle_error(): free heap: 267812
[I][ModbusMaster.cpp:134] handle_error(): internal free heap: 267812
[I][ModbusMaster.cpp:135] handle_error(): minimum free heap: 259360
Miq1 commented 2 years ago

Please read my lips: this is no communications error - else you could not have seen there is no large block on your LA connected to RX/TX. (Provided you did not catch just one packet... 😉 )

The bytes are regularly coming through the serial.read() in RTUutils.cpp's receive(). Therefore the data must get corrupted "under the hood", inside the core UART handling.

This is not a common case, else we would have heard similar before. So either the UART handling in the core is faulty in rare constellations, or something in your code is running amuck.

We need to find out either case.

Can you again try to further strip down your code to the bare minimum? That is, without any Arial font, special M5stack calls, block storage etc. Just a plain sketch requesting from the server with the basic eModbus calls.

OekoSolveMG commented 2 years ago

Can you again try to further strip down your code to the bare minimum? That is, without any Arial font, special M5stack calls, block storage etc. Just a plain sketch requesting from the server with the basic eModbus calls.

Updated the public repository and removed nearly everything besides initalizing my device and printing out some text in onData and in onError and requesting 2 blocks each second. I still get the same CRC error checks tough I'll try to use a ESP-WROOM module directy and remove the M5Stack to see if that fixes the issue will come back to you once I tested that.

Miq1 commented 2 years ago

Updated the public repository and removed nearly everything

Yeah, I was browsing the repository when out of a sudden the code was different 😄

I am looking forward to the results with a plain ESP32. If that is working okay, theer may be something wrong with the Serial2 handling on the M5stack. In case: can you deploy another Serial for the Modbus comms?

OekoSolveMG commented 2 years ago

In case: can you deploy another Serial for the Modbus comms?

As far as I know not I only have two HardwareSerial and if I use one for logging to the console there is only one left that I can use for Modbus communication. I could theoratically use Serial1 for modbus and Serial2 for logging to see if that changes anything.

Miq1 commented 2 years ago

There is Serial, Serial1 and Serial2 normally on WROOM and WROVER boards. Serial supposedly is used for the standard serial monitor, so at least Serial1 should be available?

OekoSolveMG commented 2 years ago

Yeah switched to Serial1 for Modbus now, but still created the CRC check error. So it isn't that either.

Miq1 commented 2 years ago

Yeah switched to Serial1 for Modbus now, but still created the CRC check error. So it isn't that either.

That sorts out memory corruption as well, as the internal buffers of Serial1 and Serial2 are at different locations.

Not to nag you, but can you do a longer LA session once more? I can not believe that on the Serial pins is other data than the FIFO internally will see.

OekoSolveMG commented 2 years ago

Will try that the LA is still connected anyway, but I'm letting it run while reading only 5 registers at a time currently as well. Because it doesn't fail as fast as previously and the question is if it still fails. BUt after it is clear that it will fail with only reading 1 register regardless I'll try to increase the size.

Miq1 commented 2 years ago

My hope was that you will catch a defective packet with the logic analyzer by chance. If that happens we can concentrate on the bus comms again instead of dissecting every single bit on the software side.

OekoSolveMG commented 2 years ago

Okay then I'll ensure the crc check error apperas on the software side and see if it really doesn't appear on the hardware side.