fredlcore / BSB-LAN

LAN/WiFi interface for Boiler-System-Bus (BSB) and Local Process Bus (LPB) and Punkt-zu-Punkt Schnittstelle (PPS) with a Siemens® controller used by Elco®, Brötje® and similar heating systems
232 stars 84 forks source link

Issues with ESP32 implementation #316

Closed fredlcore closed 3 years ago

fredlcore commented 3 years ago

I'm currently testing @do13's modifications to run BSB-LAN on an ESP32 system. I'm using a development board distributed by a company called "Joy-It", but it is available from other vendors and names as well. The board can be seen here, the main difference between the NodeMCU-like boards is the pin count and layout, mine has 30 pins: https://joy-it.net/de/products/SBC-NodeMCU-ESP32 Instead of opening up an extra issue for every quirk or error, I'm going to add "Sub-issues" here, because probably some of them can be fixed quickly. If any of these need more extensive discussion, we should open a "regular" issue for that one. Thanks.

fredlcore commented 3 years ago

Sub-Issue 1: When saving the configuration, the ESP32 reboots, debug output says: [E][WiFiClient.cpp:482] flush(): fail on fd 58, errno: 11, "No more processes" Followed by list of Adding known MAX ID to list messages and a subsequent EEPROM dump.

Browser immediately reports "connection broken" screen, ESP32 does not crash, but configuration is not save either. Manually opening the start page for example works afterwards.

1coderookie commented 3 years ago

Sub-Issue 2: I'd like to test as well, but the loop-error I reported here https://github.com/fredlcore/bsb_lan/pull/309#issuecomment-768216914 still shows up when Arduino IDE should compile, no matter which board I select (also tried the Olimex-POE).

I can offer a 38pin ESP32 NodeMCU DevKit C and an ESP32 D1 R32 (Arduino Uno size) for testing..

dukess commented 3 years ago

Ideas: EEPROM size is 512 bytes? Watchdog? Buggy WiFiClient.cpp?

fredlcore commented 3 years ago

Please number your "Sub-Issue" so that people can reference to it and the discussion does not get too cluttered. Also, the compiler issue is a larger one and so an extra issue should be opened for that one.

do13 commented 3 years ago

@fredlcore: Please change the following line: EEPROMClass EEPROM("eeprom1", 0x800); into: EEPROMClass EEPROM("eeprom1", 0x1000);

I've also seen some strange behaviour with the configuration. There might be other glitches.

fredlcore commented 3 years ago

Changing from 0x800 to 0x1000 did not have any effect in this regard.

fredlcore commented 3 years ago

Sub-Issue 3: Reading and writing to the bus generally works, but writing has issues, probably first of all FIFO-related (which would explain why I didn't see anything on my scope when sending to the bus), but could be other timing issues as well (SoftwareSerial on the Mega was much more unstable than HardwareSerial). For BSB and LPB, the latter will be no issue, but for PPS it means significantly delayed receiving of information by the heating system. First task would be to disable the FIFO on the ESP32. Maybe this could be of help: https://www.esp32.com/viewtopic.php?t=3751#p17752

fredlcore commented 3 years ago

Sub-Issue 4: The heating time programs are "42:30" in most slots. This would be the equivalent of 0xFF. @dukess: Do you have an idea why this could be the case? No matter where the EEPROM is read from, 0xFF should be converted to 0x00, shouldn't it?

fredlcore commented 3 years ago

Re 3: I have added the code as stated there, but no (noticeable) change. On the brighter side, somehow I must have measured wrong, but the scope now shows activity even on the bus side when sending from the ESP32 and occasionally (but rather seldom), a telegram is acknowledged. However, I do not have a logic analyzer to quickly decode whether the timing or the content of the data is scrambled. It could also be that different resistors have to be used in this context because the maximum current is different compared between ESP32 and Due, but I'm not an electrician, so I can't make recommendations here. EDIT: The whole thing seems (also) to be dependent on the power supply. When I power the ESP32 via a USB cable from my MacBook, hardly any telegram is accepted by the heater. However, when I plug the ESP32 into the USB port of my Synology NAS, it is noticeable better (but still far from good or reliable).

dukess commented 3 years ago

@fredlcore Issue 4: You're disable this code which clear values

/*
  for (int i=PPS_TWS;i<=PPS_BRS;i++) {
    if(pps_values[i] == (int16_t)0xFFFF) pps_values[i] = 0;
    if (pps_values[i] > 0 && pps_values[i]< (int16_t)0xFFFF && i != PPS_RTI) {
      printFmtToDebug(PSTR("Slot %d, value: %u\r\n"), i, pps_values[i]);
    }
  }
*/
fredlcore commented 3 years ago

@dukess: Thanks, but I thought I had implemented the same functionality right below that? Instead of cycling between PPS_TWS and PPS_BRS, I cycle through all PPS parameters but only clear those which have the FL_EEPROM flag, i.e. the ones which are stored to / read from EEPROM. Or do I have an error in thinking here?

dukess commented 3 years ago

oh, sorry you right.

dukess commented 3 years ago

Found bug:

  for (int i=15000; i<= 15000+PPS_ANZ; i++) {
    uint8_t flags=get_cmdtbl_flags(i);
    if ((flags & FL_EEPROM) == FL_EEPROM) {
      if(pps_values[i-15000] == (int16_t)0xFFFF) pps_values[i-15000] = 0;
      if (pps_values[i] > 0 && pps_values[i]< (int16_t)0xFFFF) {
        printFmtToDebug(PSTR("Slot %d, value: %u\r\n"), i, pps_values[i]);
      }
    }
  }

pps_values[i] is equal pps_values[15000+] so "if (pps_values[i] > 0 && pps_values[i]< (int16_t)0xFFFF)" is never true

it should be

  for (int i=0; i<= PPS_ANZ; i++) {
    uint8_t flags=get_cmdtbl_flags(15000+i);
    if ((flags & FL_EEPROM) == FL_EEPROM) {
      if(pps_values[i] == (int16_t)0xFFFF) pps_values[i] = 0;
      if (pps_values[i] > 0 && pps_values[i]< (int16_t)0xFFFF) {
        printFmtToDebug(PSTR("Slot %d, value: %u\r\n"), i, pps_values[i]);
      }
    }
  }
fredlcore commented 3 years ago

Ah, of course, thanks!

dukess commented 3 years ago

And one more: beacuse PPS_ANZ is last array element definition should be int16_t pps_values[PPS_ANZ + 1] = { 0 }; instead int16_t pps_values[PPS_ANZ] = { 0 };

OR int16_t pps_values[PPS_ANZ] = { 0 }; and if (bus->getBusType() == BUS_PPS && line >= 15000 && line < 15000 + PPS_ANZ) { // PPS-Bus set parameter instead if (bus->getBusType() == BUS_PPS && line >= 15000 && line <= 15000 + PPS_ANZ) { // PPS-Bus set parameter

for (int i=15000; i< 15000+PPS_ANZ; i++) { instead for (int i=15000; i<= 15000+PPS_ANZ; i++) {

fredlcore commented 3 years ago

Just to be sure: The last PPS command is PPS_E73 with index 91, so it's in total 92 elements (counting from 0). PPS_ANZ is set to 92, so it is not the last element, but the last element + 1. In that case, the code could/should stay as it is?

dukess commented 3 years ago

For sub-issue 1: Bug in registerConfigVariable(CF_PPS_VALUES, (byte *)&pps_values);

should be registerConfigVariable(CF_PPS_VALUES, (byte *)pps_values); & - get address of single variable or array element. whole array will be addressing without &.

dukess commented 3 years ago

Just to be sure: The last PPS command is PPS_E73 with index 91, so it's in total 92 elements (counting from 0). PPS_ANZ is set to 92, so it is not the last element, but the last element + 1. In that case, the code could/should stay as it is?

In this case, you need to use the "less than" sign instead of "less than or equal to", since it will go outside the boundaries of the array

fredlcore commented 3 years ago

Right, thanks!

fredlcore commented 3 years ago

Sub-Issue 4 is fixed now. A remaining problem was that uint8_t flags=get_cmdtbl_flags(15000+i); does not work because the function expects a line number, not a parameter number, grrr...

do13 commented 3 years ago

@fredlcore

  uart_intr.intr_enable_mask = UART_RXFIFO_FULL_INT_ENA_M
                          | UART_RXFIFO_TOUT_INT_ENA_M
                          | UART_FRM_ERR_INT_ENA_M
                          | UART_RXFIFO_OVF_INT_ENA_M
                          | UART_BRK_DET_INT_ENA_M
                          | UART_PARITY_ERR_INT_ENA_M;
  uart_intr.rxfifo_full_thresh = 1; //UART_FULL_THRESH_DEFAULT,  //120 default!! aghh! need receive 120 chars before we see them
  uart_intr.rx_timeout_thresh = 10; // ,  //10 works well for my short messages I need send/receive
  uart_intr.txfifo_empty_intr_thresh = 10; //UART_EMPTY_THRESH_DEFAULT
  uart_intr_config(UART_NUM_1, &uart_intr);

Did you checked these lines? From my point of view, this piece of code gets overwritten in Serial1.begin

A nasty hack would be to set uart->dev->conf1.rxfifo_full_thrhd = 1; uartEnableInterrupt () in esp32-hal-uart.c and implement the following in bsb.cpp

#if defined(ESP32)
  serial->write(msg, loop_len+1);
  serial->flush();
#else
  for (byte i=0; i <= loop_len; i++) {
    data = msg[i];
    if (bus_type != 2) {
      data = data ^ 0xFF;
    }
    serial->write(data);

After these modifications the ESP talks to my heater.

fredlcore commented 3 years ago

You're right, the linked post also mentions to put these lines after Serial1.begin(). I'll try that again. As for your changes, where exactly did you enter these lines in esp32-hal-uart.c? And what does "talks to my heater" mean exactly? I saw that telegrams were accepted by the heater before, but not very stable. Strange enough, it worked with PPS even better than with LPB, although the latter usually is not (that) time-critical as PPS.

fredlcore commented 3 years ago

Also, why the extra treatment of the ESP32 in bsb.cpp? I know you only have a PPS device, but for BSB and LPB, the data needs to be inverted, so the loop is necessary.

fredlcore commented 3 years ago

I can confirm that moving the code-snipped I posted above below Serial1.begin() significantly increases the quality for PPS bus. No changes in the core libraries necessary. I still face a problem with LPB, though. EDIT: I notice that the ESP32 now freezes after some time. Strangely, not even the watchdog fires to reset the ESP32. It just sticks there. Not sure if this has to do with the added OTA update code or with the tweaked serial code. Can anyone of your reproduce this? EDIT 2: A reason for this could be that with the changed settings, an ISR is fired for each byte received by the FIFO. At least @me-no-dev stated here (https://gitter.im/espressif/arduino-esp32?at=5e25e5d40a1cf5414491106d) that

if you want to get it at the moment it enters the fifo, set fifo full thresh to 1, 
but that will trigger ISR too often and you might not like the result

The problem is that even with the standard settings, the ISR would already have been called after "silence" for the duration of two bytes on the bus. Now it's fired after one byte. So for PPS, a delay of the time it takes to receive another two bytes is already critical. We could try and set the FIFO threshold from 1 to 2 and see if it runs stable in both ways then, but that might change from heating device to heating device and from ESP32 to ESP32... EDIT 3: I tried it again with directly changing just the source code and there are no freezes then, so I assume the calls I've added to bsb.cpp are not the default ones when it comes to the interrupts. However, in a direct comparison with running BSB-LAN on Arduino Due connected to PPS bus, the commands sent from the Arduino are still accepted every time. With the changes directly in the library as suggested by @do13, I still get occasionally an error message on the heater, saying there are problems communicating with the room unit. Also, it takes several sent telegrams for changes to become effective on the heater, for example changing comfort setpoint temperature etc. So timing seems still to be an issue. Maybe HardwareSerial is not the way to go and instead SoftwareSerial should be tried?

fredlcore commented 3 years ago

Sub-Issue 5: I've added code for web-based OTA for ESP32 on port 8080, you have to add #define ENABLE_ESP32_OTA in _config.h. It's seemingly not working, because nothing happens after submitting the bin file. Maybe someone of you can take a closer look.

do13 commented 3 years ago

@fredlcore My system doesn't work with your settings. The ESP hangs somewhere after calling uart_intr_config() with modified settings.

Some measurements:

Stock code, without any modifications. We see first byte is transmitted around ~2.3ms after 0x17. Next bytes are delayed by ~50 msecs. Heater doesn't detect the ESP32 .. grafik

After setting uart->dev->conf1.rxfifo_full_thrhd = 1; in uartEnableInterrupt(), first byte is transmitted even faster, but there is still the delay of around 50msec during following bytes: grafik

Next round, sending whole message in one step: grafik No further delays and my heating system is able to communicate with my ESP.

At least via Ethernet, I haven't checked the Wifi version.

It looks like the Arduino core on top of the EPS-IDF system creates a lot of overhead. I guess softwareserial would be similar ....

fredlcore commented 3 years ago

Thanks for the logs, it's strange that my modifications aren't working with your setup. Have you tried resetting the ESP32 a few times? It does hang with my modifications, too, but at least every 2nd or 3rd time it gets to the stage where it actually sends and receives telegrams.

PPS and the other busses have worked with SoftwareSerial on the Mega, and it would have the advantage of not having to deal with the FIFO. In any case, hardcoding any changes in the core libraries is not something we could offer end users as a solution to this problem.

I suggest we move this to a dedicated issue here on GitHub as it seems it's a more complex matter.

fredlcore commented 3 years ago

Sub-Issue 5 is fixed, ESP32 OTA now works.

fredlcore commented 3 years ago

Sub-Issue 1 is fixed, the problem was in this piece of code which now has a new while loop:

        if (p[1] != 'J' && p[1] != 'C') {
//          client.flush();
          while (client.available()) char c = client.read();
        } else {
          if ((httpflags & HTTP_ETAG))  { //Compare ETag if presented
...
          }
        }

client.flush() was causing the error here, and to be honest, I don't remember exactly why I did a flush() here because flush() is only for forcing outgoing bytes to the client. In this case, it only makes sense if it was used to discard any remaining incoming bytes (because only for the JSON POST requests, anything below the header would still be needed).

fredlcore commented 3 years ago

Just as a summary: All previous sub-issues have been fixed/solved or moved to a distinct issue, except for the compiling issues on the Arduino IDE (which I have now also reproduced). @do13, can you have a closer look at this?

dukess commented 3 years ago

Cool!

dukess commented 3 years ago

It also seemed a little strange to me, but I left the buffer reset on the principle - "if it does not interfere - that's fine". As a result, this led to an interesting error. To be honest, I don't really understand why resetting the buffer affects reading, and only on one architecture.

fredlcore commented 3 years ago

Yes, I was wondering the same. But the ESP32 seems to have some checks installed that other systems don't have. For example, it reported the buffer-overflow for the version array which was defined with a size of 8, but its length was actually 15. So at least it helps to get the code cleaner in other parts as well ;).

fredlcore commented 3 years ago

I'll close this now because now I assume that the ESP32 will run generally in the same way as the Due. Otherwise we can open new dedicated issues for other problems.