emsesp / EMS-ESP32

ESP32 firmware to read and control EMS and Heatronic compatible equipment such as boilers, thermostats, solar modules, and heat pumps
https://emsesp.github.io/docs
GNU Lesser General Public License v3.0
548 stars 96 forks source link

ESP32 uart handling #23

Closed ArwedL closed 3 years ago

ArwedL commented 4 years ago

Question Do you have an implementation for ESP32 uart handling? Currently I am only interested in receiving. Unfortunately I have to ESP32 because it also servers other purposes.

Additional context I have seen another question in Q&A (May 2019) where you stated that you aren't happy with your current ESP32 implementation - I hope now for some new status (especially if receiving only is relevant)...

proddy commented 4 years ago

Good question. I had a prototype on ESP32 with the UART code written in ESP-IDF and core C last year but stopped to focus solely on ESP8266 as this was more cost effective for the gateway boards. Honestly i'd prefer the ESP32 (or ESP32-S) as it has so much more flash memory and bandwidth. My version 2.0 of EMS-ESP has a core that is fully compatible with the ESP32 apart from the UART library which needs to be re-coded and tested. When I release it for ESP8266 I'll see if I can get the ESP32 up and running too. If you're handy with programming let me know as I could do with some help in testing.

ArwedL commented 4 years ago

With current master I already have it compiling for ESP32 (and Win32). As I told this is the platform with that I have to go (other tasks are also running and ESP8266 isn't an option). The only part missing is UART. Unfortunately my EPS32 (ESP32-Gateway from Olimex) doesn't allow for JTAG debugging. So I am not sure if I really want to develop UART stuff with printf style debugging... Because of JTAG I decided to do all debugging on WIN32 platform. This approach works quite well (e.g. using FTDI 3V3 TTL UART - USB converter). But of course it fails for low-level components like UART handling (which of course is ESP32 specific and can't be emulated on Win32)...

ArwedL commented 4 years ago

I will take your offer and prepare something on my side and would be happy if you could test it (assuming you have a device with JTAG debug possiblity)? In general on ESP32 platform at least sending should be quite comfortable as Espressif SDK offers uart_write_bytes_with_break() function...

proddy commented 4 years ago

It'll be great if you can get the ESP32 uart driver working. The tricky part is the timing and detecting the BRK signals.

Also it may be easier to work on my EMS-ESP version 2.0 branch which builds both ESP8266 and ESP32 and the UART code is isolated. It also builds and runs standalone without an ESP microcontroller which I use for most of the coding and testing, saving me a ton of time. It's still in 'alpha' stage at the moment and doesn't have all the devices like 1.9.x does. The web UI is still in development too which I'm adding very soon. Up to you, what ever is easiest.

I do have a ESP32-prog board that has the JTAG interface so can debug in Visual Studio Code on my Win10 box if needed.

MichaelDvP commented 4 years ago

There is a nice example for ESP32 using events. If this works as descriped, it should be very easy sending with uart_write_bytes_with_break() function and receiving by event UART_BREAK and read the complete string to the pEMSRxBuf. Sadly this does not work on ESP8266 as the event and the tx-function is missing. But as i had some problems sending much (i simulate a RC20 with id0x19 and answer every poll with ack) I've tested a new tx_mode without waiting for the whole telegram sent out. @proddy take a look at tx_mode 4 in this emsuart. Sending is mainly done by the rx_intr. This works very good and shows, that timing isn't that critical as supposed.
I think in most dokumentation there is a misunderstanding about the bus, because people think the telegram is echoed by the master, but we have a differential 2-wire-bus like RS485, all we send is also on the rx line, so we always receive our own sendings simultaniously, the master does not echo. I was confused by oszillograms showing a delay between tx and echo, especially if rx and tx overlap, which is impossible on a halfduplex-bus.

proddy commented 4 years ago

Thanks @MichaelDvP for the ESP32 code. I remember looking at that a while ago and my first version is based loosely off the same design and in version 2.0

I'll take a look at your tx_mode 4. You're right in that the tx_mode 1 logic is quite complicated (with checking for timeouts and breaks) while the code for EMS+ and HT3 is very basic but works equally well. And yes, the master doesn't not echo, its just the x reading back the same data we sent along the Tx.

MichaelDvP commented 4 years ago

Yes all three modes working well, rx and tx was never lost. My problem was only when sending much more: [APP] Uptime: 0 days 0 hours 1 minute 58 seconds [SYSTEM] Last reset reason: Software Watchdog [SYSTEM] Last reset info: Fatal exception:4 flag:3 (SOFT_WDT) epc1:0x4000dd38 epc2:0x00000000 epc3:0x00000000 excvaddr:0x00000000 depc:0x00000000 and these resets were related to how much i'm sending, so it has something to do with tx. I found nothing in the code which can cause the exception (only EMS_TX_TO_COUNT was initialized factor 10 to high, but it does not matter) and it happens in all 3 modes. All the modes waiting until tx is finished and that seems sometimes to long. I think there are also other things taken long and sometimes it happens that all comes together and the watchdog resets. Mode 4 sends the first byte out and returns to calling function, that saves time and i don't have the watchdog-resets again (running 2 days now with ack every poll on 0x0B and 0x19).

ArwedL commented 4 years ago

I did some analysis of emsuart.cpp to understand how it works. `How does it work currently:

By doing that I detected potential problem and I am wondering if my conclusion is true. In principal everything seems to happen inside context of receive task (even sending) --> Singlethreading --> Good. But functions like ems_setWarmWaterOnetime are called from another task context --> Multithreading with interface CircularBuffer which isn't multi-threading safe... Is this a real problem or is my analysis wrong? Just wondering...

MichaelDvP commented 4 years ago

Sending when?

Sending is ony allowed in reply to a poll or request (within 20ms). emsuart_tx_buffer is (at least) only called from parseTelegram. All functions push the messages to the queue and parseTelegram picks it from the queue and sends it to emsuart_tx_buffer.

emsuart_stop & emsuart_start (public but not called)

Called from ems-esp.cpp only for OTA-Updates.

ArwedL commented 4 years ago

Thanks for clarification regarding sending... How do you see my supsicion of having a potential multi-threading problem in CircularBuffer?

MichaelDvP commented 4 years ago

Afair CircularBuffer isn't used in v2. But that can only answer @proddy correctly, for now i'm not so familiar with v2.

proddy commented 4 years ago

emsuart has it's own simple buffer queue storing the complete telegrams. When one if filled it is sent to the core emsesp. CircularBuffer is not used in Tx.

But now in v2 there are two queues (std::deque), one for Rx and one for Tx, both asynchronous using std::atomic to prevent data race conditions. A telegram is sent to Tx after a poll is received on the Rx line, the Rx disabled and the whole data sent as one block.

proddy commented 4 years ago

also @nomis pointed me to his safe buffer implementation (https://github.com/nomis/EvohomeWirelessFW/blob/master/lib/InterruptSafeBuffer/InterruptSafeBuffer.h) which I'd like to try at some point too.

nomis commented 4 years ago

Something that was frustrating about the Rx process is that it has to wait until the break finishes before it receives the message (which could delay Tx). Ideally it should use a timer to identify when there is nothing else being received and process the message sooner. It may be possible to use the UART timer to do this but it's character sized not bit sized.

I never got Tx working on my boiler so I don't know if any of my changes to Rx would break Tx.

ArwedL commented 4 years ago

emsbus.zip I added a first version of emsuart.cpp and .h with ESP32 support. It compiles for me - I didn't yet test it. It is ESP32 only (dropped all ESP8266 stuff)!

proddy commented 4 years ago

@ArwedL I've added the code to the v2 repo. Builds fine but causes a reset when the uart port is opened so need to debug further. It's probably easier if you work off my latest code base.

ArwedL commented 4 years ago

The latest commit which I can see on v2 branch (https://github.com/proddy/EMS-ESP/tree/v2) is from 16.1.2020 - is this the latest codebase?

proddy commented 4 years ago

I thought that branch was empty. I deleted it. No my EMS-ESP2 is in a private repo and not quite ready for the real world. I'll grant you access so can familiarize yourself with how the modules work.

proddy commented 4 years ago

I made a snapshot of v2 in https://github.com/proddy/EMS-ESP/tree/v2

Note it's not backward compatible for v1. So first wipe the flash on the ESP then upload the new firmware, connect to USB/Serial with 115200 speed/baud, type system to go into the system menu and use the set commands to change the wifi settings. Use help if you get lost and remember to read the README file.

proddy commented 4 years ago

Hi @ArwedL did you get any further debugging the ESP32 UART code?

ArwedL commented 4 years ago

To be honest I hoped for your side to make debugging progress... Any info so far which you can share? I had to shift focus to other open points in my ESP32 project. Will see if I can spend some time.

proddy commented 4 years ago

no worries, I'll have a go and fixing it up after I finished merging in the web code

ArwedL commented 4 years ago

I spent some time and found the issue. For me it works now (only tested receiving) - see attached update emsuart.cpp emsbus.zip The problem was in line "buf_handle = xRingbufferCreate(512, RINGBUF_TYPE_NOSPLIT);" which has to be called before the recvTask and parseTask are created. I didn't had any other problem. uart_driver_install worked for me.

proddy commented 4 years ago

nice! trying it out now...

proddy commented 4 years ago

yes, it kinda works. Data comes in (Rx) but also a lot of miss-fires. I'll take a deeper look later this week.

Annotation 2020-05-18 222237

MichaelDvP commented 4 years ago

I could get a wemos-d1-mini32 for testing. It fits in the BBQKees board (but not in the housing). Settings for this module with this pinout are: sensors.h: SENSOR_GPIO = 18; // Wemos D1-32 for compatibility D5 system.h:
LED_GPIO = 2; // on Wemos D1-32 LED_ON = HIGH; emsuart_esp32.h:
EMSUART_RXPIN 23 // Wemos D1-32 RX pin for compatibility D7 EMSUART_TXPIN 5 // Wemos D1-32 TX pin for compatibility D8

The communication-problem is a bug in the driver. The default interrupt gives on break-intr. only the queue-flag, but does not read the fifo to the buffer (see here). The buffer is only filled in line 808. On break-intr. the telegram is mainly in the fifo and stays there until fifo full intr. Solution is to set fifo-full to 1 with a lot of irq-calls, or use our own irq-routine. Here is a small intr routine, rx is working but i can not see tx in the log. uart.zip terminal

proddy commented 4 years ago

nice Michael! I had a quick try but couldn't see any Rx come in. I could be a fault on my side (broken wires). I'll have another go later.

MichaelDvP commented 4 years ago

I have changed the rx/tx pins for my module, you have to change back. But i think i know what's wrong with tx, i have to set conf0.txd_brk after the tx-buffer is filled and clear it in the irq. It also seems to work with esp8266. I'll test and let you know, stay tuned.

proddy commented 4 years ago

it was a fault cable. I'm getting Rx in but Tx causes a crash. So I think you're on the right track! nice work btw.

ArwedL commented 4 years ago

Also thanks from my side for improving the code. Obviously I don't find the time for making the solution bullet-proof... And of course I assumed correct implementation from Espressif SDK (which seems not to be the case).

MichaelDvP commented 4 years ago

Now with working tx. I have also used the same logic for 8266, saving a lot of interrupts, no waiting for tx done and works fine with my Buderus ems. Maybe some people can test with EMS+ and HT3, i don't believe that they realy need other timings. The file is without the other modes for better reading. It seems the mixing module is asked many times for ems+ messages, but it's only ems. Is there a way to log the raw telegram, the message id 0xDB00 looks odd.

uart.zip esp8266-EMS.log esp32-ser.log

proddy commented 4 years ago

nice work Michael! you've certainly been busy. I've incorporated your changes in alpha7 on the v2 branch. I did a quick test with both ESP8266 and ESP32 and Rx works but still I have issues with Tx which are on my side. I think its because I keep switching circuits and confusing the EMS bus :)

I did notice on the ESP32 when saving to SPIFFS (like making a setting change) you will get a corrupt telegram. So either in the EMSuart::stop() or EMSuart::restart() maybe flush the FIFO?

Those log messages from your mixer unit with 0xDB00 don't look correct indeed. It could very well be something in the code somewhere. To home in on the root cause try:

log trace 21 raw which will show you the raw telegrams from the mixer (device id 0x21), or log trace db00 if you just want to watch that particular telegram ID.

I'll continue testing here too

MichaelDvP commented 4 years ago

I've only done a short test with esp32, but had no issues with the uart. Over the day i'm running 1.9.6b with modified uart in this new 8266 version. Seems stable. The mixer is in my opinion a confusion with ems+. If i send a question like 0b A1 FF 00 01 D8 20 the mixer anwers with a empty ems 21 0b FF 00, but ems-esp awaits a ems+ and takes the CRC as telegram highbyte.

proddy commented 4 years ago

I was able to simulate that and will adjust the logic. Question: should we just ignore these short messages? They can't be parsed because there is no type ID and no data.

proddy commented 4 years ago

I couldn't get Tx working with the latest changes, so merged the old logic back and made it configurable. set tx_mode 4 is the newer tx code (in 2.0a8). If you have time could you compare with your working version and see what could be causing this?

MichaelDvP commented 4 years ago

What issues do you have with tx? Only on 8266 or both? I see only [telegram] Tx buffer full. Looks like Tx is not working? but this is caused by filling the queue with more than 20 messages (including poll-acks) before the master polls. But the messages are send (log from 8266). tx_log.txt The raw log helps a lot to analyse. As you see the MM10-mixer is requested with ems+ telegrams, but can only understand ems (type in telegram[2]), so he answers with type-id FF and empty ems-telegram (i have modified telegram.ccp, line 272 to: if (data[2] < 0xF0 || length < 6) { to avoid reading outside the telegram).
(btw, mixing.cpp, line 156 should be hc_ = device_id() - 0x20 + 1; ) I think it is better to request only the telegrams implemented in the device. So for MM10 request only 0xAB, for MM50,100... request 0x01D7 only for device 0x20, 0x01D8 only for device 0x21, and so on. Regarding the corrupt telegrams when stopping the uart: The first telegram after reenable is always corrupt (to long or first bytes missing if buffers cleared), we can set a flag to drop this first telegram.
But the interrupt routine send in the zip drops telegrams longer than 33 bytes after enable the intr. the fifo is normaly filled by the bus.

MichaelDvP commented 4 years ago

For me all modes working fine, test with change mode, wait a minute, show emsbus, change mode, etc., starting with mode 4: EMS Bus protocol: Buderus, #telegrams received: 252, #Read requests sent: 87, #Write requests sent: 0, #CRC errors: 0 (0%) Tx mode = 2 EMS Bus protocol: Buderus, #telegrams received: 348, #Read requests sent: 118, #Write requests sent: 0, #CRC errors: 0 (0%) Tx mode = 1 EMS Bus protocol: Buderus, #telegrams received: 411, #Read requests sent: 129, #Write requests sent: 0, #CRC errors: 0 (0%) Tx mode = 3 EMS Bus protocol: Buderus, #telegrams received: 448, #Read requests sent: 140, #Write requests sent: 0, #CRC errors: 0 (0%) Tx mode = 4 EMS Bus protocol: Buderus, #telegrams received: 536, #Read requests sent: 172, #Write requests sent: 0, #CRC errors: 0 (0%) I've used this routine: emsuart_esp8266.cpp.txt

With the mixer i meant something like that: mixing.cpp.txt

Is it possible to set some device-flags to the boilers. The 0xE3 .. 0xE9 are not supportet by my boiler. I think they are used by newer boilers or only heatpumps/condensors/compressors?

Also for the thermostat it should be fine if only the active hcs would be requested.

proddy commented 4 years ago

What issues do you have with tx? Only on 8266 or both?

On both ESP8266 and ESP32 when the Tx is sent, there is no acknowledgment, just more Rx. If it works on your setup it must work on mine too so let me experiment a little more using your latest code. Worst case I'll bring out the scope and see what is being transmitted over the EMS line.

I see only "[telegram] Tx buffer full. Looks like Tx is not working?"

Here I wanted a way to detect if the Tx was not working and thought if the Tx queue is full (max 20) this must be a good sign. With a poll happening every 1-2 seconds and doing the queue check every minute it should be pretty fail-safe. Note the poll-acks are not stored as Tx messages so these are only the real read/write commands. But you're right, there will be times when there are a lot of messages in the queue (like after a 'refresh' command) so I'll need to find a better way. Any ideas?

i have modified telegram.ccp, line 272 to: if (data[2] < 0xF0 || length < 6) { to avoid reading outside the telegram)

correct. I had fixed that in an earlier build and modified the whole logic.

btw, mixing.cpp, line 156 should be hc_ = device_id() - 0x20 + 1;

thanks, corrected it!

I think it is better to request only the telegrams implemented in the device. So for MM10 request only 0xAB, for MM50,100... request 0x01D7 only for device 0x20, 0x01D8 only for device 0x21, and so on.

This is a very good point and I had noticed it too. Same with the boiler (your 0xE3-E9 example). Maybe by using device flags like we did with the thermostat is the right approach. And this will also save on flooding the Tx queue with bogus messages that won't be answered anyway.

Also for the thermostat it should be fine if only the active hcs would be requested.

good idea. I'll work on this too.

Regarding the corrupt telegrams when stopping the uart: The first telegram after reenable is always corrupt (to long or first bytes missing if buffers cleared), we can set a flag to drop this first telegram.

Ok. I will test again. I just noticed after each send I would get a CRC error.

it's really nice all the help you're providing, much appreciated Michael.

MichaelDvP commented 4 years ago

What issues do you have with tx? Only on 8266 or both?

On both ESP8266 and ESP32 when the Tx is sent, there is no acknowledgment, just more Rx.

Yes, i know what you mean. I've added some more log-messages to monitor tx and polls and some answers and acks are missing. missing_answer_and_ack.log (some messages are logged from telegram and emsesp, they are received/send single but appear double in the log). To monitor the ack better i added a log for acks and test another version with the same tx, now working. working.log. But i have not figured out yet what caused the problem.

Here I wanted a way to detect if the Tx was not working and thought if the Tx queue is full (max 20) I've changed to 30 and with the mentioned changes in Mixing::Mixing the message does not show up again.

I'l send you a working uart as soon as i understand what's going wrong.

edit: BTW: the test with different modes was false, i doesn't realize that i have to reboot before tx_mode is active.

proddy commented 4 years ago

ok. I'm working on improving the wifi and mqtt calls (I still get some dropouts). I've add the changes to the mixer and also adjusted the thermostat to only fetch the active heating circuits. And I made the check for 'Tx line' more robust. Just need to add device_types to the Boilers and I think I've covered all your comments. It'll be version a10.

proddy commented 4 years ago

Also for the thermostat it should be fine if only the active hcs would be requested.

@MichaelDvP when you query a heating circuit on your thermostat that's not active, what does it come back with? Eg. read 2a8 for HC4 ?

MichaelDvP commented 4 years ago

Here is the uart, working on 8266 in mode 4, on esp32 i'm getting a few CRC error at start and very rarely a "TX read failed", but the device answers, so it's in the echo readback. uart.zip

@proddy i can't read 2a8 since i have no ems+. But these are the RC35 circuits, only hc2 is active: Thermostat_hc1-4.log

I've also found some smaler bugs and typos, should i describe them in a issue or make a pr? (I've put a fork on github with the fixes (and few changes for me like sensor-mqtt)).

MichaelDvP commented 4 years ago

For the esp32 add in the start-routine the line EMS_UART.idle_conf.rx_idle_thrhd = 12; sometimes the break is recognized before the line goes up, The receiver takes the end of the break as startbit and reads a 0xFF at the start of the next telegram.

proddy commented 4 years ago

@MichaelDvP thanks for the updates. I've merged them into a11 in the v2 branch. I've also made you a contributor so feel free to hack away directly in the project branch or push PRs.

A few comments on the changes

MichaelDvP commented 4 years ago

@proddy Thank you for the invitation.

I get Tx errors

Strange, i've tested with the esp32 and a noname wemos-8266-clone (to have the original wemos with 1.9.6 as backup reference), both without tx issues now. Btw: I found, that esp32 resets the tx-break-bit by hardware after sending, no need to do it in the code, but in 8266 we have to clear it in the code.

MQTT format called 'CUSTOM'

the intention was mainly the sensor formatting, I have 7 sensors and sometimes change them (add one on a new place, or disconnect on that is useless), Then the numbering changes and my iobroker-scripts catch the wrong sensor. With json {id:temp,...} the sensors are fixed.

MQTT changes in mixing.cpp. I don't like the static heap

No mqtt-errors, the intention was a 1.9.5 compatible json, were every mixing device can add a own nest and keep the other nests. But now i think it's better to have a json for every device. But we have to uses the device-id (also represents the setting on the switch on MM100) for numbering, otherwise mixing_data1 with hc1 will be overwitten by wwc1. With device-id wwc1 (switch position 8) will become mixing_data8.

proddy commented 4 years ago

TxErrors

using the latest v2 I still don't get Tx working. After each send I get a timeout error. I think I need to hook up the scope.

Annotation 2020-06-01 141944

sensors & MQTT

that's a valid use case for people with more than 1 sensor so happy with any changes you commit that make it better.

MQTT & mixing

yes, the ArdunioJson library uses only the copy constructor when they are char * or consts so there may be conflicts when building the json object. I'll improve this code.

telnet performance

I've been battling to find why the telnet is always not very responsive and I think its due to the Dallas one-wire library interfering with the wifi. Have you ever experienced this? I may re-write that piece of code too.

MichaelDvP commented 4 years ago

Oh, i thought the issues are with tx_mode 4. Now I tested mode 1 a bit longer and can reproduce this errors, but only one error in a few minutes. There are timeouts and break-interruptions and collisions with the next telegram, resultion in bad CRC. Something seems to interrupt and delay the tx routine very long, is there somthing time consuming (like dallas) in another thread? Do you have the errors also with mode 4? this disablees all interrupt while sending, no error in the first 8 minutes now. emsuart_esp8266 .cpp.txt

proddy commented 4 years ago

I'll do some checking this afternoon. Also re-writing the sensor code as it seems to also block the wifi/lwip. There are a couple of delay() calls in the onewire library

MichaelDvP commented 4 years ago

Yes, the onewire is burning a lot of time. Another thing: the RxService::loop() should process a telegram as fast as possible, but is delayed (RX_LOOP_WAIT), skip this.

proddy commented 4 years ago

@MichaelDvP i added the RX_LOOP_WAIT because I thought it was slowing down the telnet, but in the end the culprit was the 1-wire library. So happy to remove it. Except, why should we process an Rx telegram as fast as possible? They're queued up and can be processed every 300ms without effecting any Tx