emsesp / EMS-ESP32

ESP32 firmware to read and control EMS and Heatronic compatible equipment such as boilers, thermostats, solar modules, and heat pumps
https://emsesp.github.io/docs
GNU Lesser General Public License v3.0

ESP32 uart handling #23

Closed ArwedL closed 3 years ago

ArwedL commented 4 years ago

Question: Do you have an implementation for ESP32 uart handling? Currently I am only interested in receiving. Unfortunately I have to use an ESP32 because it also serves other purposes.

Additional context: I have seen another question in Q&A (May 2019) where you stated that you aren't happy with your current ESP32 implementation. I am hoping for a new status (especially if only receiving is relevant)...

MichaelDvP commented 4 years ago

Oops, you're right, I confused process_telegram and incoming_telegram. The short reaction time is on master poll to tx. I added code to check the time from the receive interrupt to the tx transmit call, and sometimes this takes more than 20 ms. I wonder what happens in this time?

proddy commented 4 years ago

That is odd. Tx should happen immediately after a poll. What you could try is to comment out the sensor_.start() and sensors_.loop() calls in emsesp.cpp. This made my setup run a zillion times faster.

MichaelDvP commented 4 years ago

@proddy

> I added the RX_LOOP_WAIT because I thought it was slowing down the telnet

I thought about that, but we have only a few messages per second, and if there is no message the function returns anyway, so I see no benefit. To give telnet/wifi more time I think it's better to add a delay in the main loop. Since the tx reaction is completed in emsuart_recvTask, this task should have enough time to complete. I'm now trying a delay(MYESP_DELAY) in EMSESP::loop(), as in 1.9, and it seems to help. The terminal also seems more responsive.

proddy commented 4 years ago

@MichaelDvP I found and fixed the issue that was causing Tx to fail with the old logic (tx_mode 1). The value of the timeout was too short. Should be 1760. I suspect a typo in the macro when it was copied over from 1.9.5.

I still can't get the newer Tx code to work (tx_mode 4). It gives me the same errors. Perhaps also a timing error?

MichaelDvP commented 4 years ago

I don't believe it is the timeout value. I changed it while I was working on the uart; the 10 looks like a typo to me. The timeout counts loops, and each loop is EMSUART_BUSY_WAIT long, which is 1/8 bittime. With EMS_TX_TO_COUNT set to 22*8*10 you get a timeout of 22*8*10*bittime/8, i.e. 220 bittimes or 22000 µs. If you wait that long for one byte, you'll get a collision with the next master poll in the second byte. With 22*8 the timeout is 22 bittimes, 2200 µs, a bit more than the EMS+ fixed wait.
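The timeout arithmetic above can be checked with a short sketch, reading the loop count as 22 × 8 × 10 versus 22 × 8. This is only an illustration, not the firmware's code: the function names are hypothetical, and it assumes a 9600 baud bus (about 104 µs per bit, which the comment rounds to 100 µs) and one busy-wait loop per 1/8 bit time.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 9600 baud, so one bit lasts about 104 us (integer division).
constexpr uint32_t kBitTimeUs = 1000000 / 9600;

// Assumption: each busy-wait loop (EMSUART_BUSY_WAIT) lasts 1/8 of a bit time.
constexpr uint32_t kLoopsPerBit = 8;

// Timeout expressed in bit times for a given loop count (EMS_TX_TO_COUNT).
constexpr uint32_t timeout_bittimes(uint32_t loop_count) {
    return loop_count / kLoopsPerBit;
}

// Timeout expressed in microseconds for a given loop count.
constexpr uint32_t timeout_us(uint32_t loop_count) {
    return timeout_bittimes(loop_count) * kBitTimeUs;
}

// The two values discussed in the thread.
static_assert(timeout_bittimes(22 * 8 * 10) == 220, "long timeout: 220 bit times");
static_assert(timeout_bittimes(22 * 8) == 22, "short timeout: 22 bit times");
```

With the exact 104 µs bit time, the long setting comes out a little above the rounded 22000 µs figure and the short one a little above 2200 µs; the factor of ten between them is what matters for the collision argument.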

The error messages indicate that there is no response from the destination, right? So the first question is: did we send? Can we receive the echo, or is it missing? If there is an echo, what do we receive next? What message is received that triggers the rx but does not match the tx_waiting? We should log the raw telegrams, including the break, directly in recvTask to see what comes in. For timing it can be useful to increase the priority of the recvTask.
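For the raw-telegram logging suggested above, a helper along these lines could turn a receive buffer into a printable hex string. It is a minimal sketch: `to_hex` is a hypothetical name, not a function from the firmware, and it simply formats whatever bytes it is given without knowing how the break appears in the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format raw bytes as space-separated upper-case hex.
std::string to_hex(const uint8_t* data, size_t length) {
    assert(data != nullptr || length == 0); // an empty buffer may be null
    std::string out;
    out.reserve(length * 3);
    char byte_text[4];
    for (size_t i = 0; i < length; ++i) {
        std::snprintf(byte_text, sizeof(byte_text), "%02X", data[i]);
        if (i != 0) {
            out += ' ';
        }
        out += byte_text;
    }
    return out;
}
```

In a receive task one would probably copy the bytes to a queue and format them later in the main loop, since building strings inside a time-critical task could itself disturb the timing being investigated.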

MichaelDvP commented 4 years ago

I logged with syslog to see the tx errors and there is another strange thing. I get reboots every 2 hours (mark is set to 2h, can it be that?), always after 4 complete tx errors and the first error of the 5th. But the time between retries is very long; it seems the counter isn't cleared in between.

Another thing: [system] and [network] log with local time (MEST), while [emsesp] (and also [boiler], [thermostat]) log with UTC. syslog.txt

proddy commented 4 years ago

> I don't believe it is the timeout value. I changed it while I was working on the uart; the 10 looks like a typo to me. The timeout counts loops, and each loop is EMSUART_BUSY_WAIT long, which is 1/8 bittime. With EMS_TX_TO_COUNT set to 22*8*10 you get a timeout of 22*8*10*bittime/8, i.e. 220 bittimes or 22000 µs. If you wait that long for one byte, you'll get a collision with the next master poll in the second byte. With 22*8 the timeout is 22 bittimes, 2200 µs, a bit more than the EMS+ fixed wait.
>
> The error messages indicate that there is no response from the destination, right? So the first question is: did we send? Can we receive the echo, or is it missing? If there is an echo, what do we receive next? What message is received that triggers the rx but does not match the tx_waiting? We should log the raw telegrams, including the break, directly in recvTask to see what comes in. For timing it can be useful to increase the priority of the recvTask.

You're right, 1760ms is a long time within the loop. I'd rather just forget "tx_mode 1-3", work on your new and improved Tx logic, and figure out why it doesn't work on my setup. I'll also ask BBQKees if he's willing to try out a few things on his boiler. Is there anything specific about your environment? The timings differ between EMS+ and EMS1.0, and I'm on EMS1.0.

proddy commented 4 years ago

> I logged with syslog to see the tx errors and there is another strange thing. I get reboots every 2 hours (mark is set to 2h, can it be that?), always after 4 complete tx errors and the first error of the 5th. But the time between retries is very long; it seems the counter isn't cleared in between.
>
> Another thing: [system] and [network] log with local time (MEST), while [emsesp] (and also [boiler], [thermostat]) log with UTC. syslog.txt

I'll create a separate issue for this and track it there.

MichaelDvP commented 4 years ago

I've logged the time from rx-intr to send and found that it's always this check:

    if (millis() > (emsRxTime + EMS_RX_TO_TX_TIMEOUT)) { // send allowed within 20 ms
        return EMS_TX_WTD_TIMEOUT;
    }

that causes the error. I replaced this code with

    LOG_DEBUG(F("Responsetime: %d"), uuid::get_uptime() - emsRxTime);

and it's mainly 0 or 1 but sometimes 29 ms (no other values), and that does not give a collision with the next telegram. We should skip this check completely.
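To make the window check concrete, here is a small sketch of the same comparison written with unsigned subtraction, which also stays correct when a 32-bit millisecond counter rolls over (after roughly 49.7 days). The names `elapsed_ms` and `tx_window_expired` are hypothetical and not from the firmware; the sketch only shows why an observed 29 ms trips a 20 ms window and does not explain where the 29 ms comes from.

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds between two readings of a 32-bit millisecond counter.
// Unsigned subtraction wraps modulo 2^32, so rollover is handled.
constexpr uint32_t elapsed_ms(uint32_t now_ms, uint32_t earlier_ms) {
    return now_ms - earlier_ms;
}

// True when more than window_ms has passed since the receive timestamp.
constexpr bool tx_window_expired(uint32_t now_ms, uint32_t rx_time_ms, uint32_t window_ms) {
    return elapsed_ms(now_ms, rx_time_ms) > window_ms;
}

// Response times reported in the thread, against a 20 ms window.
static_assert(!tx_window_expired(100, 100, 20), "0 ms is inside the window");
static_assert(!tx_window_expired(101, 100, 20), "1 ms is inside the window");
static_assert(tx_window_expired(129, 100, 20), "29 ms is outside the window");
```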

proddy commented 4 years ago

I see a response time of 0 and sometimes 1 with "tx_mode 1". I just can't get tx_mode 4 working, even after adding some delay after each bit write. I'm so busy with the web interface that I don't really have time to get the scope out and see how the timings are off on my EMS 1.0 system.

proddy commented 4 years ago

Closing this. Covered in emsesp/EMS-ESP#398