emsesp / EMS-ESP

ESP8266 firmware to read and control EMS and Heatronic compatible equipment such as boilers, thermostats, solar modules, and heat pumps
https://emsesp.github.io/docs
GNU Lesser General Public License v3.0

Protocol / Queue handling issue #151

Closed susisstrolch closed 4 years ago

susisstrolch commented 4 years ago

Today my logic analyzer arrived, so I found the time to look at what's going on on the EMS bus. Environment:

Issue 1 - EMS-ESP continues sending, even when not being polled again

1. MC10:    08 0B 16 00 FF 5A 64 00 06 FA 0A 01 0F 64 64 02 
            08 F8 0F 0F 0F 0F 1E 05 04 09 09 00 6D 00
2. ESP: 0B 00
3. MC10:       0B 00 89 00 85
4. ESP:       0B 88 14 00 20 E4 00
<200ms pause>
  1. MC10 sends a MC10Parameter message to us.
  2. We acknowledge with "0B <BRK>" -> OK, done.
  3. MC10 echoes our acknowledgement??? The <BRK> from MC10 looks fine, it's approx. 1.035 ms.
  4. We continue to poll MC10, which the bus master simply ignores: it doesn't send an echo.
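
The 1.035 ms figure in item 3 can be sanity-checked against the bit time on the wire. This is only a sketch: it assumes the bus runs at 9600 baud (not stated in this thread), and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Assumption: 9600 baud. Adjust kBaud if the bus is configured differently.
constexpr double kBaud = 9600.0;

// Duration of a single bit on the wire, in microseconds.
constexpr double bit_time_us() { return 1e6 / kBaud; }

// Express a measured interval (in microseconds) as a number of bit times.
double interval_in_bits(double interval_us) {
    assert(interval_us >= 0.0);
    return interval_us / bit_time_us();
}

// Same as above, rounded to the nearest whole bit.
long nearest_whole_bits(double interval_us) {
    return std::lround(interval_in_bits(interval_us));
}
```

Under that assumption, 1035 µs is very close to ten bit times, i.e. about one full 8N1 character frame held low, which would be consistent with a break condition.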

Issue 2 - pretty similar to Issue 1

1. RC35:    10 0B 3E 00 00 00 00 7D 00 00 00 00 00 00 00 00
            00 11 05 00 E9 00
2. ESP: 0B 00
3. RC35:       0B 00
4. ESP:       0B 90 3D 00 20 80 00
5. RC35:             0B 90 7A
<200ms pause>
  1. RC35 sends a HK1MonitorMessage to us.
  2. We respond with "0B <BRK>" -> OK, done.
  3. RC35 echoes our acknowledgement.
  4. We start sending without a request.
  5. The bus master stops echoing after the 2nd byte and stays silent for 200 ms.
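
To make the traces in both issues easier to read, here is a minimal sketch that splits the first four bytes of a telegram into fields. The layout (byte 0 = sender, byte 1 = destination with the top bit marking a read request, byte 2 = type, byte 3 = offset) is an assumption based on common descriptions of the EMS protocol, not something confirmed in this thread; `EmsHeader` and `decode_header` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed EMS telegram header layout (see the note above).
struct EmsHeader {
    uint8_t src;      // byte 0: sender ID
    uint8_t dest;     // byte 1: destination ID, top bit masked off
    bool    is_read;  // byte 1, top bit: assumed to flag a read request
    uint8_t type;     // byte 2: telegram type
    uint8_t offset;   // byte 3: offset into the telegram data
};

// Decode the first four bytes of a telegram.
// Returns false if the buffer is too short to hold a header
// (for example a bare one-byte poll acknowledgement).
bool decode_header(const uint8_t* buf, std::size_t len, EmsHeader& out) {
    if (buf == nullptr || len < 4) return false;
    out.src     = buf[0];
    out.dest    = buf[1] & 0x7F;
    out.is_read = (buf[1] & 0x80) != 0;
    out.type    = buf[2];
    out.offset  = buf[3];
    return true;
}
```

Read this way, `0B 88 14 00 20 E4` in Issue 1 would be a read request from 0x0B to 0x08 for type 0x14, and `10 0B 3E 00 ...` in Issue 2 would be a type 0x3E telegram from 0x10 addressed to 0x0B, which matches the description of RC35 sending us a HK1MonitorMessage.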

susisstrolch commented 4 years ago

We should really try feeding the watchdog before each time-consuming function call. Because it's the software watchdog that gets triggered, we don't get a stack trace, so we have to iterate by trial and error.
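
As a rough illustration of why feeding before each long call helps, here is a host-side model of a software watchdog running on a simulated clock. It is not the ESP8266 API: `SoftWatchdogModel` and `run_tasks` are hypothetical, and the timeout passed in is just an example value. On the device the feed would be a call such as `ESP.wdtFeed()`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side model of a software watchdog (NOT the ESP8266 API).
class SoftWatchdogModel {
public:
    explicit SoftWatchdogModel(uint32_t timeout_ms) : timeout_ms_(timeout_ms) {}

    // Restart the countdown at the given simulated time.
    void feed(uint32_t now_ms) { last_fed_ms_ = now_ms; }

    // True if more than timeout_ms has passed since the last feed.
    bool expired(uint32_t now_ms) const {
        return now_ms - last_fed_ms_ > timeout_ms_;
    }

private:
    uint32_t timeout_ms_;
    uint32_t last_fed_ms_ = 0;
};

// Run a list of blocking tasks back to back on a simulated clock.
// Returns true if the watchdog would have fired at some point.
bool run_tasks(const std::vector<uint32_t>& task_durations_ms,
               uint32_t timeout_ms, bool feed_before_each) {
    SoftWatchdogModel wdt(timeout_ms);
    uint32_t now = 0;
    wdt.feed(now);
    for (uint32_t duration : task_durations_ms) {
        if (feed_before_each) wdt.feed(now);  // the proposed fix
        now += duration;                      // the task blocks for this long
        if (wdt.expired(now)) return true;
    }
    return false;
}
```

With three 1500 ms tasks and a 3000 ms timeout, the model only trips when nothing is fed in between. A single task longer than the timeout still trips it, which is the case that would need to be found by trial and error.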

proddy commented 4 years ago

yes, we need to keep the code in ISRs non-blocking and highly optimized. emsuart_tx_buffer() has grown in complexity quite significantly since 1.7 with many loops and race conditions.
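
One common way to keep an ISR short is to have it only store the received byte and leave all parsing to the main loop. The sketch below is a generic single-producer/single-consumer ring buffer, not the actual emsuart code; `RxRing` and its size are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-size byte queue: one producer (the ISR), one consumer (the main loop).
class RxRing {
public:
    // Intended to be called from a UART ISR: constant time, no loops, no waiting.
    bool push_from_isr(uint8_t b) {
        std::size_t next = (head_ + 1) % kSize;
        if (next == tail_) {       // queue full: drop the byte and remember it
            overflow_ = true;
            return false;
        }
        buf_[head_] = b;
        head_ = next;
        return true;
    }

    // Intended to be called from the main loop, where slower work is acceptable.
    bool pop(uint8_t& out) {
        if (tail_ == head_) return false;  // nothing queued
        out = buf_[tail_];
        tail_ = (tail_ + 1) % kSize;
        return true;
    }

    bool overflowed() const { return overflow_; }

private:
    static constexpr std::size_t kSize = 64;  // holds up to kSize - 1 bytes
    volatile std::size_t head_ = 0;           // written only by the producer
    volatile std::size_t tail_ = 0;           // written only by the consumer
    volatile bool overflow_ = false;
    uint8_t buf_[kSize] = {};
};
```

The `volatile` qualifiers are the usual single-core microcontroller idiom here; on other targets proper atomics would be the safer choice.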

proddy commented 4 years ago

Or disable the WDT before the Tx is called from ems.cpp with ESP.wdtDisable() and enable it again with ESP.wdtEnable(0) after the acknowledgement poll has been received.

susisstrolch commented 4 years ago

I'll upload a log in jabber mode - there you can see that Tx/Rx aren't the bottlenecks.

susisstrolch commented 4 years ago

And here is an interesting article about the soft WDT:

https://www.sigmdel.ca/michel/program/esp8266/arduino/watchdogs_en.html#ESP8266_WDT_TIMEOUT

susisstrolch commented 4 years ago

Pushed a new release of the txmode2 branch which injects wdtfeed() into MyESP.loop.

proddy commented 4 years ago

In all versions up to 1.8.0 I had this line in MyESP.loop():

yield(); // ...and breath

which somehow got lost after 1.8.1. I think it does the same as the wdtfeed() no?

susisstrolch commented 4 years ago

It's still in. But yield does more than calm the WDT - it also takes care of the WiFi stuff. wdtFeed only restarts/resets the HW and SW watchdogs and doesn't have any further overhead.

bbqkees commented 4 years ago

So I've been running the latest build from yesterday evening in the txmode2 branch. At first I got lots of reboots (every 2 minutes or so) but now it's been running for 10h without hiccups (jack powered, not bus powered).

proddy commented 4 years ago

It's still in. But yield does more than calm the WDT - it also takes care of the WiFi stuff. wdtFeed only restarts/resets the HW and SW watchdogs and doesn't have any further overhead.

There is also a delay(1) in the ems-esp.cpp loop that I used to calm down the WiFi after seeing how ESPHome and other projects did this because of Arduino 2.5.0. It might no longer be necessary?

susisstrolch commented 4 years ago

delay() also does the WiFi and SWDT handling, so it shouldn't hurt at all. But I can try to remove it...

susisstrolch commented 4 years ago

Found that one: https://github.com/letscontrolit/ESPEasy/issues/2477

proddy commented 4 years ago

Found that one: letscontrolit/ESPEasy#2477

nice, didn't know you could do that. Let's add that too as it'll help us find the root cause for the WDT resets.

bbqkees commented 4 years ago

Possibly unrelated but ran txmode2 firmware for 20+ hours (with an open but idle Telnet session) without problems. However, after doing 'log v' it rebooted after a few minutes.

proddy commented 4 years ago

That's good news for @susisstrolch's new Tx code. 'log v' does a lot of string manipulation (as I avoid using the String library and sprintf()), so most probably it's a memory error I need to look into.

proddy commented 4 years ago

@bbqkees @susisstrolch also unrelated - I uploaded my latest web version under the newweb branch if you want to play with it. Still need to refine a few things but I think it's stable. Look carefully at the CHANGELOG on how to build it because the build scripts have also changed. I'm off now and will pick things up when I'm back in a week.

bbqkees commented 4 years ago

Ok will try. The txmode2 build from last week is still running here uninterrupted at 3 days (bus powered). Telnet session still active.

susisstrolch commented 4 years ago

I'm at 2 days 6 hours in parasitic mode.

bbqkees commented 4 years ago

@proddy Took me some time because of lots of build errors but I was able to build the firmware in the end.

The newweb branch needs two additional libraries: ESPAsyncUDP and ESPAsyncWebServer (PIO Home -> Find libraries -> type name -> install). Maybe it's Windows or just my particular setup, but the gulp build did not work with the 'debug' parameter. So I ran the node gulp command in the correct folder; after that it went through fine, the compacted web code files were added to the 'webh' folder, and the build in pio went OK.

The new web interface looks great, will test it over the weekend.

proddy commented 4 years ago

Did you use the latest platformio.ini example file? I would have expected pio to download the libraries automatically.

bbqkees commented 4 years ago

Yes, the new one with the 'pre:scripts/buildweb.py' etc.

I had the same issue when you initially added, for instance, OneWire.

proddy commented 4 years ago

That's strange. In theory, if you remove the whole .pio folder/directory it should go fetch all the libs. I'll try on a fresh Windows install this weekend.

proddy commented 4 years ago

@bbqkees tested on a fresh win install with platformio 4 and you shouldn't need to download anything manually. As soon as the platformio.ini file is there it will automatically fetch the latest libraries. We can look at your config next week.

@susisstrolch I merged the txmode2 branch into the newweb branch and it's been running fine for the last 8hrs. Eventually I'll move all this into dev which is 1.9.0 so let me know if you're planning any further changes. We still need to test with Junkers.

proddy commented 4 years ago

closing for now. txmode2 merged into dev.