Laxilef / OTGateway

OpenTherm gateway for HomeAssistant
GNU General Public License v3.0

Diyless Thermostat with Wemos D1 Mini ESP32 #12

Closed mennodegraaf closed 6 months ago

mennodegraaf commented 7 months ago

Great project! I would love to make it run on my Diyless Thermostat, which uses a Wemos D1 Mini ESP32. After adjusting the platformio.ini file to use wemos_d1_mini32 it compiles. I have adjusted the pin mapping to use OT_IN_PIN_DEFAULT=21, OT_OUT_PIN_DEFAULT=22 and SENSOR_INDOOR_PIN_DEFAULT=18, which should be correct as I have tested it with https://github.com/diyless/esp32-wifi-thermostat.

I noticed some issues however:

Can you point me in some direction what I can try to get it to work?

Laxilef commented 7 months ago

Thanks! Try using the build "firmware_nodemcu_32s_1.3.3.factory.bin" from the releases page. You can set up pins on the "setup" page. If the OT does not respond, try swapping the input and output pins in the settings. To read temperature from DS18B20 sensors in the device settings in Home Assistant, you need to select “External” for temperature sensors.

mennodegraaf commented 7 months ago

I tried release firmware_nodemcu before, but that did not work on the Wemos D1 Mini32. So I compiled it myself after adding the following section to platformio.ini:

[env:wemos_d1_mini32]
platform = ${esp32_defaults.platform}
board = wemos_d1_mini32
lib_deps = ${esp32_defaults.lib_deps}
lib_ignore = ${esp32_defaults.lib_ignore}
extra_scripts = ${esp32_defaults.extra_scripts}
build_flags = 
    ${esp32_defaults.build_flags}
    -D OT_IN_PIN_DEFAULT=21
    -D OT_OUT_PIN_DEFAULT=22
    -D SENSOR_INDOOR_PIN_DEFAULT=18
    ;-D WOKWI=1
    -D USE_TELNET=0
    ;-D DEBUG_BY_DEFAULT=1
    ;-D WM_DEBUG_MODE=3

Laxilef commented 7 months ago

Hmm, it should work. But I don't have this board to check. I think we need to start by checking the GPIO. Upload a photo of the board with the shield installed.

mennodegraaf commented 7 months ago

I will and can make a PR of the changes to platformio.ini if that helps. I can also provide some output of the error cases.

Laxilef commented 7 months ago

That would be great!

mennodegraaf commented 7 months ago

I just made a local branch with the changes to platformio.ini for adding Wemos D1 mini32 support. To make a PR, I need access to the repo, so I can push my local branch to remote.

Laxilef commented 7 months ago

I think it would be better if you fork the repository on GitHub, make the changes there and create a PR. You can even edit via the browser:

mennodegraaf commented 7 months ago

Today I spent some more time on it. I also tested the same code on a Wemos D1 mini ESP8266 and Opentherm was working there. Comparing the ESP32 with ESP8266 over the web interface and Telnet, it turned out that the ESP32 version was very very slow. So it might have to do with some issues in the ESP32 scheduler which times out the Opentherm functionality.

Laxilef commented 7 months ago

OpenTherm doesn't work on your ESP32?

mennodegraaf commented 7 months ago

Indeed, Opentherm does NOT work on the D1 mini ESP32, but does work on the D1 mini ESP8266. As said, the difference might be caused by the scheduler as the ESP32 is very slow.

Laxilef commented 7 months ago

OK thanks. I'll check this out soon.

mennodegraaf commented 7 months ago

Great. By the way, I also noticed that when building with DEBUG_BY_DEFAULT, the value is stored in settings.debug and written to EEPROM once. At startup, this variable is read back from EEPROM. However, there is no way to change the configuration from the web interface, so once it has been set to false or true, you cannot change it later with the build flag.
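
A minimal sketch of why the build flag stops having an effect after the first boot, assuming the settings are loaded roughly as described above. `Settings`, `StoredBlob` and `loadSettings` are hypothetical names used only for illustration; they are not the firmware's actual code.

```cpp
// Compile-time default, normally supplied with -D DEBUG_BY_DEFAULT=1.
// Assumption for this sketch: it falls back to 0 when the flag is absent.
#ifndef DEBUG_BY_DEFAULT
#define DEBUG_BY_DEFAULT 0
#endif

// Hypothetical settings record.
struct Settings {
  bool debug;
};

// Hypothetical stand-in for the persisted copy (EEPROM).
// `valid` is false on a device that has never saved its settings.
struct StoredBlob {
  bool valid;
  Settings settings;
};

// The build-time default is applied only when nothing valid has been stored.
// Once a value has been saved, the stored value always wins.
inline Settings loadSettings(StoredBlob& stored) {
  if (!stored.valid) {
    stored.settings.debug = (DEBUG_BY_DEFAULT != 0);
    stored.valid = true;  // first boot: persist the default
  }
  return stored.settings;
}
```

With this pattern, rebuilding with a different DEBUG_BY_DEFAULT changes nothing on a device that already holds valid settings; the value has to be changed at runtime or the stored settings erased.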

Laxilef commented 7 months ago

You can change the debug value in the device settings in Home Assistant:

https://github.com/Laxilef/OTGateway/wiki/FAQ-&-Troubleshooting#how-can-i-activate-deactivated-sensors-and-settings-in-my-home-assistant

mennodegraaf commented 7 months ago

Enabled debug and noticed an issue with shared data on the ESP32. The telnet output is very hard to read, which I think is caused by the ESP32 having 2 cores so the different tasks really run in parallel and may access shared data simultaneously unless this is prevented with a mutex. This happens a lot with the TinyLogger::print functions, but may also happen with other data shared between tasks (like data shared between OT and MQTT task).
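
A minimal sketch of the mutex idea, written with standard C++ so it builds on a PC. `LockedLogger` is a hypothetical class, not TinyLogger's API. On the ESP32 the same pattern would typically use a FreeRTOS mutex (xSemaphoreCreateMutex, xSemaphoreTake, xSemaphoreGive) instead of std::mutex.

```cpp
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical line-oriented logger. The lock is held while a whole line is
// written, so two tasks cannot interleave their characters.
class LockedLogger {
public:
  void println(const std::string& tag, const std::string& message) {
    std::lock_guard<std::mutex> lock(mutex_);
    out_ << '[' << tag << "] " << message << "\r\n";
  }

  // Returns everything logged so far. A real logger would write to Serial
  // or Telnet instead of an in-memory stream.
  std::string str() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return out_.str();
  }

private:
  mutable std::mutex mutex_;
  std::ostringstream out_;
};
```

The key design choice is locking per line rather than per character: a print call that emits a tag and a message in separate unlocked steps can still be split by another task.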

Laxilef commented 7 months ago

You're right. ESP32Scheduler was written only for code compatibility with the ESP32 platform. I didn’t go deep into mutexes and other FreeRTOS features, because the code of the tasks is not closely related.

I'll fix the logger, thanks.

mennodegraaf commented 7 months ago

@Laxilef I managed to get the OpenTherm functionality working! When I investigated the interval and cycle times of the various tasks, they did not make sense (some tasks, like OT, were never executed; others ran at unexpected intervals). After I changed xTaskCreatePinnedToCore() to xTaskCreate(), letting the scheduler decide how to distribute the tasks over the cores, it was OK. So there seems to be an issue with the PinnedToCore method.

Laxilef commented 7 months ago

Hmm, that's interesting! Am I correct in understanding that the Opentherm Task was never executed on your board? Were there messages in the logs from Opentherm Task with debug enabled?

I wanted to try reducing the task sleep time from 10ms to 1ms and see if that helps:

    ot->setYieldCallback([](void* self) {
      static_cast<OpenThermTask*>(self)->delay(1); // 10 to 1 or yield()
    }, this);

I read that on the ESP32 it is better to pin the Wi-Fi work to a separate core (one that is not used by other tasks) so that it runs faster. Perhaps this is not a good solution.

mennodegraaf commented 7 months ago

Indeed, the Opentherm task was (almost) never executed. And the funny thing is that the same happened also to the MQTT task, so I could not enable debug.

I am not sure that changing the delay from 10ms to 1ms makes much difference, as another task will be executed anyway, and that task likely takes a bit longer than that. Btw, if no yield callback is set, it will use yield by default.

mennodegraaf commented 7 months ago

If you are interested, I stored the millis() in Task.h before entering loop() and computed cycle time (time that loop is busy) and interval time (between current and previous start of loop) and logged the result. This is how it should be with xTaskCreate and you can see clearly that tasks are allocated to different cores (whatever is available).

[OpenTherm][TRACE] Loop on core 1, interval [6082/7035]
[Main][TRACE] Loop on core 1, interval [6/115]
[WifiManager][TRACE] Loop on core 1, interval [2/118]
[Mqtt][TRACE] Loop on core 1, interval [606/106]
[Main][TRACE] Loop on core 0, interval [5/114]
[WifiManager][TRACE] Loop on core 1, interval [2/113]
[Mqtt][TRACE] Loop on core 1, interval [0/712]
[Main][TRACE] Loop on core 0, interval [3/112]
[WifiManager][TRACE] Loop on core 0, interval [2/111]
[Mqtt][TRACE] Loop on core 1, interval [0/107]
[Main][TRACE] Loop on core 0, interval [3/110]
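
The measurement described above can be sketched as follows. `LoopTimer` is a hypothetical helper, not the code that produced this log; timestamps are passed in so the sketch builds without Arduino's millis().

```cpp
#include <cstdint>

// Hypothetical helper: tracks cycle time (how long loop() was busy) and
// interval time (start of the previous loop() to start of the current one),
// both in milliseconds.
class LoopTimer {
public:
  // Call with millis() right before loop().
  void onLoopStart(std::uint32_t nowMs) {
    if (started_) {
      // Unsigned subtraction stays correct across a millis() rollover.
      intervalMs_ = nowMs - lastStartMs_;
    }
    lastStartMs_ = nowMs;
    started_ = true;
  }

  // Call with millis() right after loop() returns.
  void onLoopEnd(std::uint32_t nowMs) {
    cycleMs_ = nowMs - lastStartMs_;
  }

  std::uint32_t cycleMs() const { return cycleMs_; }
  std::uint32_t intervalMs() const { return intervalMs_; }

private:
  bool started_ = false;
  std::uint32_t lastStartMs_ = 0;
  std::uint32_t cycleMs_ = 0;
  std::uint32_t intervalMs_ = 0;
};
```

A task that is starved shows up as an interval far larger than its configured period, while a task that hogs the core shows up as a large cycle time.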

Laxilef commented 7 months ago

Hmm. MqttTask was supposed to run on core 0, together with WifiManagerTask, and the other tasks on core 1. It's strange that only MqttTask and OpenThermTask were not executed. However, I was unable to reproduce this behavior on my nodemcu 32s.

mennodegraaf commented 7 months ago

Do you have a physical nodemcu32? So this might only affect the D1 mini ESP32? Why can MqttTask and WifiManagerTask not run on both cores? You can also play with Task priority, how about giving SensorTask, OpenThermTask and MqttTask a higher prio?

Laxilef commented 7 months ago

Yes, I have one, but I have not connected it to the boiler; I am waiting for delivery of the mini version. It is not possible to assign a higher priority to tasks: they block the core and the watchdog is triggered, because some tasks run without an interval. The expectation was that they would alternate. Of course, we can run tasks on both cores and let the scheduler decide on which core to run each task. I'll probably do that, thanks for the info!

Laxilef commented 7 months ago

I found S2 mini for testing. Indeed, some tasks are never performed or are rarely performed. It’s strange that on nodemcu 32s it works differently, as if there is a built-in algorithm for interleaving tasks with the same priority. Most likely I need to correct the scheduler and add queues there.

Laxilef commented 7 months ago

I think I understand what the problem is. Since FreeRTOS uses time slicing, if task1 has used 100 ms of CPU time, the scheduler will not run task1 again until the other tasks with the same priority have also used 100 ms of CPU time. Even though I try to switch context in long tasks, there are third-party libraries that block the task until the action is completed, for example the library for working with MQTT. It contains functions that send data to the MQTT server, and it is probably impossible to switch context inside the write loop. Based on this, we need task switching to be simple interleaving. However, when time slicing is disabled, the scheduler will simply launch any task that is ready, without interleaving, which also does not suit us. Changing task priorities did not produce the desired result.

@mennodegraaf do you have any ideas?

mennodegraaf commented 7 months ago

If I look at the source code, it seems the cooperative way of scheduling is assumed and tasks are not preempted by the scheduler, but they execute a yield or delay themselves after the loop (and some tasks even do it during the loop). Not sure if and how that could be adjusted on ESP-IDF?!

I would expect that tasks with strict timing requirements (like sensor, opentherm, regulator) need to run at a higher priority, so the scheduler first selects them when they are Ready to run. It is vital for these high prio tasks that they are periodic and the execution time is very limited, so there is enough time left to run low prio tasks (like wifi and mqtt). I have played a bit with this, but did not get it working well enough. Reasons could be:

Laxilef commented 7 months ago

Yeah, you were right about priorities. I managed to run all the tasks on the S2 mini and, looking at the logs, everything works fine. The problems were not only in the task priorities but also in the MQTT handling, which was confusing. I've committed the changes, but now I need to check that OpenTherm is working. You can also check it yourself using the latest version of the code.

Laxilef commented 7 months ago

To debug the scheduler I used this task code:

#pragma once
#include <Arduino.h>
#include "AbstractTask.h"

class Task : public AbstractTask {
public:
  Task(bool enabled = true, unsigned long interval = 0) : AbstractTask(enabled, interval) {}

  void enable() override {
    AbstractTask::enable();
    begin();
    vTaskResume(tHandle);
  }

  void disable() override {
    AbstractTask::disable();
    vTaskSuspend(tHandle);
  }

  void yield() override {
    //Serial.printf("TASK '%s' YIELD START, CORE: %d, PRIORITY: %d\r\n", this->getTaskName(), xPortGetCoreID(), uxTaskPriorityGet(this->tHandle));

    this->estimated += micros() - this->loopStart;
    taskYIELD();
    this->entryCount++;

    if (millis() - this->prevReportYield > 1000) {
      Serial.printf("TASK '%12s' YIELD END\tCORE: %2d PRIORITY: %2d, ESTIMATED: %lu\r\n", this->getTaskName(), xPortGetCoreID(), uxTaskPriorityGet(this->tHandle), (unsigned long) (this->estimated / 1000));
      this->prevReportYield = millis();
    }

    this->loopStart = micros();
  }

  void delay(unsigned long ms) override {
    //Serial.printf("TASK '%s' DELAY START, CORE: %d, PRIORITY: %d\r\n", this->getTaskName(), xPortGetCoreID(), uxTaskPriorityGet(this->tHandle));
    this->estimated += micros() - this->loopStart;
    vTaskDelay((ms == 0 || ms < portTICK_PERIOD_MS) ? 1 : ms / portTICK_PERIOD_MS);
    this->entryCount++;

    if (millis() - this->prevReportDelay > 1000) {
      Serial.printf("TASK '%12s' DELAY END\tCORE: %2d PRIORITY: %2d, ESTIMATED: %lu\r\n", this->getTaskName(), xPortGetCoreID(), uxTaskPriorityGet(this->tHandle), (unsigned long) (this->estimated / 1000));
      this->prevReportDelay = millis();
    }

    this->loopStart = micros();
  }

protected:
  virtual const char* getTaskName() {
    return "";
  }

  virtual BaseType_t getTaskCore() {
    return tskNO_AFFINITY;
  }

  virtual uint32_t getTaskStackSize() {
    return 10000;
  }

  virtual int getTaskPriority() {
    return 1;
  }

  void static xLoopWrapper(void* pvParameters) {
    Task* task = static_cast<Task*>(pvParameters);
    while (true) {
      task->loopWrapper();
    }
  }

  void begin() override {
    if (!enabled || tHandle != nullptr) {
      return;
    }

    BaseType_t coreId;
    if (getTaskCore() == tskNO_AFFINITY || ESP.getChipCores() == 1 || getTaskCore() > (ESP.getChipCores() - 1)) {
      coreId = tskNO_AFFINITY;

    } else {
      coreId = getTaskCore();
    }

    xTaskCreatePinnedToCore(
      this->xLoopWrapper,
      getTaskName(),
      getTaskStackSize(),
      this,
      getTaskPriority(),
      &tHandle,
      coreId
    );
  }

  void loopWrapper() override {
    if (!setupDone) {
      setup();
      setupDone = true;
      yield();
    }

    if (millis() - this->prevReport > 1000) {
      Serial.printf("TASK '%12s' LOOP START\tCORE: %2d PRIORITY: %2d\r\n", this->getTaskName(), xPortGetCoreID(), uxTaskPriorityGet(this->tHandle));
    }
    this->estimated = 0;
    this->loopStart = micros();
    loop();
    this->estimated += micros() - this->loopStart;
    if (millis() - this->prevReport > 1000) {
      Serial.printf("TASK '%12s' LOOP END\tCORE: %2d PRIORITY: %2d, ENTRY COUNT: %u, ESTIMATED: %lu\r\n", this->getTaskName(), xPortGetCoreID(), uxTaskPriorityGet(this->tHandle), this->entryCount, (unsigned long) (this->estimated / 1000));
      this->prevReport = millis();
    }

    if (interval == 0) {
      yield();

    } else {
      delay(interval);
    }
  }

private:
  TaskHandle_t tHandle = nullptr;
  unsigned int entryCount = 0;
  unsigned long prevReport = 0;
  unsigned long prevReportYield = 0;
  unsigned long prevReportDelay = 0;
  unsigned long estimated = 0;
  unsigned long loopStart = 0;
};
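
The tick rounding used in delay() above can be isolated into a small host-side sketch. `msToTicks` is a hypothetical helper, and `tickPeriodMs` stands in for portTICK_PERIOD_MS so that the code builds without FreeRTOS; it must be greater than zero.

```cpp
#include <cstdint>

// Converts milliseconds to scheduler ticks the same way as delay() above:
// anything shorter than one tick (including 0) is rounded up to one tick,
// longer delays are rounded down to whole ticks.
constexpr std::uint32_t msToTicks(std::uint32_t ms, std::uint32_t tickPeriodMs) {
  return (ms == 0 || ms < tickPeriodMs) ? 1 : ms / tickPeriodMs;
}

static_assert(msToTicks(0, 10) == 1, "zero still blocks for one tick");
static_assert(msToTicks(5, 10) == 1, "sub-tick delays round up");
static_assert(msToTicks(25, 10) == 2, "longer delays round down");
static_assert(msToTicks(10, 1) == 10, "1 ms ticks map one to one");
```

One consequence: when the tick period is longer than 1 ms, a requested delay of 1 ms and a requested delay of one full tick period result in the same vTaskDelay() argument.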

Laxilef commented 7 months ago

Yes! I see that it works. But it is not ideal: many requests end with a timeout.

Laxilef commented 7 months ago

As I continued debugging, I noticed some unusual behavior. If we switch context using yield() or delay(10) in a CustomOpenTherm::sendRequest() loop, then most requests fail. However, if we use delayMicroseconds(10000), then everything starts to work correctly. Now I see that when switching context, interrupts are not processed or some interrupts are skipped.

mennodegraaf commented 7 months ago

Yeah, related to this I noticed that the OT task takes a long time to finish. My guess is that the yield or delay triggers a context switch, after which it is up to the scheduler to decide when the task runs again, whereas delayMicroseconds() just blocks the OT task (but keeps it running). Besides that, the OT task is a bit special since it uses an interrupt routine to check for changes on the OT input pin.

I did not have time to do additional testing, but read this reference for some background: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_idf.html

Some preliminary comments:

Laxilef commented 7 months ago

Any task is just an endless loop. At the end of each iteration, yield/delay is called to perform other tasks, that is, at the end of each iteration there is a context change (almost always). When we see that the iteration may take a long time, we call yield/delay and give control to the scheduler. However, interrupts have higher priority. Any task must be suspended at the moment of interruption. This should happen, but I see that after a context switch the interrupt handler is not called/is not always called.

A few days ago I tried enabling/disabling configUSE_PREEMPTION, configUSE_TIME_SLICING, configIDLE_SHOULD_YIELD. This does change the behavior of the scheduler, but espressif does not recommend doing this.

Laxilef commented 7 months ago

I probably found a bug in the OpenTherm library. According to the protocol, the start bit of the response should not be expected earlier than 20 ms after the request. Some random signal was interpreted as a start bit, and all subsequent data was corrupted.
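
A rough sketch of the timing check this implies. It is not the actual library fix (see the linked PR for that); `isPlausibleResponseStart` and both constants are hypothetical names. The 20 ms lower bound comes from the comment above, while the 800 ms upper bound is an assumption that should be checked against the OpenTherm specification.

```cpp
#include <cstdint>

// Earliest point at which a slave response may begin, per the comment above.
constexpr std::uint32_t kMinResponseDelayUs = 20000;

// Assumed upper bound of 800 ms; this value is not stated in this thread.
constexpr std::uint32_t kMaxResponseDelayUs = 800000;

// Returns true when an edge seen `elapsedUs` microseconds after the stop bit
// of the request may be treated as the start bit of the response.
// Edges that arrive earlier are treated as noise and should be ignored.
constexpr bool isPlausibleResponseStart(std::uint32_t elapsedUs) {
  return elapsedUs >= kMinResponseDelayUs && elapsedUs <= kMaxResponseDelayUs;
}
```

In an interrupt-driven receiver, a check like this would sit in front of the state change that starts sampling the response bits.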

I also see that adding a short delay after the stop bit has a positive effect.

    sendBit(HIGH); //stop bit
    delayMicroseconds(1000);

Modified library: https://pastebin.com/5y44dDT7

Can you test the latest version from the master branch?

Laxilef commented 7 months ago

Well, I found and fixed the problem in the library. Added PR for correction. https://github.com/ihormelnyk/opentherm_library/pull/55

Please test this with your boiler. Modified library: https://pastebin.com/47uQVqSm

mennodegraaf commented 7 months ago

Well done! I hope to give it a try this evening.

mennodegraaf commented 7 months ago

Gave it a try and it seems no OT timeouts are present anymore.

[SENSORS][INDOOR][INFO] New temp: 24.940001
[MAIN][VERB] Free heap size: 150972 of 291436 bytes, min: 150972 bytes (diff: 140464 bytes)
[OT][TRACE] OT REQUEST ID:    0   Request: 80000100   Response: 4000010a   Attempt:  1   Status: SUCCESS
[MAIN][VERB] Free heap size: 149960 of 291436 bytes, min: 149960 bytes (diff: 1012 bytes)
[OT][TRACE] OT REQUEST ID:   17   Request:   110000   Response: 40111f00   Attempt:  1   Status: SUCCESS
[MAIN][VERB] Free heap size: 147580 of 291436 bytes, min: 147580 bytes (diff: 2380 bytes)
[OT][TRACE] OT REQUEST ID:   25   Request: 80190000   Response: c0192000   Attempt:  1   Status: SUCCESS
[MAIN][VERB] Free heap size: 145264 of 291436 bytes, min: 145264 bytes (diff: 2316 bytes)
[SENSORS][INDOOR][TRACE] Raw temp: 25.062500
[MAIN][VERB] Free heap size: 145140 of 291436 bytes, min: 145140 bytes (diff: 124 bytes)
[OT][TRACE] OT REQUEST ID:    0   Request: 80000100   Response: 4000010a   Attempt:  1   Status: SUCCESS
[MAIN][VERB] Free heap size: 143560 of 291436 bytes, min: 143560 bytes (diff: 1580 bytes)
[OT][TRACE] OT REQUEST ID:   17   Request:   110000   Response: 40111f00   Attempt:  1   Status: SUCCESS
[OT][TRACE] OT REQUEST ID:   25   Request: 80190000   Response: c0192000   Attempt:  1   Status: SUCCESS
[OT][HEATING][INFO] Set temp = 35
[OT][TRACE] OT REQUEST ID:    1   Request: 90012300   Response: 50012300   Attempt:  1   Status: SUCCESS
[OT][TRACE] OT REQUEST ID:    0   Request: 80000100   Response: 4000010a   Attempt:  1   Status: SUCCESS
[SENSORS][INDOOR][TRACE] Raw temp: 25.062500
[OT][TRACE] OT REQUEST ID:   17   Request:   110000   Response: 40111f00   Attempt:  1   Status: SUCCESS
[OT][TRACE] OT REQUEST ID:   25   Request: 80190000   Response: c0192000   Attempt:  1   Status: SUCCESS
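
For reading the hex values in this log, here is a small decoding sketch based on the OpenTherm frame layout (parity bit, 3-bit message type, 8-bit data ID, 16-bit data value). `OtFrame`, `decodeFrame`, `hasEvenParity` and `toF88` are hypothetical helper names, not functions of the OpenTherm library used by the firmware.

```cpp
#include <cstdint>

// Fields of a 32-bit OpenTherm frame.
struct OtFrame {
  bool parity;           // bit 31
  std::uint8_t msgType;  // bits 30..28: 0 READ-DATA, 1 WRITE-DATA, 4 READ-ACK, 5 WRITE-ACK
  std::uint8_t dataId;   // bits 23..16
  std::uint16_t value;   // bits 15..0
};

inline OtFrame decodeFrame(std::uint32_t raw) {
  OtFrame f;
  f.parity = ((raw >> 31) & 0x1u) != 0;
  f.msgType = static_cast<std::uint8_t>((raw >> 28) & 0x7u);
  f.dataId = static_cast<std::uint8_t>((raw >> 16) & 0xFFu);
  f.value = static_cast<std::uint16_t>(raw & 0xFFFFu);
  return f;
}

// A frame is valid when the total number of 1 bits (parity bit included) is even.
inline bool hasEvenParity(std::uint32_t raw) {
  int ones = 0;
  for (; raw != 0; raw >>= 1) {
    ones += static_cast<int>(raw & 1u);
  }
  return ones % 2 == 0;
}

// Interprets the data value as signed fixed point f8.8 (used for temperatures).
inline float toF88(std::uint16_t value) {
  return static_cast<std::int16_t>(value) / 256.0f;
}
```

Applied to the log above, request 90012300 decodes to WRITE-DATA for data ID 1 with value 35.0, which matches the "Set temp = 35" line, and response c0192000 decodes to READ-ACK for data ID 25 with value 32.0.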

Laxilef commented 7 months ago

Great :) Many thanks for your help!

mennodegraaf commented 7 months ago

A few observations still:

mennodegraaf commented 7 months ago

Another thought. My DS18B20 temp sensor reports a way too high temperature (more than 6 deg too high). The ESP32 has bluetooth support and the Diyless thermostat comes with an accurate BLE temperature sensor. Would it be interesting to add support for Bluetooth sensors?

Laxilef commented 7 months ago

> The OT task does not run close to every 1s, but more like once every 5s

That's right. Executing requests takes time, and after each request there is a task switch so that other tasks can run correctly.

> Setting master/slave version on my boiler fails, same for setting master config. Does that matter much?

If your boiler does not support these IDs, then they are not important to it.

> The TelnetStream logger seems to be (a lot) slower on ESP32 than on ESP8266

I noticed this too. I think this is a feature of the TCPServer implementation on ESP32. It may be worth adding a buffer, because now it is sending one character at a time. The same thing happened with the mqtt client, and when I added a buffer, the speed of sending the data increased greatly.
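
A minimal sketch of the buffering idea, in standard C++ so it can be tried on a PC. `CountingSink` and `BufferedWriter` are hypothetical names; the sink stands in for the TCP client and only counts how often it is called.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sink that counts how many times the (slow) transport is hit.
struct CountingSink {
  std::string data;
  std::size_t writes = 0;

  void write(const char* bytes, std::size_t len) {
    data.append(bytes, len);
    ++writes;
  }
};

// Collects characters and forwards them in chunks of up to Capacity bytes.
template <std::size_t Capacity>
class BufferedWriter {
  static_assert(Capacity > 0, "buffer must hold at least one byte");

public:
  explicit BufferedWriter(CountingSink& sink) : sink_(sink) {}

  void put(char c) {
    buffer_[used_++] = c;
    if (used_ == Capacity) {
      flush();
    }
  }

  // Sends whatever is pending; call at the end of a line or message.
  void flush() {
    if (used_ == 0) {
      return;
    }
    sink_.write(buffer_, used_);
    used_ = 0;
  }

private:
  CountingSink& sink_;
  char buffer_[Capacity];
  std::size_t used_ = 0;
};
```

Writing a 10-character message through a 4-byte buffer reaches the sink 3 times instead of 10, which is the effect described above for the MQTT client.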

> Having (relative) timestamps in the log would be very helpful for debugging

Good idea

> My DS18B20 temp sensor reports a way too high temperature (more than 6 deg too high).

To correct the temperature you can use number.opentherm_outdoor_sensor_offset and number.opentherm_indoor_sensor_offset

> The ESP32 has bluetooth support and the Diyless thermostat comes with an accurate BLE temperature sensor. Would it be interesting to add support for Bluetooth sensors?

That would be great, but I have no experience with BLE.

mennodegraaf commented 7 months ago

> That would be great, but I have no experience with BLE.

I can have a look into the BLE temp sensor.

mennodegraaf commented 7 months ago

Another feature that is of great interest to me, is multi-zone support. So, one OT interface, but multiple rooms each with a sensor and a regulator class. The heat demand of each room is combined and sent over OT. With the possibility to control a valve (on a radiator or underfloor divider) the heat can be sent only to the rooms that need it.

Do you have some thoughts on this?

Laxilef commented 7 months ago

> I can have a look into the BLE temp sensor.

It would be interesting

> The heat demand of each room is combined and sent over OT.

How? Using OT, we can simply transmit the desired coolant temperature.

Additional adjustments in rooms can be made using the generic thermostat integration in Home Assistant. How I did it:

  1. A Zigbee temperature sensor is installed in each room.
  2. There is an individual pipe from the collector for each room.
  3. STOUT STE-0010 valves are installed on the collector, one for each room's pipe.
  4. The valves are connected to a multichannel Zigbee relay, for example an 8-channel DIY relay from modkam.
  5. A generic thermostat is created for each valve in Home Assistant.

mennodegraaf commented 7 months ago

> I can have a look into the BLE temp sensor.

I managed to read the battery level, temperature and humidity from the BLE sensor. The default BLE stack is huge, however, so the firmware no longer fits in flash and RAM. I am switching to NimBLE instead and will try to get it all to fit together.

martinarva commented 6 months ago

I have the Diyless kit: https://diyless.com/product/opentherm-thermostat

Managed to flash it to the latest OTGateway. Everything except OT works.

From debug:

 2   Status: TIMEOUT
[DEBUG] OT REQUEST ID:    3   Request:    30000   Response:        0   Attempt:  3   Status: TIMEOUT

Tried to swap the in and out pins in the settings but no luck.

Laxilef commented 6 months ago

1.3.3? Build the firmware from the master branch and flash your esp.

martinarva commented 6 months ago

I used this: firmware_nodemcu_32s_1.3.3.factory.bin

Laxilef commented 6 months ago

Build the firmware from sources via platformio. Or wait for the release of 1.4.0

martinarva commented 6 months ago

I don't know if it's important, but I just checked what's under the cover and it's an ESP-WROOM-32, not a D1 Mini.

EDIT: Should I build the d1_mini32 version?

martinarva commented 6 months ago

And it seems that I'm not up to the task of building the firmware on my own. Would it be possible for you to do this for me, @Laxilef?

Laxilef commented 6 months ago

> I don't know if it's important, but I just checked what's under the cover and it's an ESP-WROOM-32, not a D1 Mini.

Upload a photo of the board.

Please note that this firmware is still being tested. 1.4.0-rc.5.zip