emsesp / EMS-ESP32

ESP32 firmware to read and control EMS and Heatronic compatible equipment such as boilers, thermostats, solar modules, and heat pumps
https://emsesp.github.io/docs
GNU Lesser General Public License v3.0

No MQTT Boiler-Data after Update to 3.6.5 #1684

Closed: Marcwa19197 closed this issue 5 months ago

Marcwa19197 commented 6 months ago

PROBLEM DESCRIPTION

Since updating to 3.6.5 yesterday, I no longer see any new "boiler_data" in my Grafana dashboard. Only after rebooting the device do I see new data, for a few seconds. Only boiler_data is affected; all other data works.

In MQTT Explorer I can see the data, so it is not a problem with my setup: it worked right up to the minute EMS-ESP was rebooted after the update. I also see some "Network timeouts" in the web interface. This version looks a bit unstable.

Was there anything changed?

REQUESTED INFORMATION

Make sure you have performed every step and checked the applicable boxes before submitting your issue. Thank you!


[emsesp_system_info.txt](https://github.com/emsesp/EMS-ESP32/files/14832561/emsesp_system_info.txt)

TO REPRODUCE

Update to 3.6.5

EXPECTED BEHAVIOUR

A clear and concise description of what you expected to happen.

SCREENSHOTS

[screenshot]

Also, on the login page I can no longer log in:

[screenshot]

I have rebooted EMS-ESP several times now (and also unplugged it).

ADDITIONAL CONTEXT

Add any other context about the problem here.

(Please, remember to close the issue when the problem has been addressed)

proddy commented 6 months ago

I think this could be the WiFi strength. Could you attach the support information (http://ems-esp.local/api/system) so we can see what TxPower is set to? It is under Network Info. This setting is new in 3.6.5.

Marcwa19197 commented 6 months ago

emsesp_system_info.txt

Marcwa19197 commented 6 months ago
[screenshot]

According to the changelog, the setting should be "auto", right? Should I try it? :)

Marcwa19197 commented 6 months ago

I rebooted a few times, and now the WiFi connection is "green" again:

[screenshot]

I'll wait a few minutes to see whether the data comes in again.

proddy commented 6 months ago

OK, strange. The WiFi TX power looks fine.

Marcwa19197 commented 6 months ago

Hmm, no. Still only boiler_data is missing...

Marcwa19197 commented 6 months ago
[screenshot] [screenshot]

But the strange thing is... MQTT Explorer can see the data, and I also see it in the Telegraf log. But it looks like it's not written to InfluxDB. If I restart EMS-ESP, one data point is written... No other changes were made before the EMS-ESP update.

Marcwa19197 commented 6 months ago

Is there a way to downgrade to the version before? Just to check...

MichaelDvP commented 6 months ago

> OK, strange. The WiFi TX power looks fine.

Yes, but "RSSI": -92 does not look good. Do you have a mesh network where EMS-ESP connects to a far-away AP? Use the WiFi scan to check whether there is a stronger AP, and pin the connection to it with the BSSID setting.
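A minimal Python sketch of the "pick the strongest AP" advice, assuming you have copied the WiFi scan results (SSID, BSSID, RSSI) into a list by hand. The ScanResult type, the helper names and the RSSI bands are hypothetical rules of thumb for illustration, not EMS-ESP thresholds or APIs; the returned BSSID is what you would enter in the BSSID setting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScanResult:
    """One row of a WiFi scan (hypothetical structure, filled in by hand)."""
    ssid: str
    bssid: str
    rssi_dbm: int


def rssi_quality(rssi_dbm: int) -> str:
    """Rough rule-of-thumb bands; an assumption, not EMS-ESP's own thresholds."""
    if rssi_dbm >= -67:
        return "good"
    if rssi_dbm >= -75:
        return "fair"
    if rssi_dbm >= -85:
        return "weak"
    return "very weak"


def strongest_bssid(results: List[ScanResult], ssid: str) -> Optional[str]:
    """Return the BSSID of the strongest AP broadcasting `ssid`, or None if absent."""
    candidates = [r for r in results if r.ssid == ssid]
    if not candidates:
        return None
    # RSSI is negative dBm, so the largest value is the strongest signal.
    return max(candidates, key=lambda r: r.rssi_dbm).bssid


# Made-up scan data: two mesh nodes for the same SSID plus an unrelated network.
scan = [
    ScanResult("home", "AA:AA:AA:AA:AA:AA", -92),  # far-away mesh node
    ScanResult("home", "BB:BB:BB:BB:BB:BB", -58),  # nearest mesh node
    ScanResult("guest", "CC:CC:CC:CC:CC:CC", -40),
]
print(rssi_quality(-92), strongest_bssid(scan, "home"))
```

Filtering by SSID first matters in a mesh: all nodes share one SSID, so the BSSID is the only way to tell them apart.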

> Is there a way to downgrade to the version before? Just to check...

Yes, you can always upload a lower version. Or, since this release, you can reboot into the other partition from telnet: check the active partition in System Info, and if it is app1, type "restart app0" in telnet. (I have not tried it, but it should also work over MQTT, e.g. in MQTT Explorer on topic system with payload {"cmd":"restart", "data":"app0"}.) To go from the old version back to the new one, you have to upload/flash the new version again.
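The partition switch described above can be sketched in Python as a helper that picks the opposite partition and builds both the telnet command and an MQTT command. The base topic "ems-esp" prepended to "system" is an assumption (use whatever base your MQTT settings define), and the function names are hypothetical; actually sending the message is left to MQTT Explorer or any other MQTT client.

```python
import json
from typing import Tuple


def other_partition(active: str) -> str:
    """Return the partition to reboot into (app0 <-> app1)."""
    partitions = {"app0": "app1", "app1": "app0"}
    if active not in partitions:
        raise ValueError(f"unexpected partition: {active!r}")
    return partitions[active]


def restart_commands(active: str, base_topic: str = "ems-esp") -> Tuple[str, str, str]:
    """Build the telnet command plus an MQTT (topic, payload) pair for the restart.

    base_topic "ems-esp" is an assumed default; replace it with your configured base.
    """
    target = other_partition(active)
    telnet_cmd = f"restart {target}"
    topic = f"{base_topic}/system"
    payload = json.dumps({"cmd": "restart", "data": target})
    return telnet_cmd, topic, payload


telnet_cmd, topic, payload = restart_commands("app1")
print(telnet_cmd)
print(topic, payload)
```

Building the payload with json.dumps rather than by hand avoids quoting mistakes when pasting it into MQTT Explorer.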

Marcwa19197 commented 6 months ago

> Yes, but "RSSI": -92 does not look good. Do you have a mesh network where EMS-ESP connects to a far-away AP? Use the WiFi scan to check whether there is a stronger AP, and pin the connection to it with the BSSID setting.

Yes, that was a problem. I have now pinned it to the nearest AP, but the issue is still there.

> Yes, you can always upload a lower version. Or, since this release, you can reboot into the other partition from telnet: check the active partition in System Info, and if it is app1, type "restart app0" in telnet. (I have not tried it, but it should also work over MQTT, e.g. in MQTT Explorer on topic system with payload {"cmd":"restart", "data":"app0"}.) To go from the old version back to the new one, you have to upload/flash the new version again.

Checked it: I'm running on app0. So I'll just re-upload the old 3.6.4 right now, as a test. What's strange to me is that all data is written, but nothing from the boiler_data topic. Really strange. I double-checked all relevant services and log files (Telegraf, InfluxDB). Nothing error-related...

Marcwa19197 commented 6 months ago

I just reverted and rebooted EMS-ESP. Data is coming in again...

[screenshot]

(The entity shown is boiler_data/heatblock.)

MichaelDvP commented 6 months ago

app0/app1 was just an example; it also works the other way round (or with "restart boot" if there is a boot partition).

I'm not using HA, but maybe something changed there. In the MQTT settings, try disabling HA and submitting (keep MQTT enabled), wait about 5 minutes, then enable HA again and submit. This clears and rewrites the HA autodiscovery.

proddy commented 6 months ago

> What's strange to me is that all data is written, but nothing from the boiler_data topic. Really strange. I double-checked all relevant services and log files (Telegraf, InfluxDB). Nothing error-related...

If you're seeing the boiler data come into Home Assistant every 10-60 seconds, for example by looking at a sensor, then EMS-ESP is doing its job, and the link to InfluxDB is probably not working. You can check HA's log files or raise the log level in HA to show more details.

Marcwa19197 commented 6 months ago

My setup is as follows:

HomeAssistant > MQTT-Broker > Telegraf > Influxdb > Grafana

So EMS-ESP connects directly to the MQTT broker. Telegraf also connects to the MQTT broker and fetches the data to write it into InfluxDB. Grafana displays the data.

There were no changes to the MQTT broker or to HA itself. The EMS-ESP update was done yesterday at 19:30, and from exactly that time the data was no longer captured (only for the boiler_data topic).

So it's strange to me: as soon as I reverted to 3.6.4, the data showed up again...

One thing to note: when I did the update yesterday, EMS-ESP took a long time to restart. It was stuck at "uploading 100%" for 5-10 minutes. So maybe I'll give it another try and update to 3.6.5 again now.

Marcwa19197 commented 6 months ago

I tried updating to the latest stable version again. Same issue: the data is no longer captured:

[screenshot]
Marcwa19197 commented 6 months ago

And after reverting again, the data shows up:

[screenshot]

I'm out of ideas...

proddy commented 6 months ago

And you're 100% sure the MQTT topic boiler_data is present and being updated in the MQTT broker? You can use MQTT Explorer to see when it was last written. You can also copy the topic and the JSON payload and force a manual publish with MQTT Explorer, then see whether it gets registered in InfluxDB.
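A small sketch of the "when was it last written" check, assuming you read the last-update timestamp off MQTT Explorer by hand. The 60-second publish interval comes from the 10-60 seconds mentioned earlier in the thread; allowed_misses is a hypothetical tolerance so that a single late publish is not flagged.

```python
from datetime import datetime, timedelta


def is_stale(last_written: datetime, now: datetime,
             publish_interval: timedelta = timedelta(seconds=60),
             allowed_misses: int = 1) -> bool:
    """True if no publish arrived within (allowed_misses + 1) publish intervals."""
    return now - last_written > publish_interval * (allowed_misses + 1)


last = datetime(2024, 4, 6, 12, 16, 55)  # example timestamp read off MQTT Explorer
print(is_stale(last, last + timedelta(seconds=90)))  # False: within 2 minutes
print(is_stale(last, last + timedelta(minutes=3)))   # True: more than 2 minutes
```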

Again, the logs will tell you what is happening. When I have MQTT issues, I just download the Mosquitto .exe and run it on my PC/laptop in verbose mode, like

"C:\Program Files\mosquitto\mosquitto.exe" -v -c "C:\Users\paul\Desktop\mosquitto\mosquitto.conf"

so I can see whether it's working.

Marcwa19197 commented 6 months ago

Yes, see the screenshot above: [screenshot]

The MQTT boiler_data/heatblock update was at around 12:16:55, but as you can see in Grafana, no data is displayed for that time:

[screenshot]

(The Grafana screenshot was taken after downgrading to 3.6.4. As you can see, I'm getting data without problems. During the gaps, 3.6.5 was running on EMS-ESP...)

Marcwa19197 commented 6 months ago

Also a screenshot directly from HA (where the MQTT broker is running):

[screenshot]

No gaps, which matches what I saw when watching the MQTT broker. I don't know what happens when I update to 3.6.5, but it's really strange that a downgrade fixes it... So there were no MQTT-related changes in 3.6.5?

proddy commented 6 months ago

There are so many changes between 3.6.4 and 3.6.5 ("444 changed files with 21,600 additions and 18,986 deletions" to be exact), but I can't find anything in the MQTT code that would affect what you're seeing. The fact that EMS-ESP publishes the data correctly and frequently to the MQTT broker would suggest that Telegraf is not picking it up?

I run InfluxDB and Grafana as separate containers/LXCs on my Proxmox VE server and have HA write to the InfluxDB directly using https://www.home-assistant.io/integrations/influxdb/ - not that it helps you, sorry, but I don't have any missing data. I'm out of ideas too!

Marcwa19197 commented 6 months ago

Yeah, really strange for me...

Telegraf, Grafana and InfluxDB run on a VM too. Home Assistant (with the MQTT broker) runs on another VM.

Good point: I also configured HA to push all entity data into InfluxDB (into a different bucket).

[screenshot] [screenshot]

As you can see, the bucket that HA writes to directly has no problem, even with 3.6.5... Checking the Telegraf logs, I can see the updated data, but it's not written to InfluxDB. But why? :-/

[screenshot]

And this is exactly the time of the update I did yesterday evening...

Maybe it's time to shut down Telegraf and use the other bucket directly in Grafana; maybe that will be stable. But it has worked without problems for a year now. I'll copy my dashboard and adjust the queries...

zaphood1967 commented 6 months ago

Maybe not related, but since I updated to 3.6.5, most of the EMS entities in HA have changed their names. They now have "ems-esp.*" in front of the entity name, as well as a different naming scheme, which essentially makes them new entities in HA. It also means that all collected data is still attached to the old entities, so the long-term statistics for these entities are gone.

Just one example of the current flow temp (upper one is the old, lower one the new name)

[Screenshot 2024-04-06 at 09 20 35]
proddy commented 6 months ago

> Maybe not related, but since I updated to 3.6.5, most of the EMS entities in HA have changed their names. They now have "ems-esp.*" in front of the entity name, as well as a different naming scheme, which essentially makes them new entities in HA. It also means that all collected data is still attached to the old entities, so the long-term statistics for these entities are gone.
>
> Just one example of the current flow temp (upper one is the old, lower one the new name) [Screenshot 2024-04-06 at 09 20 35]

I don't think they are related, but it's a good call-out.

Only the title has changed, not the HA entity name, e.g.

state_class: measurement
unit_of_measurement: °C
device_class: temperature
friendly_name: ems-esp Boiler Set flow temperature

This was done because some users have multiple EMS-ESPs, and it was causing conflicts in HA, so this way, each entity is unique. If you want the old 3.6.1 naming, there's a setting in EMS-ESP called "Entity ID format" as described on https://emsesp.github.io/docs/Configuring/#temperature-sensors

zaphood1967 commented 6 months ago

Ah, good to know; I missed that when I did the update.

zaphood1967 commented 6 months ago

But as you can see in the screenshot, the entity name has indeed changed... otherwise, why would HA have identified them as new entities?

Marcwa19197 commented 6 months ago

I checked my HA and everything looks good, so no duplicates. In your screenshot, is one of the entities disabled (the red icon on the right)?

Marcwa19197 commented 6 months ago

For now I'm still on the old EMS-ESP version 3.6.4, because I need some time to adjust the whole Grafana dashboard...

One more point: if I change the queries, the legend looks like this:

[screenshot]

I'm not sure how to fix this... but that's a Grafana issue...

zaphood1967 commented 6 months ago

Since it is no longer being used by the integration, HA sets it to that state.

zaphood1967 commented 6 months ago

Whatever I choose as the format, the entity keeps the new, longer entity name... what am I doing wrong?

Edit: it seems the old config was "single topic" + "MQTT name", not "Long Name". Now the topics are back to what they were. Thanks for the hint, and sorry for hijacking the thread.

proddy commented 6 months ago

I'll create a new issue and close this one, as we're mixing up topics.

Marcwa19197 commented 5 months ago

Hi, sorry to reopen.

I just switched one of my Grafana dashboards to use the homeassistant bucket, so data is written into InfluxDB without Telegraf, directly via the InfluxDB integration in Home Assistant.

I'm still using the older version, 3.6.4.

But as I can see, there are also some "data gaps" here (though at least data is recorded). Could I have something misconfigured on the EMS-ESP side (maybe the MQTT settings)?

[screenshot]

Here is the support information from EMS-ESP: emsesp_info-2.json

And all settings (passwords/usernames not included): emsesp_settings-3.json

proddy commented 5 months ago

No problem re-opening the issue. Sorry to hear you're still having the data loss issue. Your EMS-ESP settings look fine. The breaks seem frequent but random. As we did last time, I would first check whether the data is in HA (check the history for the last update of the sensor) and also use MQTT Explorer. If the data is there but not in InfluxDB, then something is filtering it.

Marcwa19197 commented 5 months ago
[screenshot]

Data in HA seems stable (outdoor temperature in this example).

Marcwa19197 commented 5 months ago
[screenshot]

In the InfluxDB admin web interface, the data also looks good...

On the Grafana side, my query looks like this:

  from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) =>
      r.entity_id == "boiler_outside_temperature" and
      r._field == "value"
    )
    |> drop(columns: ["host", "topic"])
    |> map(fn: (r) => ({ r with _field: "Außentemperatur NW" }))
    |> aggregateWindow(every: 5m, fn: mean)

@proddy: Do you also use Grafana?

proddy commented 5 months ago

I stopped using Grafana, but I just re-enabled it (I'm running it in LXCs on Proxmox). I can see data coming into InfluxDB from HA and will monitor for gaps. I still need to set up Grafana to use Flux so I can simulate what you're seeing.

Edit: I have it running like yours, with the same settings. Now I'll monitor it for 24 hours and see what happens.

Marcwa19197 commented 5 months ago

I'm not really sure where the issue is. I guess it's in Grafana, but I don't know. The dashboard fed by Telegraf works well, using EMS-ESP 3.6.4.

The dashboard fed from MQTT > Home Assistant > InfluxDB integration does not work, using EMS-ESP 3.6.4. The dashboard fed from MQTT > Telegraf > InfluxDB using EMS-ESP 3.6.5 does not record Thermostat_data at all. The dashboard fed from MQTT > Telegraf > InfluxDB using EMS-ESP 3.6.4 works well...

I'm clueless. Maybe I have to dig deeper, but maybe you will see the same issue...

One idea: should I try enabling the "retain" flag? Maybe data is being discarded? :-/

proddy commented 5 months ago

I checked my setup and see no data loss between MQTT, HA and InfluxDB, and Grafana is showing the data correctly. You said the data looked good in InfluxDB, and now you're saying:

> The dashboard fed from MQTT > Home Assistant > InfluxDB integration does not work, using EMS-ESP 3.6.4.

So I'm confused about what is and isn't working on your side.

Marcwa19197 commented 5 months ago

Yes, sorry, I'm also confused :D

So the following setup works:
- Telegraf > Influx > Grafana using 3.6.4

Not working:
- Telegraf > Influx > Grafana using 3.6.5 (no thermostat data at all)
- HA > Influx > Grafana using 3.6.4 (random data gaps)

Not completely tested:
- HA > Influx > Grafana using 3.6.5

proddy commented 5 months ago

What I would try: 1) In HA, pick one sensor, say sensor.boiler_boiltemp, go to History and confirm there are no data breaks:

[screenshot]

2) Then go to InfluxDB and check the same thing using the Data Explorer and the HA bucket you created:

[screenshot]

See if 1 & 2 above match.
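A minimal sketch of the comparison in steps 1 and 2, assuming you have exported the timestamps of the same sensor from HA's history and from the InfluxDB Data Explorer (the export itself is not shown, and the sample data below is made up). It flags stretches where InfluxDB has no points for longer than a threshold while HA does have points in between, i.e. data lost between HA and InfluxDB. The function names and the 5-minute threshold are illustrative choices. Keep in mind that HA's history records state changes, so a gap in both series may simply mean the value did not change.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Gap = Tuple[datetime, datetime]


def find_gaps(timestamps: List[datetime], max_gap: timedelta) -> List[Gap]:
    """Return consecutive (earlier, later) timestamp pairs further apart than max_gap."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > max_gap]


def gaps_missing_from(candidate: List[datetime], reference: List[datetime],
                      max_gap: timedelta) -> List[Gap]:
    """Gaps in `candidate` (e.g. InfluxDB) during which `reference` (e.g. HA) has data."""
    return [(a, b) for a, b in find_gaps(candidate, max_gap)
            if any(a < t < b for t in reference)]


base = datetime(2024, 4, 6, 12, 0)
ha = [base + timedelta(minutes=m) for m in range(0, 30, 2)]
influx = [base + timedelta(minutes=m) for m in (0, 2, 4, 6, 20, 22, 24, 26, 28)]
for a, b in gaps_missing_from(influx, ha, timedelta(minutes=5)):
    print(f"InfluxDB gap {a:%H:%M}-{b:%H:%M}, but HA has data in between")
```

If this reports nothing while Grafana still shows breaks, the loss is more likely in the query or panel than in the write path.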

I'm using InfluxDB v2 and my HA configuration.yaml contains:

influxdb:
  host: influxdb
  port: 8086
  ssl: false
  api_version: 2
  token: Bijwpm5GGGr1327pXTL_aOadTExRlMbzJVdBg3xsMRvKeWZROMpMmcuwhdOGbVji-olYD2GbGkiBDZNc-YWOgQ==
  organization: b65291260f23aa21
  bucket: home_assistant
  tags:
    source: HA
  tags_attributes:
    - friendly_name
  default_measurement: units
  exclude:
    entities:
      - zone.home
    domains:
      - persistent_notification
      - person
  include:
    domains:
      - sensor
      - binary_sensor
    entities:
      - sensor.system_freemem
      - sensor.thermostat_hc1_currtemp
Marcwa19197 commented 5 months ago
[screenshot] [screenshot]

My HA Config:

influxdb:
  api_version: 2
  ssl: false
  host: 192.168.10.111
  port: 8086
  token: NpMt75YWGDkqGZMybhbUvJF6dqyEpbtDY8sTnf0FeAuH4FxdfHKGR3txL8_ZSJcMhX_9fD9ItuE-u7kiUxOz_A==
  organization: default
  bucket: homeassistant

In Grafana, the entity looks like this:

[screenshot]

The query is:

  from(bucket: "homeassistant")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) =>
      r.entity_id == "boiler_outside_temperature" and
      r._field == "value"
    )
    |> drop(columns: ["host", "topic"])
    |> map(fn: (r) => ({ r with _field: "Außentemperatur NW" }))
    |> aggregateWindow(every: 5m, fn: mean)
proddy commented 5 months ago

So it's a Grafana issue

[screenshot]
Marcwa19197 commented 5 months ago

That's it! Thanks!!!

One strange thing, though: my "old" Grafana dashboard, which uses the other bucket filled by Telegraf, does not have this option enabled...
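One possible explanation, offered as an assumption rather than a confirmed diagnosis: as far as I know, Home Assistant's InfluxDB integration writes a point only when an entity's state changes, so a slowly changing value such as the outdoor temperature can go more than 5 minutes without a write. Flux's aggregateWindow() defaults to createEmpty: true, so those windows come back as null, and Grafana draws nulls as breaks unless it is told to connect them. The Telegraf path stores every MQTT publish (every 10-60 seconds, per this thread), so its 5-minute windows are rarely empty, which would fit the old dashboard not needing the option. The sketch below is a pure-Python imitation of aggregateWindow(every: 5m, fn: mean), not Flux itself; it assumes the range start is aligned to a window boundary and labels each window with its stop time, like Flux's default timeSrc: "_stop".

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional, Tuple

Point = Tuple[datetime, float]


def aggregate_window(points: List[Point], start: datetime, stop: datetime,
                     every: timedelta) -> List[Tuple[datetime, Optional[float]]]:
    """Imitate Flux aggregateWindow(every: ..., fn: mean) with createEmpty: true.

    Windows are [t, t + every), truncated at `stop`. Each row is labelled with
    the window stop time; a window without points yields None (null in Flux).
    """
    rows = []
    t = start
    while t < stop:
        w_stop = min(t + every, stop)
        values = [v for ts, v in points if t <= ts < w_stop]
        rows.append((w_stop, mean(values) if values else None))
        t = w_stop
    return rows


start = datetime(2024, 4, 6, 12, 0)
# Sparse, state-change-only writes, as the HA integration might produce (made-up data):
points = [
    (start + timedelta(minutes=1), 5.0),
    (start + timedelta(minutes=2), 6.0),
    (start + timedelta(minutes=21), 4.8),
]
for stop_time, value in aggregate_window(points, start, start + timedelta(minutes=30),
                                         timedelta(minutes=5)):
    print(stop_time.strftime("%H:%M"), value)
```

If this is the cause, adding createEmpty: false to the aggregateWindow() call should drop the empty windows at the query level instead of relying on the Grafana panel option.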

proddy commented 5 months ago

I'll close it; please reopen if there are more issues.