briancmpbll / home_assistant_custom_envoy


Bogus data is collected when gateways go offline/are rebooted, or production CT is disabled #153

Open madbrain76 opened 10 months ago

madbrain76 commented 10 months ago

I had my installer do some work on my system on Tuesday. During that time, I flipped most of the breakers, including those for all 3 gateways, so all of them were reset. Wifi also went out. My HAOS is on UPS and kept running. The HA Energy report for the day showed the following:

image

The data for gateway ending 7059 (IQ Combiner 4) is bogus, as is the data for gateway ending 1048 (IQ Envoy). The data for the gateway ending 6209 (Envoy-S Standard) is correct.

I'm wondering two things: 1) Is there a way to manually correct the bogus data for that day? 2) Could the integration be made more resilient, and not report this bogus data when gateways are rebooted and/or go offline temporarily?

catsmanac commented 10 months ago

Hi @madbrain76, looks like envoy is still able to share surprises.

  1. Not through the HA screens as far as I know, but HA community may know.
    • Today's energy production is a number that builds up during the day. In this case, when the Envoy came back, did it jump to this number and stay there until the reset for the next day, or was it a spike followed by a normal daily number again?
    • Did lifetime production also show this behavior?
  2. That probably depends on clear definition of the situation to deal with as well as limitations posed to integrations.
    • These have no access to historical data so this can become complex pretty quickly.
    • Daily and 7-day numbers are a bit prone to special cases in the Envoy. Looking at the Energy picture, I wonder if you might use the lifetime energy values to simply avoid this situation, depending on the answer under 1.

madbrain76 commented 10 months ago

@catsmanac Looks like my entire HA DB history was wiped again last night at 4:11 am, sigh. I'm trying to restore a nightly backup. I haven't had much luck with the HA backup/restore function in the past when it comes to restoring the history database. It usually corrupts itself again shortly after the restore, sigh. If my backup actually restores, I'll answer your questions about whether the Lifetime value spiked or not on the day the installer was around. I'll consider using the Lifetime data. The reason I'm using "Today's" is that until recently, I had the Envoy LCD (-R), and that was the only option that worked. On the R, the lifetime running total was in MWh, without a decimal point, which was not useful for measuring daily production. My R was switched to an S Standard, though, so perhaps now that can work for it too. On the two IQ gateways, I believe the Lifetime can work. I was just using Today on all 3 for consistency.

madbrain76 commented 10 months ago

Edit: Lifetime didn't spike on the day in question, so I should be able to use it and hopefully get around this problem.

Edit 2: It does. Here is the data for the same day as above, the Tuesday of the spike: image

I didn't change the Envoy ending 7212 since that's the old Envoy-R, and I still have the old data for it in the DB, and Lifetime is no good for the R.

madbrain76 commented 9 months ago

@catsmanac, I debugged my Envoy problem (https://github.com/briancmpbll/home_assistant_custom_envoy/issues/122) down to a production CT issue. I disabled the production CT for it in Enlighten on Monday in the middle of the day, and then restarted the Envoy integration. It started metering correctly after that, by getting watt and kWh (energy) data from the micro-inverters.

However, there is a big bump in the middle of the day for the "Lifetime energy production" counter on Monday.

image

As you can see, there is a peak "production" of 228.93 kWh for the Envoy ending 1048 between 15:00 and 16:00. This is of course bogus.

The number comes from the Lifetime Energy Production sensor.

image

When the production CT was turned off and the integration reloaded, the sensor became non-monotonic, i.e. the lifetime value decreased.

I realize this is the way the value in the Envoy wraps around when the production CT is disabled, but it results in the bad behavior I'm seeing. I'm not sure how to fix my statistics given the single sensor in Home Assistant. Perhaps the integration should expose the production sensor under a different name when it comes from the CT than when it comes from the micro-inverters? This would make more sense than having two sets of unrelated values in one sensor, as I have now. I.e. either "Envoy XXX Lifetime Energy Production CT" or "Envoy XXX Lifetime Energy Production micro-inverters". In fact, if the production CT is enabled, it would make sense for the integration to expose both meters as sensors. If the production CT is disabled, of course only the micro-inverters meter can be exposed. As things stand, anyone who switches the production CT off is going to run into this problem with the HA Envoy integration.

catsmanac commented 9 months ago

Hi @madbrain76, it's pretty clear that the integration has no real logic for handling the changing nature of an Envoy. Looking back at its history, it was built step by step, handling ever newer types of Envoy and firmware, but with no real provisions for the case where the mode of the Envoy changes and the Envoy firmware has a glitch in this new mode. I know that the metered one with CT disabled resets lifetime every ~1.2 MWh, but I haven't seen reports of how that shows up in the Energy dashboard, whether it creates a similar peak or not.

Technically it's relatively easy to add a configuration option to enable these 2 'raw lifetime measurements', but these can't replace the current one, which is sourced from either of them based on the Envoy configuration, as that would break long-term history for everybody.

But with these 2 new sensors available, how do we handle the above change? You can't switch from one to the other in the Energy dashboard since it runs from statistics, so it would switch to the history of the raw tags that wasn't correct before, and the other way around. You would need some HA calculation that you can switch between the 2, and use that as input to the Energy dashboard. And as that will be new, you lose prior history. So I'm not sure these 2 new sensors are the simple answer.

That brings us to the statistics used by the Energy dashboard. Maybe see statistics correction, in case that solves the effect.

Open to any suggestion, but don't have a good feel for best path forward yet.

EDIT> I found this on multisensors, maybe it can help with this case and both raw sensors?

madbrain76 commented 9 months ago

@catsmanac Agree it's not a simple answer, and just adding these 2 sensors by itself won't be sufficient to solve the problem, but I think adding them is still required. I don't think it should be a config option though - I would personally just add them. I agree we shouldn't replace the existing (unqualified "production") sensor value, but yet that's exactly what happened when I disabled the production CT on my Envoy. I think we should try to prevent this from happening if possible - having one sensor switch over from CT to MI or vice-versa.

I think some editing of the statistics database will be required to handle this, but I'm struggling to think of what the steps might be. This definitely needs some brainstorming.

I envision something where I add both the CT and MI sensors to the Energy dashboard. The MI sensor would always be exposed. The CT sensor would only be exposed when it is enabled. If there is a switchover (CT enabled or disabled) the integration can detect the time at which this happens. And it can then truncate the stats for both sensors before/after that switchover point, so there is no data overlap between the two for a given time. I know this is pretty complicated, but I believe it could work.

madbrain76 commented 9 months ago

@catsmanac, I tried the statistics correction, but that won't work in this case because it's not a single value that needs to be corrected, but rather a whole stream of values. It seems the HA Energy dashboard expects a monotonically increasing counter (e.g. for Lifetime Energy sensors). It also allows for it to reset to 0 (for 1-day counters such as Today's energy). It's unfortunate that the MI counter in the Envoy resets at 1.2 MWh. An alternate way to handle this is to:

1) Always record the MI sensor data. Never edit its raw data.
2) Record the CT sensor data if enabled. Never edit its raw data.
3) Poll the Envoy periodically from the integration to detect the CT state change. Currently it does not, and requires manual reloading.
4) Create a third "dynamic" sensor that is always monotonically increasing and never resets to 0, built from the data of the above 2 sensors. It could handle the CT enabled/disabled state switch. As long as it can detect the time t of the switchover, it can use the data from the one sensor for the "left" side of that time, and the data from the other sensor for the "right" side.

I admit this is pretty complicated, especially if one switches the CT state multiple times. At least in theory, it should work, though.
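
To make step 4 a bit more concrete, here is a minimal Python sketch of the "left side / right side" idea for a single switchover. It is not integration code. The function name `stitch`, the `(timestamp, Wh)` sample format and the example numbers are all made up for illustration, and it assumes the switchover time t is already known.

```python
from typing import List, Tuple

# Hypothetical sample format: (timestamp in seconds, lifetime energy in Wh)
Sample = Tuple[float, float]


def stitch(before: List[Sample], after: List[Sample], switch_time: float) -> List[Sample]:
    """Build one continuous series from two raw lifetime series.

    Samples from `before` are used up to (not including) switch_time,
    samples from `after` are used from switch_time on. The `after` values
    are shifted so the combined series continues where `before` stopped.
    """
    left = [s for s in before if s[0] < switch_time]
    right = [s for s in after if s[0] >= switch_time]
    if not left or not right:
        # Nothing to join, return whatever side has data.
        return left + right
    # Offset that makes the first "right" value equal the last "left" value.
    offset = left[-1][1] - right[0][1]
    return left + [(t, v + offset) for t, v in right]


# Made-up numbers: CT counter before the switch, MI counter after it.
ct = [(0, 1000.0), (1, 1010.0), (2, 1020.0)]
mi = [(0, 200.0), (1, 209.0), (2, 218.0), (3, 230.0)]
combined = stitch(ct, mi, switch_time=2)
```

Note that in this simple form the energy produced between the last "left" sample and the first "right" sample is dropped, and resets of either raw counter (like the one at 1.2 MWh) are not handled. Multiple switchovers would need the same join repeated per segment.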

catsmanac commented 9 months ago

@madbrain76 there are some limitations to an integration we have to deal with.

With v0.0.18 the state is checked once per hour. That can be changed (custom setting or always) to every scan, so a change would be detected right away. A CT mode entity could be created for use in HA, as well as these raw entities (custom setting or always, hidden by default or not). But these would not solve the issue of value differences between the 2 sources.

To solve that, you either need to process history and feed the dynamic sensor from value changes of either source, or you need to calculate the difference between the current and previous value of each source and add the difference of the active one to the running value of the dynamic sensor. The key is that you need to detect value changes in the sources, so that the switch to the other mode does not cause a spike up or down.

Using history requires almost a second custom integration, reading history through the HA API and calculating the resulting values, as I haven't found a way to access HA history from inside an HA automation.

With calculating differences there may be an issue over HA restarts, as the status becomes unavailable and I'm not sure what the value would be. If it is the last known value, then it's probably ok.

Such calculations can be done in HA itself by building a custom sensor, using the CT Mode entity and the difference between current and previous value.

Long story short, I think you need to take the value change of the source lifetime entity and add that to the new dynamic entity. That way switching may work flawlessly, but an HA outage may cause some issues.
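
A rough Python sketch of that difference-based approach, just to illustrate the idea. The class name `DynamicLifetime`, the source labels "ct" and "mi" and the numbers are hypothetical, and this is plain Python rather than an HA template or integration code. It assumes each poll delivers the current lifetime value of every available source plus the name of the active one.

```python
from typing import Dict, Optional


class DynamicLifetime:
    """Running total fed by value changes of whichever source is active."""

    def __init__(self, start: float = 0.0) -> None:
        self.total = start
        # Last seen raw value per source, used as baseline for the next delta.
        self._last: Dict[str, float] = {}

    def update(self, readings: Dict[str, Optional[float]], active: str) -> float:
        for source, value in readings.items():
            if value is None:
                # Source unavailable: keep the old baseline, add nothing.
                continue
            previous = self._last.get(source)
            self._last[source] = value
            if source != active or previous is None:
                # Inactive source, or first reading: only track the baseline.
                continue
            delta = value - previous
            if delta > 0:
                # Decreases (counter reset or wrap) are ignored, not added.
                self.total += delta
        return self.total


# Made-up readings in Wh: CT active first, then a switch to the MI counter.
d = DynamicLifetime()
d.update({"ct": 1000.0, "mi": 200.0}, active="ct")   # baselines only
d.update({"ct": 1010.0, "mi": 209.0}, active="ct")   # +10 from CT
d.update({"ct": 5.0, "mi": 218.0}, active="mi")      # +9 from MI, CT drop ignored
d.update({"ct": None, "mi": 230.0}, active="mi")     # +12 from MI
d.update({"mi": 3.0}, active="mi")                   # MI reset, nothing added
total = d.update({"mi": 8.0}, active="mi")           # +5 from MI
```

Because both sources are tracked on every update, the switch itself adds no jump. The baselines live only in memory here, so after a restart the first reading would again only set a baseline, which is the outage concern mentioned above. Energy produced during the interval in which a counter resets is also lost in this sketch.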

catsmanac commented 9 months ago

Hi @madbrain76, I built a little Envoy simulator that can run in CT connected or not connected mode to test this behavior. To finalize it, I could use some data to validate how the Envoy sim is behaving. What I'm after is 2 or 3 extracts from the diagnostics file, each taken several hours apart, so like one at 10 am, one at 4 pm and one the next morning. I'm looking for the 4 endpoint lines in the diagnostics file shown in the abbreviated example below, and I need the 4 full lines from each sample.

"Endpoint-meters": "[{\"eid\":\"123456001\",\"state\":\"enabled\",\"measurementType\":\"production\",
"Endpoint-meters-reports": "[{\"createdAt\":1695145300,\"reportType\":\"production\",\"cumulative\":{
"Endpoint-production_json": "{\"production\":[{\"type\":\"inverters\",\"activeCount\":5,\"readingTime
"Endpoint-production_v1": null,

If you have diagnostic files around from before the CT switch, then the same extract from those is interesting as well. I'm trying to observe how the numbers of the non-active mode behave.
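
If picking the lines out by hand is tedious, something like the Python sketch below could collect them. It assumes the downloaded diagnostics file is JSON and that the four keys from the example appear somewhere inside it with JSON-encoded string values (or null). The nesting is not assumed, so the sketch searches recursively. The function name `find_endpoints` and the file name in the comment are made up.

```python
import json
from typing import Any, Dict, Optional

# The four keys from the abbreviated example above.
WANTED = (
    "Endpoint-meters",
    "Endpoint-meters-reports",
    "Endpoint-production_json",
    "Endpoint-production_v1",
)


def find_endpoints(node: Any, found: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Walk a parsed JSON document and collect the wanted endpoint entries."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if key in WANTED:
                if isinstance(value, str):
                    try:
                        # Values look like JSON stored inside a string.
                        value = json.loads(value)
                    except ValueError:
                        pass  # keep the raw string if it does not parse
                found[key] = value
            else:
                find_endpoints(value, found)
    elif isinstance(node, list):
        for item in node:
            find_endpoints(item, found)
    return found


# Usage with a hypothetical file name:
#   with open("envoy_diagnostics.json") as f:
#       print(json.dumps(find_endpoints(json.load(f)), indent=2))
```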

madbrain76 commented 9 months ago

Which diagnostic file are we talking about, and where is it located?

catsmanac commented 9 months ago

Go to the Envoy device page (Settings, Devices & Services, Devices) and select the Envoy device. There you will find a download diagnostics button.