luckylinux commented 6 months ago

The System has been running quite well since approximatively 3 weeks now.

However today, out of the blue, the Inverter Tripped due to BMS Communication Failure (BMS-Err_Stop enabled on Deye Inverter).

Looking at Home Assistant Dashboard the main thing that caught my eye is this:

14:24:44 is the time where the Issue occurred.

Looking at the logbook there are plenty of Entries like this every day (yesterday was 22 times, so basically 11 couples of brief moments on/off for a few seconds).

Not sure why it happened.

Also weird is that the ESP32 (both of them, both the one of the "dumb" battery and the one connected to the Deye Inverter) seem to have rebooted, if we can trust the Diagnostic Data.

ESP32 not Connected to any Inverter:

ESP32 Connected to the Deye Inverter:

This should not happen, since both ESP32s and the Rock 5B SBC that runs Home Assistant, MQTT etc, are connected to a 230VAC UPS. The Rock 5B SBC didn't reboot:

18:10:11 up 20 days,  1:12,  1 user,  load average: 0.80, 1.01, 1.01

EDIT 1: Adding Uptime Sensor evolution when the Issue Occurred (for the Battery that is connected to the Deye Inverter only)

Exact Value seems to be 1'710'511 [seconds], which translates into 475.141944444 hours or 19.7975810185 days.

Not sure if it's a specific Event which is time-based (cannot remember if I read issues about ESP32 resetting themselves every 3 weeks / 21 days or so), or it was just the right combo of glitch in timing with respect to when the Deye "checks" that the CANbus Communication is actually working.

luckylinux commented 6 months ago

I'm also struggling with the ESP32 just rebooting/crashing/freezing thus causing an Inverter Trip, whenever I reboot the WiFi.

At first I thought disabling the fallback ap and the captive_portal could have fixed this (according to https://github.com/esphome/issues/issues/1679 reboot_timeout is ignored in case where the fallback ap is enabled).

Today I rebooted the WiFi again after applying those fixes and ... ESP32 Crashed/Rebooted/Freezed again and of course the Inverter Tripped.

@Sleeper85 , @MrPabloUK: Is there some timing Issue connected to WiFi loss, whereby the code automatically triggers a Restart ?

Configuration File (finally SNTP Time Sync and Home Assistant Time Sync work correctly by the way): https://github.com/luckylinux/jk-bms-build-helpers/blob/main/esphome-jk-bms-can/esp32-ble-1.17.5.yaml

luckylinux commented 6 months ago

This Commit seems to Fix the issue related to WiFi AP (and/or Home Assistant Server and/or MQTT Server and/or ... whatever):

https://github.com/luckylinux/jk-bms-build-helpers/commit/b60d922b7966af904dc10909634094d517847a9d

Other Remarks

After testing stopping each Service Individually, I took it one step further and stopped them ALL one by one. Still no Trip / Watchdog event.

Issued 5 Reboots (from a Normal Operating State) now and nothing happened again.

It's now Working [for now] as you (and I) would expect ...

NOT sure in the end what really helped:

Setting MQTT keepalive: 5s instead of default (15s)
Setting Watchdog Timeout 30s instead of default (1s ???)
Setting WiFi: power_save_mode: light and fast_connect: true

I also had a configuration issue in /etc/dhcpcd.conf with a Typo on an Interface Name (NOT related to either WLAN or LAN). Furthermore there was a configuration issue in my "Headless" Management Script /usr/local/sbin/check-network.sh which tried to bring up/down the wrong (non-existing) Interface Name to try to ping the LAN Gateway

#!/bin/bash

gateway="192.168.1.1"
interface="eth0"

ping -c4 $gateway > /dev/null

if [ $? != 0 ]
then
  echo "No network connection, restarting $interface"
  /sbin/ifdown "$interface"
  sleep 5
  /sbin/ifup --force "$interface"
fi

Sleeper85 commented 6 months ago

The CAN bus status will be marked down after 20 loops (a different CAN ID is sent per loop) without response from the inverter.

This number of IDs is different depending on the protocol chosen, with Deye I advise you to take "PYLON 1.2".

Deye only responds after receiving ID 0x356. For me (at home) the links are turned OFF 1s every 2 hours and I don't know why but that doesn't pose a problem. If the link was turned OFF due to non-response from the inverter it will be OFF for 120s so this problem does not come from my code.

After the link is marked down, the code stops sending CAN IDs for 120s before testing again for the presence of an inverter.

interval:
  - interval: 120s
    then:
      - lambda: id(can_ack_counter) = 0; // Reset ACK counter for test inverter ACK

luckylinux commented 6 months ago

In my case I am using PYLON+ Protocol (IIRC that was recommended a while ago by either you or MrPablo). Not sure if there are major Differences though ...

Sleeper85 commented 6 months ago

In my case I am using PYLON+ Protocol (IIRC that was recommended a while ago by either you or MrPablo). Not sure if there are major Differences though ...

There is no point in using "PYLON +" with Deye because additional IDs (0x70, 0x371 and 0x379) are not supported by Deye. Just use the name "PYLON" with the protocol "PYLON 1.2".

luckylinux commented 6 months ago

IF (when :smile:) I'll have another maintenance stop I'll switch the Protocol to Pylon 1.2.

While you are very likely right, I'm a bit scared of the system, given how much susceptible it was with this elusive bug I told you about (watchdog triggering if WiFi/MQTT/HA/... goes down) ...

Not necessarily an issue with the Code, there is probably some interaction going on between the different Components (and I probably have more Sensors enabled for tuning/troubleshooting than you do, so more RAM used, etc).

The "fix" (workaround) as I said seems to increase the Watchdog Timeout to 30s (increasing to 10s improved the situation, but did NOT solve it), possibly combined with some of the other stuff I did (although since it was a Watchdog triggering a reboot, this is probably the solution: increasing the Watchdog Timeout). Why the ESP32 would "Freeze" / Hang and then trigger the Watchdog in the first Place, as I told you, I was not really able to diagnose.

Debug logs via USB showed everything normal then ... Watchdog Triggered ... Rebooting.

I'm more of the Attitude right now ... "If it works, don't touch it" :laughing: .

PS: maybe add a small note somewhere (or at least keep it in the back of your mind): if you have reboot_timeout set to 0s or say 24h for both api, mqtt and wifi, then connect the ESP32 via usb and set logging to DEBUG level. Most likely this is the ESP32 freezing/hanging and the Watchdog triggering a Reboot, thus tripping the Inverter due to lack of BMS Communication.

https://github.com/luckylinux/jk-bms-build-helpers/commit/b60d922b7966af904dc10909634094d517847a9d

Again, not saying there is an issue with the Code, this is probably an Edge Case for some Reason ... but it was driving me crazy !

Workaround:

    sdkconfig_options:
      CONFIG_ESP_TASK_WDT: y
      CONFIG_ESP_TASK_WDT_TIMEOUT_S: "30"
      CONFIG_BT_BLE_42_FEATURES_SUPPORTED: y

I did NOT test this but maybe it could also help, at least to some extent (MQTT Options):

Sleeper85 commented 4 months ago

I also notice at home that the CANBUS Status goes from ON to OFF then ON in a few seconds every 2 hours, 4 hours, or more...

I have just reprogrammed the way to manage this status, the counter has been replaced by a 3s timer (by default).

I'll see how it goes in the next few hours.

The two statuses to analyze are these:

cinusik commented 4 months ago

I also notice at home that the CANBUS Status goes from ON to OFF then ON in a few seconds every 2 hours, 4 hours, or more...

I have just reprogrammed the way to manage this status, the counter has been replaced by a 30s timer (by default).

I'll see how it goes in the next few hours.

The two statuses to analyze are these:

@Sleeper85 I do not observe such behavior on 1.4.1 - it is running since 4 days without any problems. Do not see CANBUS going OFF and ON in HA history.

Sleeper85 commented 4 months ago

@cinusik

I do not observe such behavior on 1.4.1 - it is running since 4 days without any problems. Do not see CANBUS going OFF and ON in HA history.

So it may be linked to the brand of the inverter, we have a Deye @luckylinux and I myself.

Sleeper85 commented 4 months ago

This may have something to do with the Deye Time Of Use. To investigate.

canbus_status_off_on deye_tou

luckylinux commented 4 months ago

@Sleeper85 Weird ... I probably have similar Settings but since I am not Grid Connected those are not enabled. Well neither are they enabled in your Case from what I see ...

But I guess you are right, there is definitively a correlation there, although I'd still say that only covers HALF the CANBUS Status Logs.

In my Case: 01h00, 05h00, 09h00, 13h00, 17h00, 21h00 ARE falling on the Time of Use Settings.

HOWEVER: 03h00, 07h00, 11h00, 23h00 are NOT falling on those times I would say ...

To me it looks more like a Periodic 2h (7200s) Thing. Maybe DHCP Related ? Inverter running some internal Calculations (e.g. calculating Energy Integrals, building Statistics, etc) ?

Sleeper85 commented 4 months ago

So several people with other inverters than Deye tell me they don't have this problem. This therefore seems linked to Deye and to an internal Deye timer which at a given moment does not send the canbus ACK within the allotted time of +/- 3s with the default parameters (30 loop of 100ms).

luckylinux commented 4 months ago

Can you share the Version Details of your Deye ? I never upgraded my Firmware so I am still on some Version from 2022 IIRC. I will probably upgrade soon, since there seems to be some benefits related to e.g. less Overcurrent Trips when doing a "blackstart" with Parallel Inverters, Fan ramping up/down, etc.

(That's what I was told at least, somewhere around this Thread: https://diysolarforum.com/threads/bonding-multiple-deye-12kw-inverters-in-off-grid-mode.70969/page-4#post-1071243)

Sleeper85 commented 4 months ago

Last upgrade a year ago, I contacted Deye support this morning to find out the current version and the changelog.

luckylinux commented 4 months ago

Uhm alright. According to the User Group where I bought it (somebody put all Firmware files on Dropbox :smile:) these seems to be the latest for the 12kW Model:

MCU1-3PLV-Ver1001-C037-CK.bin
MCU2_SG12KLP3-V200x-1147-0614.bin

No Changelog info though.

Sleeper85 commented 4 months ago

Uhm alright. According to the User Group where I bought it (somebody put all Firmware files on Dropbox 😄) these seems to be the latest for the 12kW Model:
MCU1-3PLV-Ver1001-C037-CK.bin
MCU2_SG12KLP3-V200x-1147-0614.bin
No Changelog info though.

Do you also find the .bin for my model?

Deye never wanted to send them to me directly.

luckylinux commented 4 months ago

It's only for the 12kW at the link I know ... Let me check if there is another one.

luckylinux commented 4 months ago

Nope, even though the 6kW is currently being sold, the Group Guide is only (right now at least) pointing to a Firmware link for the 12kW. Or at least I think so, I don't know if the firmware is shared between Models (I do NOT think so).

I can try to ask though.

EDIT 1: I just asked on the Facebook Group. I will update you if anything comes back :+1:

EDIT 2: I actually found a Changelog in one of the Posts for the latest Version for 12kW:

Fixed the problem of slow communication speed with ARC-DRM board
Changed the battery internal resistance in voltage mode to 12
Added fan stall detection for SG05, 50% rated power after stall
Added the function of using the meter under selling first
Added reactive power display, registers 710-715
Added CT self-test function (LCD function is not yet ready)
Fixed the problem of no display in ARC-DRM version
Fixed the problem of communication loss of old DRM board
Fixed the problem of easy disconnection of parallel power grid
Fixed the problem of limited charging power when 9 generators are connected in parallel
Added the maximum output power limit register of inverter, register 231
Adjust the power grid power to prevent over-discharge when soc is less than Lowbatt
Added remote mode
Fixed the problem of 20kwPV charging single channel exceeding 200A
Fixed the bug that the off-grid grounding relay does not work when it is disabled in off-grid situation
PV will reduce the rating when charging the battery beyond the rated power
Imported new power grid library

luckylinux commented 4 months ago

Currently used Version, for Reference when I will upgrade later (maybe this Week):

luckylinux commented 4 months ago

Got a Reply: "It is the same firmware file for all models under the SG04LP3 model name. I can just get it confirmed again by Deye."

I asked him to confirm with Deye ... Better be safe than sorry.

Sleeper85 commented 4 months ago

ESP32 flashing with new canbus_link_timer set to 3s (similar to the duration of the old 30 loop counter of 100ms).

Yesterday, from 12:49 to 00:00 no problems and then it starts again at 00:00... I'm going to reflash with a 5s timer to see how it goes.

Sleeper85 commented 4 months ago

With my PVbrain2 development board connected to a fake inverter sending a response exactly every 1s I do not notice this problem.

luckylinux commented 4 months ago

@Sleeper85: I still got no confirmation from the Facebook Group.

I know Comparisons are difficult ... My data is still from the ~ 4 Month old - ish Branch.

Nevertheless I'd say your "Fake Inverter" just works better than the Deye :rofl:.

My Logs seems to show a pretty periodic Issue every 2h / almost 7200s. Although it's true that in Home Assistant I only have "Time of Use - Start" (anyway I do not use it since I am offgrid), so maybe the other "Events" (CANBus ON -> OFF -> ON) correspond to the "Time of Use - End". I'd have to check the Display for that ...

It could be as simple as some Function/Routine/Interrupt in the Deye taking slighltly longer than it should (and a bad management of Priorities on their Behalf - BMS Disconnect should be pretty high on the list compared to an Energy Management Strategy, although of course lower than Protections such as Inverter Overcurrent/Short-Circuit/etc).

I know you can "tune" the Code to "mask" these Warnings (which is of course an Annoyance if 100 People [like me] complain and bring it up :laughing:), yet this is a Problem on the Deye side and they should fix it if they didn't already (and I believe I maybe once had the Inverter restart because of that, although most of the issues were this ultra-difficult-to-replicate Watchdog Wifi/USB BUG that I told you about).

So I don't think you should filter out "too much", because I guess we would still like to have some Troubleshoot Capability. Or would you say that the Deye Error Log would be enough for that (F58: BMS communication fault) ?

luckylinux commented 4 months ago

@Sleeper85: I'm pretty sure some things happen at midnight. For once, all statistics about Solar Production (Daily PV Production, Daily Consumption, Daily Generator Production, Daily Export/Import, etc) are reset. So it could explain a "glitch" at midnight. Not the triggering of another Periodic (e.g. 2h-Period) Issue though. I'd expect a 24h Period for that Glitch (and again, bad management of sync/async Task and their Prioritities on the Deye Firmware Side).

Some other Ideas ... Just brainstorming

Just for my understanding (I'm not so much into it like you are): is it the Deye that sends you a Heartbeat and you "Reply" ("ACK") ? Or is it just you that sends a signal to the Deye in "Open Loop" ?

I'm wondering if there is also maybe some sync/async/interrupt "Issue" in your Code, but I believe that whenever there is some CANbus Data (or Heartbeat ecc), it's like a GPIO triggering an ISR (Interrupt Service Routine), so it's not like you are going to "miss" some of the Deye "sollicitations" because you are not having fully Synchronous Interrupts ?

What I mean is essentially: could it be that one Time your Function/Interrupt takes say 98 ms, sometimes 102 ms, sometimes 121 ms etc, and sometimes that "masks" some other incoming Data / Signal ? Basically kind of the equivalent in Power of a "Low Frequency Oscillation" introduced by a "High Frequency Signal with some small Time/Amplitude Variation or Jitter" ?

Did you try to measure how long your (high priority) ISR actually takes with a Scope (trigger GPIO pin HIGH-LOW-HIGH and measure Time Difference between the Rising and Falling Edges) and is it 100% Repeatable ? Or from time to Time (e.g. you calculate a Division, SQRT, whatever), it takes Longer ?

Sleeper85 commented 4 months ago

You are right, it is probably a problem on Deye's side and if increasing the timer to 5s hides this problem it will nevertheless still be there.

The PYLON canbus protocol stipulates that the inverter sends an ACK can_id 0x305 every 1s. My new canbus_link_timer waits 3s before marking the canbus link down.

I can already tell you that Deye sends this ACK 0x305 only in response to the sending of can_id 0x356 and that it already does not respect the protocol.

luckylinux commented 4 months ago

@Sleeper85: To be fair that Chinese-English Picture says "Inverter reply every second". Is that a Translation Error ?

Because "Reply" would mean that the BMS has to "Ping" (i.e. "Initiate the Communication / Process") the Inverter, NOT the other way around.

Sleeper85 commented 4 months ago

The inverter sends the can_id 0x305 every 1s as soon as it receives data from the BMS. But in reality the Deye only sends it in response to the can_id 0x356 sent by the BMS. The PYLON protocol does not specify any specific can_id to use for this heartbeat.

For my part, it's when this 0x305 frame is received that the 3s timer is reset.

I will try to analyze what I receive from Deye and check if it sends anything other than 0x305.

How do you explain that since yesterday 12:49 until today 00:00 the canbus has not been marked down once ? Then since 00:00 it has been marked down every 2 hours.

So we can say that there are a few things that happen at 00:00, I know that the ESP also resets the BMS Total Daily Energy sensors at midnight.

luckylinux commented 4 months ago

How do you explain that since yesterday 12:49 until today 00:00 the canbus has not been marked down once ? Then since 00:00 it has been marked down every 2 hours.

As I said I cannot :disappointed:. With your Modification, I can explain a ONE-OFF glitch at Midnight (00h00), but NOT what happened at 02h00, 04h00, 06h00, 08h00, ... It feels like whatever happens at Midnight (00h00) throws the entire System "Off-Balance".

One Thing that is surely weird with the Deye is that it has quite a high Time (Clock) Drift. IIRC I had like 20 Minutes Drift over 6 Months or so ... Not sure if this is related in any Ways though. The ESP32 is/might be NTP (or SNTP) Time Synchronized, but I do NOT think you give that information to the Deye, do you ? Then it's probably not related ...

EDIT 1: I'd say that whatever you did when introducing this 3s Setting, anyway required an automatic restart of the Deye (since BMS_Err_Stop was enabled). So restarting the Inverter "resets" the Faulty Behavior. But if gets automatically (wrongly) reset / fallback to old Behavior at Midnight for whatever Reason.

EDIT 2: can you try to "Ping" the Inverter with 0x356 more Often ? Instead of 1s try 0.5s maybe that Helps. Not sure if related to the System Clock / Time Drift but probably worth a Try.

EDIT 3: The Errors are all actually very short (0.0s ... 0.5s, it's actually probably around 0s). ONLY the one at Midnight is around 2s (1.5 ... 2.5s). I know it's not very detailed, but is it possible for you to increase the "Resolution" of Entities to Milliseconds Timestamps ? Maybe that Way we can see if it's like 0.49s or 0.01s :+1:

EDIT 4: Forget EDIT 3, UNIX Timestamps are seconds from 1970 ... Can you create an Entity like "CANbus_Down" and as soon as you detect a "down status", you start "counting", then when it gets back working, your "stop" it ? Either a cycle counter or a ms counter, whatever is easy (I'm pretty sure you know how to convert between the 2 anyways)

Sleeper85 commented 4 months ago

I confirm that Deye only sends this 0x305 frame and nothing else. The reception time varies between 1 and 3s.

Knowing that the ESP32 receives BMS data every 3s, this adds a little time in the loop and probably delays the sending of can_id 0x356 and therefore the response 0x305 from the Deye ? I could work on this as well.

If the other inverters respond with can_id 0x305 to each can_id received then the heartbeat is surely shorter.

I have now increased the canbus_link_timer to 5s. Now we probably have to wait until midnight. A little before midnight, I will check again the delay between two 0x305 ACKs.

I don't use the BMS_Err_Stop function currently, in the event of loss of links with the BMS the Deye will use the internal charging parameters that I have correctly defined.

EDIT 1: my Deye's time has remained perfectly synchronized for months (without an NTP server) so no problems for me in that regard.

luckylinux commented 4 months ago

Knowing that the ESP32 receives BMS data every 3s, this adds a little time in the loop and probably delays the sending of can_id 0x356 and therefore the response 0x305 from the Deye ? I could work on this as well.

So you are currently only checking / pinging the CANbus on the Inverter side every 3s or so ? Like ... you only have 1 Loop in the Code ? Maybe Inverter CANbus communication should be done with another ISR/Loop triggered every 0.5s or 1.0s and configured with a higher priority. Battery Communication might be another ISR (I guess you could do that ... we have 2 Cores on the ESP32-S3). Thoughts ?

I don't use the BMS_Err_Stop function currently, in the event of loss of links with the BMS the Deye will use the internal charging parameters that I have correctly defined.

So you are using the Battery in "Voltage Mode" (NOT Lithium) ? I know the Deye Inverter has this Good "Feature" that the BMS Communication works also when the Battery is in Voltage Mode.

Not sure if the Timing Changes (since mine is in Lithium and Warnings every 2h00 then I do NOT think it makes a difference).

Sleeper85 commented 4 months ago

I use Lithium 00 mode.

If the canbus link is up I send a different can_id every 100ms.

  - interval: 100ms
    then:
      - if:
          condition:
            lambda: |-
              if (id(${canbus_id}_send_canbus_data) == true) {
                id(${canbus_id}_msg_counter)++;                            // CANBUS MSG counter ++
                return true;                                               // Condition OK
              }
              else return false;

I have to leave you, I have a team meeting now.

luckylinux commented 4 months ago

No Worries :+1:. Just read my Posts when you have some Time. I'm just trying to give you some Ideas of what could be the Cause ...

Looking at the Code

You are doing:

Reset id(${canbus_id}_ack_counter) every 120s (to prevent being "Stuck" in "OFF" I presume)
Reset id(${canbus_id}_ack_counter) if you receive a Reply within 30 Cycles x 100ms = 3s (to prevent that each small Glitch accumulates over Time)

So those should be Fine.

Looking at your Picture, most "Steps" are 1s apart, some 3s, some 4s, some even 5s !

Those should actually trigger an Error. Why don't they ? You changed to 5s already ?

Question is, who is it at Fault ?

Is it the Deye that is stuck in a low-priority ISR/Function and "Forgets" to reply ?
Is it the ESP32 that is stuck in a low-priority ISR/Function and "Forgets" to send ?

It would be nice if you could print Timestamps with milliseconds, assuming it's possible to do so :+1:.

What I see in the Code (I could be mistaking) and what I see in the Timestamps from Home Assistant you provided seems to suggest the following: a. Periodically, due to some "Glitch", you get id(${canbus_id}_ack_counter) >= ${canbus_max_no_ack} (well =, since it will NOT be increased after that anyways)

b. At some Point after that (impossible to know for sure but definitively < 120s), the reset happens since the Inverter sends an ACK finally:

It would be interesting if you could do the following for Debugging Purposes:

Create a State/Entity "mapped" (assigning the value of) id(${canbus_id}_ack_counter)
Keep increasing id(${canbus_id}_ack_counter) also in the other else if and else Blocks:

Is it possible that the CANBus Interrupts are Lower Priority than say UART/RS485/other stuff going on in your Code ? So basically that you are processing the CANbus signals only after everything else ?

On another Project with CANbus + Vertiv/Emerson R48 3000W Charger, the Issue I ran into was that the Code might fill up/flood the "Buffer" / "Queue" of the CAN Transceiver with Messages and nothing would go through anymore. Only solution would be to physically Reboot (and Power Cycle the USB Adapter). Not sure if something similar could be happening here with the ESP32. And maybe, once it goes to "OFF" State, then something forces a Reset of the CAN Adapter.

Possibly easier way to try "who is at fault": do a MINIMAL code (maybe even with just an USB Adapter if easier for you), just ACK of the Inverter and you sending some dummy Data, nothing else. Do the Timestamps exhibit the same "Pattern" (1s-2s-5s-1s-1s-1s, "all over the place"), or is it more Regular (1s-1s-1s-1s-1s)?

luckylinux commented 4 months ago

@Sleeper85: Got a Reply (and Confirmation) from the Facebook Group:

**They confirm that it is the same file for all models in the SG04LP3 series**

Sleeper85 commented 4 months ago

@luckylinux

The code you show now is completely different in YamBMS 1.4.2.

I just added a sensor with the ACK interval in ms.

Sleeper85 commented 4 months ago

Here is the new code since YamBMS 1.4.2 and this did not prevent having the same problem every 2 hours since midnight. I just added the new ACK interval sensor.

# +--------------------------------------+
# | Scripts                              |
# +--------------------------------------+
script:
  - id: ${canbus_id}_script_canbus_link_timer
    mode: restart
    then:
      - delay: ${canbus_link_timer}s
      - lambda: |-
          id(${canbus_id}_send_canbus_data) = false;                      // Stop sending CANBUS datas
          id(${canbus_id}_status).publish_state(false);                   // Set CANBUS Status to OFF

# +--------------------------------------+
# | Start CAN Handling                   |
# +--------------------------------------+

canbus:
  # 0x305 - Inverter ACK - SMA/LG/Pylon/Goodwe reply
  - id: !extend ${canbus_node_id}
    on_frame:
    - can_id: 0x305
      then:
        - light.toggle:
            id: ${canbus_light_id}
        - lambda: |-
            id(${canbus_id}_send_canbus_data) = true;                      // Continue sending CANBUS datas
            id(${canbus_id}_status).publish_state(true);                   // Set CANBUS Status to ON
            id(${canbus_id}_script_canbus_link_timer).execute();           // Restart CANBUS link timer

            // ACK interval
            int current_ack_ms = millis();
            int ack_interval_ms = (current_ack_ms - id(${canbus_id}_last_ack_ms));
            id(${canbus_id}_last_ack_ms) = current_ack_ms;
            id(${canbus_id}_ack_interval_ms).publish_state(ack_interval_ms);
            ESP_LOGI("canbus", "can_id 0x305 received - ack_interval %d ms", ack_interval_ms);

interval:
  # Test inverter 0x305 ACK
  - interval: 120s
    then:
      - lambda: |-
          if (id(${canbus_id}_send_canbus_data) == false) {
            id(${canbus_id}_send_canbus_data) = true;                      // Start sending CANBUS datas
            id(${canbus_id}_script_canbus_link_timer).execute();           // Restart CANBUS link timer
          }

  - interval: 100ms
    then:
      - if:
          condition:
            lambda: |-
              if (id(${canbus_id}_send_canbus_data) == true) {
                id(${canbus_id}_msg_counter)++;                            // CANBUS MSG counter ++
                return true;                                               // Condition OK
              }
              else return false;

          then:

Sleeper85 commented 4 months ago

Wait and see what happens at midnight.

Maybe I should name the sensor Inverter heartbeat?

luckylinux commented 4 months ago

So you just flashed with 5s "timeout" now ? It doesn't look so bad ... Compare that with your previous Picture where it could be up to 4s-5s (between 2 consecutive ACK) ...

Is this with the "Full Code" or just the "Minimal Example" to try to isolate whether the Issue is on Deye or ESP32 Side ?

EDIT 1: About the name ... Yeah, you can rename it to Inverter heartbeat, although that is a small Details IMHO. Inverter ACK Interval is pretty clear too :smile:.

Sleeper85 commented 4 months ago

Yes I flashed the complete code with a canbus_link_timer of 5s.

The interval has just increased to 2.91s, therefore close to the 3s of the first test.

Sleeper85 commented 4 months ago

Donc tu viens de flasher avec un "timeout" de 5 secondes maintenant ? Cela n'a pas l'air si mal... Comparez cela avec votre image précédente où cela pourrait aller jusqu'à 4s-5s (entre 2 ACK consécutifs)...

This shows that we cannot trust the timestamp of the logs.

luckylinux commented 4 months ago

Donc tu viens de flasher avec un "timeout" de 5 secondes maintenant ? Cela n'a pas l'air si mal... Comparez cela avec votre image précédente où cela pourrait aller jusqu'à 4s-5s (entre 2 ACK consécutifs)...

This shows that we cannot trust the timestamp of the logs.

Is GitHub autotranslating all messages :open_mouth: ?

Si tu veux on peut aussi bien parler Français :laughing:

luckylinux commented 4 months ago

The interval has just increased to 2.91s, therefore close to the 3s of the first test.

And it's close to the 2h - "integer" Period (12h00, 14h00, 16h00, ...).

I would have thought the Deye would compute hour-by-hour Statistics, NOT bi-hourly (every 2 Hours).

Are you (or the ESP32 by itself :laughing:) doing something on the ESP32 side every 2 Hours ? Maybe something to do with DHCP / NTP / SNTP Sync ?

luckylinux commented 4 months ago

@Sleeper85: can you provide a Screenshot of your "Time of Use Display" ?

Maybe a "Workaround" (if you do NOT use Time of Use) could be to set every Time of Use Start/Stop to "00:00". And see if the Issue (or "Feature") disappears ...

EDIT 1: I set all my "Time of Use Start" (Remotely) to 00:00. "Time of Use End" for some Reason wasn't available ...

Sleeper85 commented 4 months ago

Are you (or the ESP32 by itself 😆) doing something on the ESP32 side every 2 Hours ? Maybe something to do with DHCP / NTP / SNTP Sync ?

Good question, I don't know.

I'm testing it with the PVbrain2 board and a fake inverter which sends an ACK 0x305 only in response to the BMS can_id 0x356 as is the case with Deye.

canbus_link_timer is set to 3s in order to see if disconnections happen or not?

Wait and see

luckylinux commented 4 months ago

In my case changing the Time of Use settings doesn't seem to affect it (done Remotely, didn't Test it, but setting seem to be kept at the set value of 00:00). I got another error just now at 15h03.

Strange though that you get "even" Numbers while I get "odd" Numbers, eh ?

Sleeper85 commented 4 months ago

It's 16:00, time to take a little 3.10s break :smile:

This confirms that this behavior comes from the Deye inverter and not from the ESP32.

luckylinux commented 4 months ago

"Confirmation", meaning the difference between the Real Deye Inverter and the Fake Deye Inverter (your "Simulator") ?

luckylinux commented 4 months ago

I think you might need to report a BUG to Deye to be honest. Unless I need to flash my Firmware and update my Inverters first to test :roll_eyes: ...

Sleeper85 commented 4 months ago

Je pense que vous devrez peut-être signaler un BUG à Deye pour être honnête. À moins que je doive d'abord flasher mon Firmware et mettre à jour mes onduleurs pour tester 🙄...

This will be much too complicated to explain to them and as you said, from 00:00 and every 2 hours Deye carries out a task which slightly delays the sending of ACK 0x305. So in theory with the canbus_link_timer parameter set to 5s the links should be stable.

luckylinux commented 4 months ago

Well you can argue it's just a false alarm and could be filtered.

From a "tripline" point of view, it's anyway the Inverter that will trigger and Error and shut-down (BMS_Err_Stop) in case there is a real Communication Issue. It is the Inverter supervising the BMS, not the other way around !

And that will NOT be affected by 1s, 3s or 5s "Internal" ESP32 (to your Code) CANbus Supervision.

I would leave the "ACK" Entity (~ 1s) in case there is a real issue where the Deye Trips for whatever reason and we need to investigate. Of course if it doesn't use too much Memory or CPU.

But yeah, you can "filter" the CANbus status for these false alarms that did NOT trip the Inverter.

luckylinux commented 4 months ago

Maybe put that "ACK" in the "Diagnostic" Section (only for Advanced Uses), if we can keep it :smile:

Sleeper85 / esphome-jk-bms-can

[Deye behavior] Every 2 hours and especially at midnight, the canbus ACK 0x305 is sent by Deye with delay #48

Other Remarks