UtilitechAS / amsreader-firmware

ESP8266 and ESP32 compatible firmware to read, interpret and publish data to MQTT from smart electrical meters, both DLMS and DSMR is supported

Pow-U reboots often, with indication "Reason: Vbat power on reset (1/0)" #627

Closed ArnieO closed 12 months ago

ArnieO commented 1 year ago

Message from the amsleser.no team (@gskjold and myself):

We are opening this issue in order to obtain two things:

  1. Get feedback from other users that might be seeing the same issue. So if you are seeing this issue on your Pow-U ESP32 device: Please leave a comment with details here, or email us at post@amsleser.no. Please post screenshot of your Info- and Config page, including the header that identifies firmware version. Please also indicate power meter brand and model.
  2. Update those affected on the status of our debugging process.

Summary

Over the last few weeks we have received a few notifications about Pow-U devices that have started to reboot often, sometimes several times a day. These reports suddenly started to appear on a product that has not been changed since 2022, and on the same PCBA production batch that we have sold since January 23.

We do not see the issue on our own device, so debugging this is a challenge!

We currently have 4 reports related to Aidon meters, 1 report related to a Kaifa/Nuri meter.

The number of reports is small compared to the number of devices sold, but that does not reduce the problem for those affected.

What we have tried so far

Some technical background

There is in principle (by design) only one way the ESP32 can report "Vbat power on reset (1/0)", and that is if the voltage has for some reason dropped below approx 2.85V.

The device has a voltage supervisor chip that controls the ENABLE line to the ESP32 module. It is implemented with a hysteresis, so that:

This ensures that the ESP32 never enters the so called Brownout voltage region, which is voltage below 2.8V. If ESP32 reboots because it detected such a low voltage while operating, it will reboot with a different notification, saying it recovered from brownout.
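The hysteresis behaviour described above can be illustrated with a small model. This is only a sketch: the 2.85 V trip level comes from the text, while the release level `v_on`, the class name and the function name are assumptions made for illustration and are not taken from the Pow-U design.

```python
from dataclasses import dataclass


@dataclass
class VoltageSupervisor:
    """Toy model of a supervisor driving the ESP32 ENABLE line with hysteresis."""
    v_off: float = 2.85   # from the thread: ENABLE is pulled below approx. 2.85 V
    v_on: float = 3.10    # ASSUMPTION: release threshold, not stated in the thread
    enabled: bool = False

    def update(self, vcc: float) -> bool:
        """Feed one supply-voltage sample and return the ENABLE state."""
        if self.enabled and vcc < self.v_off:
            self.enabled = False      # supply too low: hold the ESP32 in reset
        elif not self.enabled and vcc >= self.v_on:
            self.enabled = True       # supply recovered: release the ESP32
        return self.enabled


def count_enable_drops(samples, supervisor=None):
    """Count how often ENABLE goes from high to low over a series of samples."""
    sup = supervisor or VoltageSupervisor()
    drops = 0
    for vcc in samples:
        before = sup.enabled
        after = sup.update(vcc)
        if before and not after:
            drops += 1
    return drops
```

Each counted drop corresponds to one event the ESP32 would later report as a power-on reset, since it never sees the brownout region itself.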

The Pow-U generates its operating voltage (nominal 3.3V) from the M-bus signal. That signal is at 24 V (34 V in mid-European meters) in between datagrams, and varies between 24 (or 34) V and 12V lower during data reception. Moreover, it uses a 1F super capacitor to keep the voltage stable while the ESP32 pulls current pulses during data transmission. This has proven to be a successful and stable design - until the issue described herein suddenly started to appear.
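As a rough feel for why a 1F capacitor smooths out current pulses, the ideal-capacitor relation ΔV = I·t/C can be evaluated for a few cases. The current and duration values used below are illustrative assumptions, not measurements from the Pow-U, and the optional ESR term is a simplification.

```python
def supercap_droop(current_a, duration_s, capacitance_f=1.0, esr_ohm=0.0):
    """Voltage drop of an ideal capacitor (plus optional ESR) during a current pulse."""
    return current_a * duration_s / capacitance_f + current_a * esr_ohm


def hold_up_time(v_start, v_min, current_a, capacitance_f=1.0):
    """Seconds a capacitor can supply a constant current before reaching v_min."""
    return capacitance_f * (v_start - v_min) / current_a


# ASSUMED example values: a 0.3 A pulse lasting 0.1 s on a 1 F capacitor
print(supercap_droop(0.3, 0.1))        # droop in volts
# ASSUMED example values: 0.1 A sustained draw from 3.3 V down to 2.85 V
print(hold_up_time(3.3, 2.85, 0.1))    # time in seconds
```

Short pulses barely move the voltage, but a sustained higher draw with no recharge eats through the margin down to 2.85V within seconds.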

Call for assistance

As we are unable to recreate the issue, we kindly ask for assistance in getting closer to the cause of the issue.

Our first step will be to confirm whether the issue is indeed voltage drop as described in previous paragraph. The way we intend to get that sorted out will be to make a Test firmware to replace the current firmware on your device.

The Test firmware will:

The test firmware will be posted by @gskjold in this thread as soon as it is available.

Those who are willing to participate in the test must be familiar with how to upload new firmware to the device via the USB cable. It will probably NOT be possible to go back to normal firmware via OTA one-click upgrade. If this is unfamiliar to you, please do not install the test firmware.

bmork commented 1 year ago

I just installed the version you attached

That wasn't entirely true, but now it is installed.

And I learned something new about powering the Pow-U+. Simply borrowed the USB-C power plug and cable I use to power my laptop for the upgrade. The power supply delivers 3A at all the lower voltages, which obviously should be plenty. But it wasn't. The Pow-U+ rebooted unexpectedly several times while trying to upload the new firmware. I guess the voltage drop in 3m USB-C cable is too much for the Pow-U+ regulator.

Or was the problem using a PD power supply? Maybe that's a bad idea?

Used a 15cm USB-A to -C cable from my laptop instead, like I've done before, and everything was fine.

ArnieO commented 1 year ago

And I learned something new about powering the Pow-U+. Simply borrowed the USB-C power plug and cable I use to power my laptop for the upgrade. The power supply delivers 3A at all the lower voltages, which obviously should be plenty. But it wasn't. The Pow-U+ rebooted unexpectedly several times while trying to upload the new firmware. I guess the voltage drop in 3m USB-C cable is too much for the Pow-U+ regulator.

Or was the problem using a PD power supply? Maybe that's a bad idea?

Thank you for mentioning this, but we never heard of any such issue before.

And even if there is a quite strong inrush current spike (supercap charging), it is well below 3A. The USB-C socket on Pow-U is implemented in full accordance with the standard: https://www.usb.org/document-library/usb-type-cr-cable-and-connector-specification-release-22 It has a "simple" termination to indicate to a PD power that it wants 5V, no advanced voltage or current level negotiation. To be exact: There are two 5k1 resistors, connected to the CC1 and CC2 terminals of the USB-C socket.

I find it very unlikely that cable voltage drop is the issue. Maybe there is some very "nervous" protection circuitry in your computer - that reacted to the initial inrush current?

Anyway: Interesting to know this - in case anyone else runs into the same issue. Thank you!

bmork commented 1 year ago

I find it very unlikely that cable voltage drop is the issue. Maybe there is some very "nervous" protection circuitry in your computer - that reacted to the initial inrush current?

The problem happened when powering from a 65W USB PD supply, not from the laptop. I doubt the supply will limit the current until it reaches 3A. I didn't disconnect the Pow-U+ from the HAN port at all so the supercap was already charged. And the reboot didn't happen until I tried to upload the new firmware, not when I connected the external supply.

Could have been a co-incidence. But I did try several times before switching cable and supply, and then it worked the first time. Guess it could be an issue with the supply. Haven't used it much at 5V, if at all. It's a bit overkill for that. Just happened to be at hand.

gskjold commented 1 year ago

esp8266.zip I have been experimenting a little bit and changed the way 11b is disabled for 8266, thanks for the tip @dbeinder . At least on my setup, it does not drop into b modes now.

bmork commented 1 year ago

The problem happened when powering from a 65W USB PD supply

I've now tested this with two different PD supplies. Both are capable of charging my headset, which I believe supports 5V only too. Used three USB-C to USB-C cables in the process - all known to be working fine. The conclusion is that my Pow-U+ isn't powered by PD supplies. Disconnecting it from the HAN port with the PD supply connected makes the Pow-U+ fall off the network. And the LEDs dim slowly as the supercap is discharged. So that explains the upgrade issue.

External power works fine with a 5V charger or USB port as source, using a USB-A to USB-C cable. But it does not work with PD.

Not sure if this is completely off topic here, or if this is a hardware bug possibly related to the topic?

ArnieO commented 1 year ago

The problem happened when powering from a 65W USB PD supply

I've now tested this with two different PD supplies. Both are capable of charging my headset, which I believe supports 5V only too. Used three USB-C to USB-C cables in the process - all known to be working fine. The conclusion is that my Pow-U+ isn't powered by PD supplies. Disconnecting it from the HAN port with the PD supply connected makes the Pow-U+ fall off the network. And the LEDs dim slowly as the supercap is discharged. So that explains the upgrade issue.

External power works fine with a 5V charger or USB port as source, using a USB-A to USB-C cable. But it does not work with PD.

Not sure if this is completely off topic here, or if this is a hardware bug possibly related to the topic?

Interesting indeed, as this is a potential (but surprising!) HW bug. (@gskjold : Can you help me move this to a separate topic? It seems totally unrelated to the subject here.)

I will re-read the USB Type C documentation and see if I can find a reason for this - and a solution for next layout. Thank you for your assistance, @bmork !

ArnieO commented 1 year ago

I will re-read the USB Type C documentation and see if I can find a reason for this - and a solution for next layout.

I confess I have not re-read the USB type C standard (>370 pages); instead I have read a number of web sites explaining how to do the common stuff.

And what I find everywhere is the same. To indicate to the USB type C power supply that the device is an upstream-facing port (UFP) that needs 5V up to 3A, the solution is what we have implemented: a 5k1 resistor to GND on each of the CC terminals.

I found this page has quite precise information on how this works, and how to implement a device with UFP: https://dubiouscreations.com/2021/04/06/designing-with-usb-c-lessons-learned/

HOWEVER, I now also found this forum thread: https://forum.digikey.com/t/simple-way-to-use-usb-type-c-to-get-5v-at-up-to-3a-15w/7016

See especially the post written by "mvduin", which goes into more detail. What I understand from it is that a PD source (DFP) signals to the device needing power (UFP) how much current it is allowed to draw, and the device reads this by monitoring the voltage on its two CC pulldown resistors. And IF the DFP (source) indicates a lower current level than the device needs, it is the responsibility of the device to either change its consumption OR shut down (disconnect from the source).

@bmork: My understanding is then that your PD power for some reason does not accept delivering the current needed by the Pow-U, and signals that by the current it runs through the CC resistor - which the presently used design of the Pow-U does not act upon. The result is that Pow-U continues to pull as much current as it needs - and as a result probably pulls down the bus voltage to a level where the Pow-U is unable to operate.
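To make the CC signalling concrete: a source that uses current-source pull-ups pushes a defined current into the CC line, and the sink sees that current across its 5k1 pulldown. The sketch below uses the nominal pull-up currents as I recall them from the USB Type-C specification (80, 180 and 330 µA); treat them as values to verify against the standard. The names are illustrative.

```python
RD_OHM = 5100.0  # the 5k1 pulldown on each CC pin, as described in the thread

# Nominal source pull-up currents per advertisement (to be verified in the spec)
RP_CURRENT_UA = {
    "default USB": 80,
    "1.5 A": 180,
    "3.0 A": 330,
}


def expected_cc_voltage(advertisement):
    """Nominal voltage across Rd for a given source current advertisement."""
    return RP_CURRENT_UA[advertisement] * 1e-6 * RD_OHM


for name in RP_CURRENT_UA:
    print(f"{name}: {expected_cc_voltage(name):.3f} V")
```

A reading on the CC resistor can then be compared with these nominal levels to see what the source is advertising.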

What puzzles me is that you say this happened on a 65W PD supply.

@bmork: I apparently have to buy a PD supply to test this myself, but in the mean time: Are you equipped with a multimeter so that you can probe the voltage of the CC resistors, if I indicate to you where to probe? It would be very interesting to see what the voltage is, which could help understanding what is going on here.

Conclusion

Our USB type C interface is not fully compliant with the USB standard, as it does not correctly handle PD supplies that signal a lower current supply capability than the device needs. To become fully compliant, a USB Type-C controller chip has to be included, as well as probably a switch component (I'll have to study this further). Given the available PCB area on Pow-U, this seems like a significant modification that will not be introduced on the next (soon upcoming) relayout. So until further notice, we should signal in the User Manual that our products can not be powered from Type-C PD power supplies.

bmork commented 1 year ago

What puzzles me is that you say this happened on a 65W PD supply.

Yup. The other PD supply I tried is 48W. Both specify 3A capability @ 5V.

Are you equipped with a multimeter so that you can probe the voltage of the CC resistors, if I indicate to you where to probe?

Can do. I measured between the USB-C connector shield and the R7/R8 end closest to the connector. Is that correct? This gave 1.62V on R7 and 0 on R8. And vice versa after turning the USB-C 180 degrees.

For the fun of it I tried a few other USB-C power sources. My Android phone powers the Pow-U+ without complaints. A HP laptop with USB-C ports does not. The 100W USB PD battery bank I have was the most fun. It turns on when I connect the Pow-U+, as it should. But it looks like it is using the Pow-U+ as a power source - the supercap driven LEDs dim much faster after connecting to the battery bank.

(the battery bank should also deliver up to 3A @ 5V)

we should signal in the User Manual that our products can not be powered from Type-C PD power supplies.

FWIW, I don't see this as a problem. The USB-C connector is convenient in any case. And I would never use anything but a dumb 5V supply for permanent powering in any case. Only noticed this issue because the PD supply was right next to me and I only needed it for the 1 minute firmware upload. Wouldn't have thought about using a PD supply otherwise.

If the RaspberryPi 5 can use a non-standard USB-C supply, then....

ArnieO commented 1 year ago

I measured between the USB-C connector shield and the R7/R8 end closest to the connector. Is that correct? This gave 1.62V on R7 and 0 on R8. And vice versa after turning the USB-C 180 degrees.

Yes, correct. So this is even more puzzling. According to this table (from the 2nd link in my previous post), the PD power then says you're allowed to draw up to 3A.
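The reading of 1.62V can be checked against the sink-side detection thresholds. The threshold values below (0.2V, 0.66V and 1.23V) are quoted from memory of the USB Type-C specification and should be verified there; the function name is just illustrative.

```python
def classify_cc_voltage(v_cc):
    """Map a voltage measured across the sink's Rd to a source advertisement.

    Thresholds are as recalled from the USB Type-C specification
    (ASSUMPTION: verify against the standard before relying on them).
    """
    if v_cc < 0.2:
        return "not connected"
    if v_cc < 0.66:
        return "default USB"
    if v_cc < 1.23:
        return "1.5 A"
    return "3.0 A"


# Values measured by bmork on R7 and R8
print(classify_cc_voltage(1.62))
print(classify_cc_voltage(0.0))
```

With one CC pin in the 3A range and the other at 0V, the measurement looks like a normal attach through a cable with a single CC wire, which is why the lack of power is puzzling.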

What is the voltage on testpoint TP1 when PD power is connected and the unit does not work? This is the voltage delivered from USB.

bmork commented 1 year ago

What is the voltage on testpoint TP1 when PD power is connected and the unit does not work?

2ish and falling. Which is the same I measure when disconnecting the power supply.

bmork commented 1 year ago

Hey, think I'm onto something. I measured on the USB-C connector pins and see that you leak this voltage out on VBUS. This is probably the issue, hitting some protection mechanism. I tried discharging the supercap completely before connecting the PD supply, and then it worked! With 5V on TP1 obviously.

ArnieO commented 1 year ago

Hey, think I'm onto something. I measured on the USB-C connector pins and see that you leak this voltage out on VBUS. This is probably the issue, hitting some protection mechanism. I tried discharging the supercap completely before connecting the PD supply, and then it worked! With 5V on TP1 obviously.

💡👍 That's correct! Layout v1.7 does miss a zener diode to prevent backwards leakage from the internal Vcc (3.3V) to the USB port. That zener will be there from the next layout, v1.8. OK - that is probably the reason then.
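One plausible mechanism, stated here as an assumption rather than a confirmed diagnosis: as far as I understand the Type-C source state machine, a source only proceeds to the attached state when it sees Rd on exactly one CC pin and VBUS is at vSafe0V (at most 0.8V). A charged supercap leaking about 2V onto VBUS would then keep a strict source waiting. A minimal sketch of that check, with illustrative names:

```python
V_SAFE_0V_MAX = 0.8  # upper limit of vSafe0V in volts (USB PD definition)


def source_may_attach(vbus_v, rd_on_exactly_one_cc=True):
    """Simplified attach condition for a Type-C source (ASSUMED model).

    The source applies 5 V only if it detects a sink and VBUS is
    effectively discharged beforehand.
    """
    return rd_on_exactly_one_cc and vbus_v <= V_SAFE_0V_MAX


print(source_may_attach(2.0))  # supercap charged, about 2 V leaking onto VBUS
print(source_may_attach(0.0))  # supercap fully discharged
```

This would be consistent with the observation that discharging the supercap first made the PD supply work, and that simple 5V chargers, which do not perform this check, were never affected.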

bmork commented 10 months ago

FYI: v2.2.24 was way more stable for me than anything I've seen before, with uptime measured in weeks when I updated to v2.2.28 yesterday. That version looks good too, although it's too early to tell for sure of course.

Was the sudden stability improvement expected? Or is it a side effect of other changes? The removal of L2 calculations is high on my list of suspects....

gskjold commented 10 months ago

Was the sudden stability improvement expected? Or is it a side effect of other changes? The removal of L2 calculations is high on my list of suspects....

Thanks for the update! This is an interesting theory, I have not reflected on possible stability improvements from removing the calculation, but maybe there is something to that

montex commented 10 months ago

Hi,

I am also experiencing this issue, with a new Pow-U purchased last week. I initially thought it was a connection problem, but I now realized that it is rebooting multiple times a day, and always with "Vbat power on reset (1/0)". It rarely stays up for more than 30 minutes.

It seems the problem started to appear when I configured MQTT, possibly because of a higher usage of the wifi. The device is in its own wireless network, although it is one of those "extra networks" created by the same router as my main wifi in the apartment.

As a side note, I wanted to uncheck the option "Auto reboot on connection problem", but I cannot find it. Has it been removed?

Details:

Firmware: 2.2.28

Meter Manufacturer: Kaifa Model: MA105H2E

screen-2024-01-15 11-57-23

screen-2024-01-15 11-58-11

ArnieO commented 10 months ago

Hi @montex - thank you very much for reporting your issue!

First of all: Yes, we have removed the "Auto reboot on connection problem" option, alongside some adjustments in the code.

We have a very small number of users still seeing this issue, and it is indeed difficult to debug - both because it is rare and also because it potentially is due to issues in the user Wi-Fi (at least that is currently our leading hypothesis).

So all input and help we can get from those affected will potentially be useful for coming closer to solving it.

Since you use MQTT: Are you logging the device voltage (the one in the leftmost "green light" on your screenshots)? Do you see any anomalies in the voltage history? Anything happening around the times the device has rebooted?

montex commented 10 months ago

Hi @ArnieO,

Thank you for your quick reply! Yes, I am using Home Assistant and I have a log of the voltage. I am attaching the screenshots for yesterday and for today. There are definitely some big drops of voltage all around.

The interesting part is that today, since I left home (around 12:30), the voltage seems to be stable. Could it be the WiFi of my mobile phone somehow causing the problem?

Yesterday: yesterday

Today: today

ArnieO commented 10 months ago

I am attaching the screenshots for yesterday and for today. There are definitely some big drops of voltage all around.

Thank you - and wow... this is fluctuating A LOT! It clearly explains the reason for the reboots. There is circuitry on the board that pulls the enable pin of the microcontroller if the voltage drops below approximately 2.9V - to avoid entering the brownout voltage range (where the microcontroller "works" but is no longer reliable). Those events will be seen by the microcontroller as "Vbat power on reset (1/0)".
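For anyone who logs the voltage over MQTT, a quick way to look for such events is to scan the history for samples below the approximately 2.9V level and for gaps in the data. This is a minimal sketch; the sample format, the 5 minute gap limit and the function name are assumptions, not something the firmware or Home Assistant provides.

```python
from datetime import datetime, timedelta


def find_anomalies(samples, v_reset=2.9, max_gap=timedelta(minutes=5)):
    """Return (low_samples, gaps) for a time-sorted list of (datetime, volts).

    low_samples: readings below the supervisor level mentioned above.
    gaps: (start, end) pairs where no reading arrived for longer than max_gap
          (ASSUMED limit; adjust to your own logging interval).
    """
    low = [(t, v) for t, v in samples if v < v_reset]
    gaps = [
        (a, b)
        for (a, _), (b, _) in zip(samples, samples[1:])
        if b - a > max_gap
    ]
    return low, gaps


# Made-up example data, for illustration only
t0 = datetime(2024, 1, 15, 10, 0)
log = [
    (t0, 3.30),
    (t0 + timedelta(minutes=1), 3.05),
    (t0 + timedelta(minutes=2), 2.87),
    (t0 + timedelta(minutes=20), 3.28),
]
low, gaps = find_anomalies(log)
print(low)
print(gaps)
```

Low readings followed by a gap are what a reboot typically looks like in the log, since the device publishes nothing while it restarts and reconnects.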

What happened between 10 and 12 today - before you left home? Also yesterday between 14 and 16 there is a lot of missing measurement points.

Can you tell me a bit more about your Wi-Fi system?

No, I don't think it is plausible that your mobile phone being home / not home could cause what we see here. The phone only talks to your router.

montex commented 9 months ago

So, I think I found the problem in my case, and it was quite unexpected.

What happened between 10 and 12 today - before you left home? Also yesterday between 14 and 16 there is a lot of missing measurement points.

That was just me disconnecting the device and putting it in AP mode. Then waiting for the capacitor to discharge, and in general trying to understand what was going on.

Can you tell me a bit more about your Wi-Fi system?

It is just a basic Telenor WiFi router. I have some heaters with WiFi, but it turned out they are not the problem. But the hint about WiFi was great. I looked into the signal strength history, and this is the plot for today (corresponding to the second picture in the post above):

Schermata del 2024-01-16 00-54-50

I initially blamed a backup over WiFi I was doing yesterday, but then I ruled that one out too.

It turned out that the device was trapped too close to the metal enclosure of the cabinet! (see pictures below). And metal is very good at blocking WiFi signal, of course.

Today, out of frustration I had left the device in another position, and it became stable! Now I just tried to put it back where it was and the signal instantly dropped again. So that should be added as a recommendation for troubleshooting perhaps.

This was causing the device to reboot: IMG20240116004711

This seems fine (so far) IMG20240116004722

Ah, also MQTT was innocent in the end. Server-side I was seeing some timeouts of the client in the logs, which now make much sense.

@ArnieO Thank you for your help!

ArnieO commented 9 months ago

Ah... this is very interesting feedback!

We should indeed give a clear recommendation on the installation of the unit. If you open it and look at the antenna position, you will see that in that first photo the antenna is pointing down towards the corner and floor of the cabinet. Which is of course far from optimal.

It is new information to me that the ESP could in such a situation increase its transmission power so much that it can cause the Pow-U to drop in voltage so much that it reboots. But that is very likely what we're seeing here. So this is a very useful learning point.

There is a 3M scratch pad delivered with each unit. Our intention is that the user places it on the bottom of the box of the Pow-U, then sticks it to the front of the smart meter. But that is not written anywhere, so ...

ArnieO commented 9 months ago

@montex : Can I use your first photo as illustration of "Non-optimal installation of amsreader"?

I intend to make a blogpost in our webshop, and maybe also include something in the User manual.

montex commented 9 months ago

Hi @ArnieO, yes sure, feel free to use it.

I used the stickers, they are very nice, but I attached it to the side of the meter :(

KDYR commented 9 months ago

Hi,

Unfortunately I think I have become a member of the club "Reboots often with reason Vbat power on reset (1/0)". I installed my Pow-K+ with ESP32-S2 Friday last week and I have experienced 11 reboots over the 3½ days that have now passed.

I have noticed some correlation with battery voltage, but 1) it is not necessarily a voltage drop occurring at the time (sometimes it is a rise) and 2) the variations are very small. Battery voltage is never below 3.26V.

I have also noticed something a bit strange: There is a small downward step in the "current month used" (which is an accumulation counter) followed by a subsequent recovery. I assume these observations are the effects of the reboots rather than the cause - otherwise there is a hint here.

Graphs of energy counter, WiFi signal strength and battery voltage, with reboots indicated by red vertical lines:

Zoom-in on four of the reboots:

image

Installed version: v2.2.28, WiFi power saving is now "Maximum", but I have tried a day with each setting and there seems to be no impact. I am not using price fetch from remote server.

Hope this could help identify the bug.

ArnieO commented 9 months ago

@KDYR : This is not normal behavior. I suspect this is a faulty ESP32, as we saw an example of here: https://github.com/UtilitechAS/amsreader-firmware/issues/608 Thank you for reaching out in parallel by email, we will send you a replacement card.

Side note: I am puzzled by the steps you see on the upper curve. Kamstrup meters in Denmark should be configured to send accumulated energy (kWh) in each payload, which means you should not see those steps. It looks as if your meter is configured for HAN-NVE payload (used in Norway and Sweden), in which case the accumulated energy (kWh) is only sent in a larger payload each whole hour.

Can you run a Telnet debug and post the result here?

KDYR commented 9 months ago

Thanks a lot for responses! I have performed the Telnet debug procedure and attached the result telnetlog.txt PS: Don't know if it is obvious, but my meter is a Kamstrup Omnipower generation N, variant 2: 684-11-31B-N24-5101-013

OmnipowerVariant

ArnieO commented 9 months ago

I have performed the Telnet debug procedure and attached the result

Thank you. We'll analyze this and get back.

gskjold commented 9 months ago

The drops indicated are a result of the reboot, assuming this is the "Realtime current month" counter. The reason is that the "Realtime current hour" counter starts from zero on reboot and is reset at the top of the hour. Ideally this counter would be recalculated on restart in DK, where accumulated energy is sent every 10s, but this is not the case.

ArnieO commented 9 months ago

Ideally this counter would be calculated on restart in DK where accumulated is sent every 10s, but this is not the case.

OK, so that is a misunderstanding on my part then: There is currently no such calculation in case payload contains accumulated energy?

gskjold commented 9 months ago

There is currently no such calculation in case payload contains accumulated energy?

No, but there definitely should be
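A calculation of this kind could look roughly like the sketch below. It is not how the firmware works today; the class, the `store` object (standing in for some storage that survives a reboot) and the key names are all hypothetical. The idea is to remember the meter's accumulated value at the start of the hour and derive "current hour used" from the difference, so that a reboot does not restart the counter from zero.

```python
class HourCounter:
    """Derive 'current hour used' from the meter's accumulated energy.

    `store` is a HYPOTHETICAL persistent key/value store (a dict here)
    that is assumed to keep its contents across a reboot.
    """

    def __init__(self, store):
        self.store = store

    def on_payload(self, hour_key, accumulated_kwh):
        """Process one payload and return the energy used so far this hour."""
        if self.store.get("hour") != hour_key:
            # New hour (or nothing stored yet): remember the starting value
            self.store["hour"] = hour_key
            self.store["start_kwh"] = accumulated_kwh
        return accumulated_kwh - self.store["start_kwh"]


store = {}
counter = HourCounter(store)
counter.on_payload("2024-01-23T10", 100.0)
before_reboot = counter.on_payload("2024-01-23T10", 100.4)

# Simulated reboot: a new object, but the persisted store is reused
counter = HourCounter(store)
after_reboot = counter.on_payload("2024-01-23T10", 100.5)
print(before_reboot, after_reboot)
```

With this approach the hour counter continues from where it was instead of dropping, which would also remove the downward step in the month counter.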