geeekpi / upsplus

UPS Plus is a new generation of UPS power management module and an improved version of the original UPS prototype. It fixes the bug where the UPS could not charge and automatically powered off during operation. It not only performs battery power management but also provides stable voltage output and RTC functions. It supports the FCP, AFC and SFCP fast-charge protocols and the BC1.2 charging protocol, monitors current and voltage at the battery terminals, and monitors both charge and discharge. It provides a programmable Power Voltage Detector (PVD), which detects whether the battery voltage is below or above a configured voltage. Once this function is enabled, it monitors the battery voltage, and you can decide whether or not to shut down the Raspberry Pi via a simple bash or Python script. This protects the batteries from damage caused by excessive discharge. The data sampling rate is adjustable: a higher rate gives more detailed battery information, at the cost of some extra power consumption. The sampled data is communicated to the host device over the I2C protocol. UPS Plus supports OTA firmware upgrades: when new firmware is available, the upgrade only requires an Internet connection and running a Python script. It also supports battery temperature monitoring and a power-down memory function. UPS Plus can be set to start the Raspberry Pi automatically when external power returns. The programmable shutdown and forced restart functions provide remote power-off and restart management, so you don't need to unplug the power cable or press the power button to cut the power.
You can program the UPS to disconnect the power supply a few seconds after the Raspberry Pi has shut down properly, and to reconnect the power supply after a forced power cut, which gives you a remote power-off and restart. Once it is set up, you don't need to press the power button to boot your device, which makes it well suited to smart home applications.
https://wiki.52pi.com/index.php?title=UPS_Plus_SKU:_EP-0136
MIT License

[Firmware v. 9] i2c readings freeze after a couple of hours of operation #59

Closed frtz13 closed 3 years ago

frtz13 commented 3 years ago

after a couple of hours of operation, register readings from address 0x17 freeze and no longer return useful values. this behaviour seems quite similar to the v.7 firmware. I did not find any release notes about the v.9 firmware. is there supposed to be any improvement? BTW: I changed the sample period to 60 minutes. this did not seem to make any difference to the useful lifetime of the firmware. you are getting telemetry data from this UPS, look at submission ID 3010939 etc.

ArjenR49 commented 3 years ago

I am currently using f/w v.9 & the UPS and fan control script written by the author of this issue, frtz13. Should it be obvious that the problem as described above does in fact occur? I haven't noticed any weird things, but if it's easy to overlook, I may have missed it. The script seems to work just fine. (I have now added fan control to it since I made a simple level converter for the PWM signal on GPIO 18).

I also occasionally run a reporting script for the UPS which I've put together myself and obviously it reads many register values from the UPS. Nothing weird that I have noticed there either.

Could you indicate what values to check specifically?

I've been running stress tests on the Pi to check the operation of the Ice Tower fan under PWM control by frtz13's script, and running down the batteries to the set protection value several times. Behaviour appears to be normal. The Pi has shut down after some hours when I return. Reconnecting AC restarts the Pi as expected. So far so good.

frtz13 commented 3 years ago

I generally have a look at battery temperature readings. normally, subsequent readings vary randomly +- 2°C around some mean value. when register readings freeze, battery temperature always returns the same value. other obvious values are the USB-C/micro interface voltage readings. when readings freeze, they no longer reflect reality (this is why my script uses battery current to detect the on-battery status). Also, running / charging time values no longer increase (run Full-featured-demo-code.py). Also, the switch no longer works when trying to switch off power to the RPi (with AC power removed). edit 26/7 11:55: in this freeze situation, the shutdown countdown does not work either, and this is the real no-go. the only way to get the RPi to start up again is to remove the batteries.
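The symptoms listed above (a running-time counter that stops increasing and a battery temperature that stops jittering) lend themselves to an automatic check. The following is only a minimal sketch: the class name, the window size and the choice of these two values are assumptions for illustration and are not taken from any of the UPS Plus scripts.

```python
from collections import deque


class FreezeDetector:
    """Flags a probable firmware freeze when monitored values stop changing.

    Hypothetical helper: names and window size are illustrative assumptions.
    """

    def __init__(self, window=5):
        # Keep only the last `window` samples of (running_time, battery_temp).
        self.samples = deque(maxlen=window)

    def add(self, running_time_s, battery_temp_c):
        self.samples.append((running_time_s, battery_temp_c))

    def is_frozen(self):
        # Not enough history yet: do not raise a false alarm.
        if len(self.samples) < self.samples.maxlen:
            return False
        times = {t for t, _ in self.samples}
        temps = {c for _, c in self.samples}
        # A healthy board increments the running time between polls and the
        # temperature varies a little; both being constant suggests a freeze.
        return len(times) == 1 and len(temps) == 1
```

Feeding it one sample per polling cycle and checking `is_frozen()` afterwards would let a monitoring script raise an alert, although it cannot by itself recover the board.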

ArjenR49 commented 3 years ago

Before I upgraded from v.7 to v.9 I also had to remove the batteries after I had shut down the Pi via the GUI. With V.9 the UPS button works fine, although because of fan control it is not immediately obvious the Pi is actually going to start up ;-) The fan used to be on 3,3 V. Now I have to watch the green led on the Pi.

On v.9 I have not seen values freezing; on v.7 (or earlier?) I had seen it occasionally. Now just a minute ago, when I checked, HA showed that USB-C voltage dropped from 9,13 V to some 5 Volt. The Pi has been running all night and morning without reboot and a few days without removing the batteries. (The charger varies the voltage. It's the only one I could find at home which will go up to ca. 9 V. Other chargers have a different protocol and didn't work that way with the UPS).

At first I thought perhaps there is a memory leak in the firmware, but why would it affect your UPS board and not mine? A race condition? Something a bit out of spec in yours, or designed without taking variations in specs into account. Not all UPS boards seem to be affected.

I haven't been sitting and watching my Pi/UPS when the batteries ran out without AC connected, but found the Pi shut down and switched off after several hours of being away. Reconnected AC and off it went. I have had no reason to believe there is something wrong with the shutdown countdown.

BTW, the PI controller's parameters in your script work surprisingly well with the ICE Tower. I am not an expert on controllers at all, but having a large mass attached on the CPU is likely to place demands as to the control loop and its parameters. (I think I never passed the exam on 'Measuring & control loops', and anyway it's 50 years ago ;-) Great script!

The seller should send you a new board and take the old one back and finally get to the root of the problem.

ArjenR49 commented 3 years ago

Also, the switch does not work any more when trying to switch off power to the RPi (with AC power removed).

I tried that a minute ago, and it works OK. It's very abrupt .... Just cuts the power to the Pi. I'm sure most users will avoid that, but, yeah, that worked, too ...

ArjenR49 commented 3 years ago

I have had reporting to https://api.52pi.com/feed on for a long time now, ever since I switched to frtz13's script fanShutDownUps.py. If GeeekPi wants to compare results, I can give the ID.

peacho10 commented 3 years ago

Same problem here: after a couple of hours of operation, register readings from address 0x17 freeze and no longer return useful values.

I send the output of Full-featured-demo-code.py to a log file, then extract the relevant information with grep (grep "2021\|Batteries Voltage\|Accumulated running time\|Accumulated charged time\|This running time\|report voltage (Type C)" Full.log)

2021/07/28 10:12:32 Batteries Voltage: 4.208 V Current charging interface report voltage (Type C): 5417 mV Accumulated running time: 56517 sec Accumulated charged time: 56517 sec This running time: 56517 sec

2021/07/28 10:13:01 Batteries Voltage: 4.208 V Current charging interface report voltage (Type C): 5417 mV Accumulated running time: 56517 sec Accumulated charged time: 56517 sec This running time: 56517 sec

2021/07/28 10:14:01 Batteries Voltage: 4.212 V Current charging interface report voltage (Type C): 5417 mV Accumulated running time: 56517 sec Accumulated charged time: 56517 sec This running time: 56517 sec

2021/07/28 10:15:02 Batteries Voltage: 4.212 V Current charging interface report voltage (Type C): 5417 mV Accumulated running time: 56517 sec Accumulated charged time: 56517 sec This running time: 56517 sec

2021/07/28 10:16:01 Batteries Voltage: 4.212 V Current charging interface report voltage (Type C): 5417 mV Accumulated running time: 56517 sec Accumulated charged time: 56517 sec This running time: 56517 sec

2021/07/28 10:17:01 Batteries Voltage: 4.212 V Current charging interface report voltage (Type C): 5417 mV Accumulated running time: 56517 sec Accumulated charged time: 56517 sec This running time: 56517 sec
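The records above are a minute apart, yet the running-time counter never moves. A small parser can spot this automatically. This is a sketch that assumes each record sits on one line in the flattened form shown above; the function name and the `min_repeats` threshold are made up for illustration.

```python
import re

# Matches the timestamp and the accumulated running time of one record.
LINE_RE = re.compile(
    r"(?P<stamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}).*?"
    r"Accumulated running time: (?P<running>\d+) sec"
)


def frozen_since(lines, min_repeats=3):
    """Return the timestamp of the first record of a run of at least
    `min_repeats` consecutive records with an unchanged running-time
    counter, or None if no such run exists."""
    first_stamp, last_value, repeats = None, None, 0
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not log records
        value = int(match.group("running"))
        if value == last_value:
            repeats += 1
        else:
            # Counter changed: start a new run at this record.
            first_stamp, last_value, repeats = match.group("stamp"), value, 1
        if repeats >= min_repeats:
            return first_stamp
    return None
```

Called with the lines of Full.log, it would return the timestamp at which the counter was first seen stuck.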

And when the power is cut, the UPS does not initiate the shutdown sequence. It's always on.

frtz13 commented 3 years ago

interesting to see that other boards have the same problem.

nickfox-taterli commented 3 years ago

This problem occurs occasionally. According to our statistics, about three percent of users are affected, and it usually shows up within 24 hours or 7 days. If you run into this problem, consider reducing the I2C communication rate and then testing again; there should be an improvement.

frtz13 commented 3 years ago

IMHO reducing I2C communication is not the solution, as it would only delay the occurrence of the problem. while occasional i2c communication errors are acceptable, returning erroneous values continuously, and loss of essential functionality (shutdown countdown for ex.), are not acceptable. is the problem correlated to a certain hardware version of the board? in this case it would be necessary to replace it. please clarify. Or is it a firmware issue, which will be corrected by Geekpi/52pi?

ArjenR49 commented 3 years ago

Contrary to my earlier observations, this very day on which I decided to put the bottom acrylic shield back on, as I didn't expect to have to take the batteries out any more, I noticed freezing of some values.

I also changed the battery sampling interval from 2 minutes to 5 minutes as a test to see, if the batteries LEDs will then start to indicate fully charged, no blinking, as the discharge during the sampling interval will occur less often.

Frozen values: MCU voltage (addr. x01, x02) frozen at 26,600 V, which is obviously an impossible value; USB-C input frozen at 5,035 V even though I disconnected the power supply (which depending on the load will deliver as much as ca. 9V). USB-C input being frozen means that disconnecting the AC is normally not detected any longer. However, as Frtz13 pointed out, his script, which I use, detects a change in current, which in turn shows in the Home Assistant output as 'UPS on battery? Off' (I may have changed the text). The battery temperature may be frozen, too. Environment temperature changes through the day, but the battery temperature has stayed at 50 degrees for a while now. My UPS is on f/w version 9.

As this Pi4/UPS Plus combination is explicitly intended to function as a reliable entry point to my LAN and the other servers on it during many months of absence, not detecting AC failure is a big NO NO. My UPS reports to the 52pi-site.

I didn't write down the number of minutes running since the last reset, but as far as I remember it was over 7000. About 5 days ... in the range hinted at above in this discussion. However, the accumulated running time minutes count seems not to update correctly. The UPS was operative while I slept about 6 hours, 360 minutes, but the count had only increased from 282 to 337 minutes ...
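As a quick sanity check of the figures above, the arithmetic can be written out. The helper names are made up for illustration; the numbers come from the comment itself.

```python
MINUTES_PER_DAY = 24 * 60


def minutes_to_days(minutes):
    """Convert a minute count to days."""
    return minutes / MINUTES_PER_DAY


def missing_minutes(elapsed_minutes, counter_before, counter_after):
    """Minutes the counter failed to record over a known elapsed interval."""
    return elapsed_minutes - (counter_after - counter_before)


print(round(minutes_to_days(7000), 2))  # 4.86, i.e. "about 5 days"
print(missing_minutes(360, 282, 337))   # 305 minutes not counted overnight
```

So of the roughly 360 minutes that passed, the counter recorded only 55.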

ArjenR49 commented 3 years ago

In fact the minutes counters have stopped updating. Stuck at 337 minutes.

peacho10 commented 3 years ago

I have made several tests (different sample times, forced power off, restoring factory settings, etc.) and the result is the same: the values read from the UPS are frozen. The tests were done with an RPi 3, with the UPS as the only I2C device, in case other I2C devices could create a problem. The time at which it happens is different each time, I don't know why, but the values end up freezing. This makes the UPS UNUSABLE.

This time, at 314697 sec running time (approx. 87 hours)

2021/08/02 08:30:01 Batteries Voltage: 4.216 V Current charging interface report voltage (Type C): 5393 mV Accumulated running time: 314697 sec Accumulated charged time: 316172 sec This running time: 149568 sec

2021/08/02 08:31:01 Batteries Voltage: 4.212 V Current charging interface report voltage (Type C): 5393 mV Accumulated running time: 314697 sec Accumulated charged time: 316172 sec This running time: 149568 sec

2021/08/02 08:32:02 Batteries Voltage: 4.216 V Current charging interface report voltage (Type C): 5393 mV Accumulated running time: 314697 sec Accumulated charged time: 316172 sec This running time: 149568 sec

2021/08/02 08:33:01 Batteries Voltage: 4.216 V Current charging interface report voltage (Type C): 5393 mV Accumulated running time: 314697 sec Accumulated charged time: 316172 sec This running time: 149568 sec

2021/08/02 08:34:01 Batteries Voltage: 4.212 V Current charging interface report voltage (Type C): 5393 mV Accumulated running time: 314697 sec Accumulated charged time: 316172 sec This running time: 149568 sec

nickfox-taterli commented 3 years ago

Do multiple attempts all freeze at the same time?

peacho10 commented 3 years ago

No, time seems random.

hellresistor commented 3 years ago

My RPi keeps rebooting after some hours. The batteries are discharged. Today I was able to hear a "click" sound from the UPS when it did the "drastic reboot".

ArjenR49 commented 3 years ago

What could make a sound on the ups board?? Unless a part is exploding, I could venture as the source of the sound: magnetostriction

There's what looks like a ferrite core on the board ... Ferrite can break easily, too.

Not that I've ever experienced magnetostrictive sounds from a ferrite core, that I can remember, but if you hear a sound there has got to be some mechanical movement.

Arjen (On The Road)

hellresistor commented 3 years ago

Some "relay" piece?! But it is still working...

frtz13 commented 3 years ago

is the problem correlated to a certain hardware version of the board? in this case it would be necessary to replace it. please clarify. Or is it a firmware issue, which will be corrected by Geekpi/52pi?

Let's get back to the topic. any investigations underway? @nickfox-taterli

bchwtz commented 3 years ago

I am having the same problem on FW v.9. After some time, all the registers that store time information and should increment or decrement freeze. This is a serious problem, as the shutdown and restart countdowns are affected as well. Those countdown timers can be set but do not decrement, and power to the Pi is no longer cut.

This is a firmware issue. Does the MCU rely on the RTC for its functionality or are you using only the internal timer? @nickfox-taterli Can you provide the code for the firmware, so that we can investigate and support in the development?

frtz13 commented 3 years ago

initially, I got my board with firmware v. 2, then upgraded through all subsequent firmware versions. I never did a factory reset. I ended up with a board on firmware version 9 whose i2c readings froze approximately every 5 hours. then I did a factory reset (set register 0x1B = 1), set the protection voltage to 3.6V and went through a discharge and recharge cycle with the batteries. Since then, the board remained operational for 5 days before it fell into the freezing state again, where i2c register values no longer reflect reality, the switch off/on button does not work and, most importantly, the shutdown countdown does not work any more. something else which puzzles me: even at times when the firmware is apparently working correctly, I cannot make sense of the accumulated running/charging time values: during the whole discharge period the accumulated charging timer was incremented, which puts some doubt on firmware quality. @nickfox-taterli any investigations underway? if this is not the right place to report issues with the firmware, where should we do it?
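For reference, the factory reset mentioned above (writing 1 to register 0x1B at address 0x17) could be issued from Python roughly as follows. This is a sketch, not code from the vendor scripts: the function and constant names are invented here, and `bus` is assumed to be any object with an smbus-style `write_byte_data(addr, register, value)` method.

```python
UPS_ADDR = 0x17           # I2C address of the UPS, as used in this thread
FACTORY_RESET_REG = 0x1B  # register named in the comment above


def factory_reset(bus, addr=UPS_ADDR):
    """Request a factory reset by writing 1 to register 0x1B.

    `bus` is assumed to expose write_byte_data(addr, register, value),
    as the smbus2 SMBus class does.
    """
    bus.write_byte_data(addr, FACTORY_RESET_REG, 1)
```

On a Pi with the smbus2 package installed, this might be called as `factory_reset(smbus2.SMBus(1))`, assuming the UPS sits on I2C bus 1. Settings such as the protection voltage would then have to be set again.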

ArjenR49 commented 3 years ago

UPS Plus froze again :-( After running some thousands of minutes. Waiting for new f/w ...

ArjenR49 commented 3 years ago

@nickfox-taterli: A remedy to this bug is long overdue ... my UPS Plus froze again :-( after some 3000 minutes accumulated running time. Even a promise by the seller to solve this bug would be worth something. Running time and other time counters aren't counting any longer, such register values as should be regularly updated aren't. Only the INA measurements show life now.

In this state with this f/w v.9 the board is UNUSABLE as a UPS. I cannot walk away for even a few days and leave the board to perform its task, since I will have to remove the batteries to get it going again after only a few days. I need it to run for months and take care of black-outs. That's what a UPS is for, not for carrying a Pi around running on batteries. In this light it is a bit preposterous that the price of the UPS Plus should only go up.

peacho10 commented 3 years ago

Completely agree. I bought the EP-0118 version. It turned off when the batteries were charged, a malfunction. The seller offered additional hardware but it did not work. The batteries ended up discharging.

Now I have spent another $25 on the EP-0136 version, and it is nothing but problems. Nine firmware versions and it is still worthless as a UPS, since it does not fulfil its function.

I no longer know what to think of this manufacturer .... I am very disappointed.

markVnl commented 3 years ago

Hi there, I am experiencing the same issue (and heard of another user with the same issue). I thought this was related to high traffic on the I2C bus.
So I enabled I2C bus 3 for the display and read the first 32 registers of the UPS every 10 seconds, which even for I2C is modest traffic.

My main question is: if it occurs, how do you snap out of it? I noticed that completely powering off (removing the USB plug and batteries) does not solve the issue. It did start to give updated readings again after entering / leaving OTA mode.

  • Is a soft reset possible?
  • Or any other reset which does not involve removing the USB plug and batteries?

(Please forgive me if this is documented and I did not find it; in that case, point me to the further reading)

ArjenR49 commented 3 years ago

"Noticed that completely powering of (remove USB plug and batteries ) does not solve the issue. Did start to the give undated readings again after entering / leaving OTA mode."

I have had numerous freezes on my ups plus, but they were always resolved by taking away usb power and batteries.

Can you describe your observation in more detail perhaps?

Could you also explain more exactly what you mean in the second sentence of the quote?

I realize no explanation given here is going to solve the problems we encounter. Especially because it looks like nobody at the manufacturer's is taking any note of this problem, but for the record ... so we know what signs to look for. Thanks for reporting!

Btw: My ups plus apparently lasts about 3000 minutes after a reset before it freezes. That's just some 50 hours, if I am not mistaken.

Arjen (On The Road)

markVnl commented 3 years ago

Did start to the give updated readings again after entering / leaving OTA mode.

Did this (from wiki)

Method 2

    Open a terminal and typing:

i2cset -y 1 0x17 50 127 b

    Shutdown Raspberry Pi and remove all batteries and power supply.
    Insert batteries back into the battery slot.
    Execute OTA_firmware_upgrade.py python script in a terminal.
    UPS Pro Will be turned off after upgrading, Please unplug the power supply, remove the batteries from UPS Pro.
    Insert the batteries back to UPS Pro and then connect power supply and turn it on by press power switch.

then i2cset -y 1 0x17 50 0 b, and removed power and batteries again; after that it started to update the readings

(Note: before that I had completely powered it down and started from scratch: batteries in, then mounted the Pi, power in; still no update on readings)
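The two i2cset commands above write the byte 127 (enter OTA mode) or 0 (leave it) to register 50 at address 0x17. A Python equivalent might look like the sketch below. The function and constant names are invented for illustration, and `bus` is assumed to be an smbus-style object with a `write_byte_data(addr, register, value)` method.

```python
UPS_ADDR = 0x17
OTA_REG = 50     # register used by the i2cset commands above
OTA_ENTER = 127  # value written to enter OTA mode
OTA_LEAVE = 0    # value written to leave OTA mode


def set_ota_mode(bus, enabled, addr=UPS_ADDR):
    """Python counterpart of `i2cset -y 1 0x17 50 <value> b`."""
    bus.write_byte_data(addr, OTA_REG, OTA_ENTER if enabled else OTA_LEAVE)
```

With smbus2 installed, `set_ota_mode(smbus2.SMBus(1), True)` would correspond to the first command. The manual steps of removing and reinserting power and batteries still apply; the sketch only replaces the register write.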

ArjenR49 commented 3 years ago

So you did an upgrade of the firmware from ? to the latest I presume, i.e. version 9.

What do you mean by undated readings? The ups doesn't do date stamping.

(The role and functions of the RTC on the ups board are undocumented afaik. The Raspberry Pi uses it in the same way as any other hardware clock added to it. And the ups likely uses it for counting time, but beyond that? From what I have read in the documentation of another UPS (Olmatic), which I own and use on an older Pi, a RTC can be programmed to (make the ups) do something at a set time independently of the Pi. Like waking up, starting, a Pi that has (been) shut down. I don't know what this would require in the way of hardware and programming, though.)

Arjen (On The Road)

markVnl commented 3 years ago

A typo (corrected) start to the give uPdated reading

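still my main questions are:

  • Is a soft reset possible?
  • Or any other reset which does not involve removing the USB plug and batteries?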

ArjenR49 commented 3 years ago

If the UPS had a hardware watchdog timer like the Pi, then maybe it would be possible to make the f/w restart when it crashes. If freezing means it actually crashes. Afaik, the hardware in the Pi that can be enabled to provide a watchdog function is the graphical processor, i.e. a processor that goes on running normally even when the main cpu ends up in an endless loop. For a Pi Zero you can get a watchdog add-on board (from Omzlo), which involves a separate processor. I expect the UPS Plus has just the one processor, its MCU.

ArjenR49 commented 3 years ago

The UPS firmware v.9 keeps freezing every few thousand minutes, making the UPS Plus board an utterly useless piece of hardware, because the only way out of this is taking the batteries out.

When can we expect a new firmware version? It's now about one and a half months since v.9 was published.

markVnl commented 3 years ago

Since reducing traffic on the I2C bus as @nickfox-taterli suggested this issue did not occur again in the last week. So kind of solved for me.

ArjenR49 commented 3 years ago

Thanks for your reply. I wonder what that means in practice. On my pi with the ups I run Frtz13's script. It is needed to check the state of the batteries and shut down the pi in time. Quite essential for a UPS.

The script does other things, too. Like reporting data to HA every minute(?) and controlling a fan with PWM.

I'm not near my ups/pi now, so I can't check the script for i2c operations that perhaps could be avoided. Making the sampling interval longer, which is a setting for the script, would likely only work to increase the time until the inevitable crash.

It would be interesting to have Frtz13 comment on this. He must have been doing at least some experimenting since the freezing problem started.

Frtz13's script itself is a bit over my head in the sense that I have never learned to do OOP (if that is what it is called these days), so I can make only small changes here and there.

Frtz13's script uses exception handling extensively to overcome i2c bus problems otherwise observed.

I had unhandled errors too in the beginning with even simple scripts and then learned a bit about exception handling and how it is used to make code execution more resilient.
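The kind of exception handling described above can be illustrated with a small retry wrapper. This is a generic sketch and not taken from Frtz13's script: the function name and defaults are assumptions, and it relies on smbus-style libraries raising OSError when an I2C transfer fails.

```python
import time


def read_with_retry(read_fn, retries=3, delay_s=0.1):
    """Call read_fn() and retry when it raises OSError.

    `retries` is the total number of attempts and is assumed to be >= 1.
    """
    for attempt in range(1, retries + 1):
        try:
            return read_fn()
        except OSError:
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay_s)  # give the bus a moment before retrying
```

A call such as `read_with_retry(lambda: bus.read_byte_data(0x17, 0x07))` would then survive an occasional bus error. It does nothing against a frozen firmware, which keeps answering with stale values rather than raising an error.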

The ups firmware code is apparently not very resilient if it can't handle competition for i2c bus resources from a script on the pi. Nevertheless, a script running on the pi will always be needed to properly control the UPS. Therefore I find nickfox-taterli's admonition to decrease i2c traffic a bit too easy.

All in all this creates a picture that the i2c bus on the pi is unreliable, 'fragile', and therefore only marginally useful, because it easily gets code in tangles. If this is so, I would expect to have read about it before, though.

Arjen (On The Road)

ArjenR49 commented 3 years ago

Apart from some onetime initialization, Frtz13's script in its main loop reads 255 bytes from the i2cbus to a buffer with exactly the same call as upsplus.py uses. Every 60 seconds, just like upsplus.py. In addition it reads the INA-registers, also once a minute.

Everything else that it does, it does based on what's in said buffer.

I fail to see where Frtz13's script burdens the i2c bus excessively, more than upsplus.py does. Upsplus.py runs every minute by virtue of crontab, whereas Frtz13's script starts at boot and handles its own looping. Moreover the upsplus_iot.py script is also started every minute at the minute by cron, and it also reads the 255 bytes into its own buffer with nothing to prevent it competing with upsplus.py.

What exactly did you do to 'reduce the traffic' in the i2cbus?
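If several cron-started scripts really do compete for the bus as described above, one way to keep them from overlapping would be an advisory file lock that each script takes before talking to the UPS. This is only a sketch of that idea: none of the scripts mentioned here is known to do this, the lock path is a made-up example, and `fcntl` is available on Linux only.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def i2c_lock(path="/tmp/upsplus-i2c.lock"):
    """Hold an exclusive advisory lock while talking to the UPS.

    The lock path is a hypothetical choice; every cooperating script
    must use the same one for the lock to have any effect.
    """
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Each script would wrap its register reads in `with i2c_lock(): ...`, so that a second script started in the same minute waits until the first one has finished.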

ArjenR49 commented 3 years ago

My own UPS reporting script (UPS_report_for_UPSPlus_mqtt.py) also reads 255 bytes from the i2cbus into a buffer using the same code as upsplus.py and does its report based on the buffered values. In addition it reads each INA-device once. I don't run this script on a schedule, but just from the terminal now and then.

In my case it's @frtz13 script fanShutdownUps.py which controls the UPS instead of upsplus.py, not in addition. It also does the IoT-reporting, as far as requested in the *.ini file, so I don't run upsplus_iot.py either.

bchwtz commented 3 years ago

Those arguments seem pointless to me. A reliable UPS must not crash and die due to heavy I2C traffic; there needs to be some safeguard to prevent those events. I don't care whether reliability is achieved through good firmware or an external watchdog.

However, we seem to be the only people who care, and the developers and the company, who said they offer after-sales service, have simply stopped responding. Or do you plan on investigating the issue @nickfox-taterli @yoyojacky ?

frtz13 commented 3 years ago

I agree with @bchwtz . A UPS which requires babysitting is not very useful. The only operation which had significant influence on the "freeze" behaviour was a factory reset, as reported above. What I'll do next in my script is reduce the register readings at address 0x17 to a bare minimum, to touch only registers containing "interesting" values. btw: I had a discussion with GeekPi store staff on A...xpress regarding this issue. They said the technicians were still working on the problem. However, they ended up sending me a full refund for the board.

ArjenR49 commented 3 years ago

The advice I got today from the seller I bought my UPS board from (52Pi Official Store): "please reduce i2c speed to 100 kHz, try again".

I know there is such a thing as bus speed, but I have no idea yet how to change it. My follow-up question to the store: if that is a solution, then why is it not implemented in upsplus.py?

ArjenR49 commented 3 years ago

I dropped the I2C bus speed (in config.txt) to the lowest value I could find:

dtparam=i2c_arm=on,i2c_arm_baudrate=50000

Also now running Frtz13's script with IoT reporting disabled, which allows for an absolute minimum of i2c register reads for UPS operation, only 9 bytes:

it_registers = it.chain(range(0x07,0x0C), range(0x11, 0x15))
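For readers following along, the one-liner above needs `import itertools as it` and covers registers 0x07 to 0x0B plus 0x11 to 0x14, which is where the 9 bytes come from. A minimal sketch of reading just those registers follows; `read_registers` and `fake_read_byte` are hypothetical helpers for illustration, not code from Frtz13's script, and the 0x17 address is the one mentioned earlier in this thread.

```python
import itertools as it

DEVICE_ADDR = 0x17  # UPS address as mentioned earlier in this thread


def read_registers(read_byte, registers):
    """Read only the listed registers and return {register: value}."""
    return {reg: read_byte(DEVICE_ADDR, reg) for reg in registers}


it_registers = it.chain(range(0x07, 0x0C), range(0x11, 0x15))


def fake_read_byte(addr, reg):
    """Stand-in for a real bus read so the sketch runs without hardware."""
    return reg ^ 0xFF


values = read_registers(fake_read_byte, it_registers)
print(len(values))  # 9

# On a Pi, assuming the smbus2 package and bus number 1:
#   bus = smbus2.SMBus(1)
#   values = read_registers(bus.read_byte_data, it_registers)
```

Note that `it.chain` returns a one-shot iterator, so it has to be rebuilt (or turned into a list) if the registers are read in a loop.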

ArjenR49 commented 3 years ago

BTW: the I2C bus's default speed is already 100000 (on my Pi 4 running the latest Raspberry Pi OS). The seller's advice assumes otherwise. It feels like a diversion from the real problem.

I couldn't find any clear info on how low one can go, but I dropped it as low as 10000 and it still worked, although that very noticeably slowed down a python script which I used to run from a terminal every now and then reading all 255 bytes and reporting many different variables.
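A rough back-of-the-envelope estimate shows why the slowdown is noticeable. It assumes each register is fetched with a separate SMBus "read byte data" transaction, which puts four bytes on the wire (address+write, register, address+read, data), each taking 9 clock cycles including the ACK bit. Start/stop conditions, clock stretching and OS overhead are ignored, so these are lower bounds only.

```python
CLOCKS_PER_READ = 4 * 9  # four bytes on the wire, 8 data bits + 1 ACK each


def min_read_time(n_registers, bus_hz):
    """Lower bound in seconds for reading n_registers one byte at a time."""
    return n_registers * CLOCKS_PER_READ / bus_hz


for hz in (100_000, 25_000, 10_000):
    print(hz, round(min_read_time(255, hz), 3))
```

Under these assumptions a full 255-byte read takes at least about 0.09 s at 100000, 0.37 s at 25000 and 0.92 s at 10000, while the 9-register read stays in the millisecond range at any of these speeds.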

I have now set the I2C bus speed to 25000 as a compromise and hope to see the UPS/Pi run without freezing. That will be hard to prove, of course. Come December I'll be away from the UPS/Pi's side and it should be able to run without baby-sitting for months ... It's the entry point to my LAN, which has many other Pis, and I need it to be reliable.

ArjenR49 commented 3 years ago

My UPS froze again despite the low i2c bus speed (25000) and minimal (read) access to the UPS memory (Frtz13's latest script version). The accumulated running count had stopped at 4471, accumulated charging time at 4606 and the current up time count at 399 minutes. After the reset at 0 minutes, I had arranged several power cuts from which it recovered ok before it finally froze again last night. The minutes counts are from my own reporting program, which does read all 255 bytes (at the same slow bus speed), but is only executed manually on demand in a terminal.

frtz13 commented 3 years ago

My UPS has been running for approx. 8.5 days now without freezing, longer than at any time before. That is with 9 register readings every minute, at normal I2C bus speed.

ArjenR49 commented 3 years ago

That's very good. Maybe the difference is that I checked the up-time by reading (all) i2cbus registers now and then. Anyway, my UPS is now stuck in factory default reset mode, it seems, and doesn't do normal i2cbus operations at all. I need to get the reset to proceed somehow ...

ArjenR49 commented 3 years ago

Sorted out my UPS yesterday. It was in upgrade mode. When I ran the upgrade script, f/w version 10 was installed. This version is not on the code page of GeeekPi's GitHub ... A new f/w version requires new testing and the UPS 'learning' ...

A new type of random error/exception came up. More about that in an issue on Frtz13's GitHub. It may be inconsequential. Traffic on the I2C bus occasionally causes errors/exceptions, it seems, and what such an error results in depends on how the code handles exceptions. That's at least how I understand these things at this point.
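To illustrate what "exception handling" could look like here: a failed bus read typically surfaces in Python as an `OSError`, and the script can retry a few times instead of crashing. This is a hedged sketch only; `read_with_retry` is a hypothetical helper, not code from upsplus.py or Frtz13's script, and the retry count and delay are arbitrary.

```python
import time


def read_with_retry(read_byte, addr, reg, attempts=3, delay=0.05):
    """Return the byte read, or None if every attempt raised OSError."""
    for attempt in range(attempts):
        try:
            return read_byte(addr, reg)
        except OSError:
            if attempt < attempts - 1:
                time.sleep(delay)  # brief pause before trying again
    return None


def make_flaky_reader(failures, value):
    """Stand-in reader that fails a given number of times, then succeeds."""
    state = {"left": failures}

    def reader(addr, reg):
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError("simulated I2C error")
        return value

    return reader


print(read_with_retry(make_flaky_reader(2, 42), 0x17, 0x07, delay=0))  # 42
print(read_with_retry(make_flaky_reader(5, 42), 0x17, 0x07, delay=0))  # None
```

Returning `None` leaves it to the caller to decide whether to skip this cycle or act on the last good value, which matters for anything that triggers a shutdown.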