frtz13 / UPSPlus_mqtt

GNU General Public License v3.0
19 stars 6 forks source link

[get_UPS_status_and_check_battery_voltage] Error getting data from UPS: [UPSPLus.init] Error reading UPS registers: [Errno 121] Gindse invoer-/uitvoerfout #23

Closed ArjenR49 closed 2 years ago

ArjenR49 commented 3 years ago

I now get this error in the log every now and then. It is new to me, I haven't seen it before. The last words are in Dutch and in English they would mean Yonder input/output error. The word 'gindse' is rather old-fashioned, but means over yonder, at the other side (it makes me wonder who translates these error message texts; must be an older person ;-)

I tried to find a reason for this error in the script code, because I have applied some changes of myself to your code. I use Kompare for that and, of course, I may have missed something while doing that. However, my understanding of your OOP code is almost non-existing, so I didn't get very far. There's all these similar names, with underscores and without. I suppose you can imagine the difficulty, when one doesn't understand this sort of code. I apologize in advance if the error is something of my own making.

ArjenR49 commented 3 years ago

That error message also comes up when I run my reporting script, which reads all i2cbus registers ('the old way'). Not immediately after a reboot, but later. INA-values are reported normally, but the part reporting on i2cbus registers fails and crashes that script.

It's not a state or situation of the UPS. It looks like a random error. fanShutDownUps.py reported it in the syslog just a minute ago, but my report script executed normally after that.

This error message (from my report script and repeating) is also what threw me off when I did the factory reset and subsequently got the ups into upgrade mode without realizing it.

frtz13 commented 3 years ago

reading the UPS status registers at 0x17 is retried every minute. does the I/O error occur randomly or systematically? when it occurs, does the script continue operation or does it crash? in case this I/O error occurs systematically: is your reporting script able to access the UPS status registers at 0x17?

ArjenR49 commented 3 years ago

The I/O error is random. I had printing in the fan control loop (of def set_speed(self):) on with output to the log file for a while. The error messages in that log file came at irregular intervals with tens of normal print messages from the fan control loop between them.

When I run my own report script, it first reads and print INA-values and then reads all 255 bytes from the UPS memory at 0x17 and prints a report on many of those values. However, the second part has started to occasionally result in the same error no. 121, which terminates the report script, at least there is no normal output after the error message. There is exception handling in the script like so: while i < 0x100: try: with SMBus(DEVICE_BUS) as bus: aReceiveBuf.append(bus.read_byte_data(DEVICE_ADDR, i)) i += 1 except Exception as e: raise Exception("[UPS_report] Error reading UPS registers: " + str(e)) but it may not be good enough to prevent the script from terminating. In any case that report script is not an endless loop.

And it may end in this error a second time, but later it may start reporting again normally.

Your script keeps running all the time (HTOP) and output its own error 121 messages at irregular intervals.

(Just 10 minutes ago after updating the EEPROM, I rebooted the pi with both its SSD and a memory card reader and card onto which I had made a clone (with different PARTUUID's) still in it, which resulted in a mess and supposedly locked root account or whatever. I since recovered from that fright. It has happened to me before ... but I had no recollection of what it took to get out of it. Well, I got out without loosing much hair, but I have to check for the error messages again. ...)

From syslog (no print command in script), and UPS without AC at first:

`Sep 15 21:40:05 RPI-HUB /fanShutDownUps.py: UPS on battery. Battery voltage: 4.016 V. Shutdown at 3.600 V
Sep 15 21:41:07 RPI-HUB /fanShutDownUps.py: UPS on battery. Battery voltage: 4.012 V. Shutdown at 3.600 V
Sep 15 21:42:09 RPI-HUB /fanShutDownUps.py: UPS on battery. Battery voltage: 4.020 V. Shutdown at 3.600 V
Sep 15 21:43:11 RPI-HUB /fanShutDownUps.py: UPS back on AC supply.
Sep 15 21:44:13 RPI-HUB /fanShutDownUps.py: [get_UPS_status_and_check_battery_voltage] Error getting data from UPS: [UPSPLus.init] Error reading UPS registers: [Errno 121] Gindse invoer-/uitvoerfout
Sep 15 21:44:49 RPI-HUB c1e47a129e93[595]: 
Sep 15 21:46:17 RPI-HUB /fanShutDownUps.py: [get_UPS_status_and_check_battery_voltage] Error getting data from UPS: [UPSPLus.init] Error reading UPS registers: [Errno 121] Gindse invoer-/uitvoerfout
`

Minutes passed since that last error message now.

Next error message:

Sep 15 21:54:38 RPI-HUB dhcpcd[448]: eth0: pid 448 deleted default route via 192.168.1.1
Sep 15 21:54:49 RPI-HUB c1e47a129e93[595]: 
Sep 15 21:56:41 RPI-HUB c1e47a129e93[595]: #033[32m21-09-15 21:56:41 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token#033[0m
Sep 15 21:56:41 RPI-HUB hassio-supervisor[1376]: #033[32m21-09-15 21:56:41 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token#033[0m
Sep 15 21:57:40 RPI-HUB /fanShutDownUps.py: [get_UPS_status_and_check_battery_voltage] Error getting data from UPS: [UPSPLus.init] Error reading UPS registers: [Errno 121] Gindse invoer-/uitvoerfout
Sep 15 21:59:50 RPI-HUB c1e47a129e93[595]: 
`
ArjenR49 commented 3 years ago

When I ran my reporting script again, this time I got the following error output. The first part concerns the INA's and is OK. The error happens in the loop reading all 255 bytes. There's a bit more output about where the error originates than in the syslog message coming from fanShutDownUps.py, so maybe this helps:

pi@RPI-HUB:~ $ /home/pi/UPSPlus_MQTT/UPS_report_for_UPSPlus_mqtt.py
------------------  15-09-2021 23:34:28  ---------------------------------

*** Data from INA219 at 0x40:
Raspberry Pi supply voltage:                         4,916 V
Raspberry Pi current consumption:                    0,990 A
Raspberry Pi power consumption:                      4,870 W

*** Data from INA219 at 0x45:
Battery voltage:                                     4,104 V
Battery current (discharging):                       1,400 A
Battery power consumption:                           5,820 W

Traceback (most recent call last):
  File "/home/pi/UPSPlus_MQTT/UPS_report_for_UPSPlus_mqtt.py", line 82, in <module>
    aReceiveBuf.append(bus.read_byte_data(DEVICE_ADDR, i))
  File "/home/pi/.local/lib/python3.7/site-packages/smbus2/smbus2.py", line 433, in read_byte_data
    ioctl(self.fd, I2C_SMBUS, msg)
OSError: [Errno 121] Gindse invoer-/uitvoerfout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pi/UPSPlus_MQTT/UPS_report_for_UPSPlus_mqtt.py", line 85, in <module>
    raise Exception("[UPS_report] Error reading UPS registers: " + str(e))
Exception: [UPS_report] Error reading UPS registers: [Errno 121] Gindse invoer-/uitvoerfout

pi@RPI-HUB:~ $ 

------------------------- ADDITION

After I had written the above, I ran the reporting script again and this time it finished without error. It is a random thing. If there is a possibility for the UPS firmware to disable outside access to its memory, the new f/w version might use that to safeguard its own communication. Just a guess. The error doesn't seem to upset your control script. It does make my report fail when it occurs, but that could possibly be prevented by better exception handling.

ArjenR49 commented 3 years ago

The error frequency based on the messages from fanShutDownUps.py in the syslog appears to be about 15 minutes. Sometimes there are two errors within minutes. When I run my report script and happen to run into this error, it has happened that the error repeated itself even three times when re-executing the same script before it started working normally again.

ArjenR49 commented 3 years ago

Keeping two terminal windows open in a VNC session on my Pi/UPS.

  1. tail -f /var/log/syslog
  2. the command to run my report script.

When I see the error message in window 1 issued by fanShutDownUps.py, and then immediately run the report script in window 2, it also suffers from the same error instead of giving normal output. I suspect ups f/w 10 incorporates a way to block i2c bus access and it looks like that blocking lasts more than just a few seconds at a time. Even as long as some 10 seconds.

frtz13 commented 3 years ago

I think that I'll stay with v.9 firmware. it has been running now for 14 days without freezing, and without I/O errors on the i2c bus. fingers crossed.

ArjenR49 commented 3 years ago

Surprising, but great, of course. I'll stick to 10 for now despite the random errors, which don't seem to cause any particular harm. I can always go back to version 9, but let's see how this goes for a few weeks.

I suppose you did a factory defaults reset before starting the test with minimal i2c bus memory access?

As you mentioned reading only 9 bytes each minute, I assume you're currently not reporting to the IoT platform.

I have enabled reporting to IoT for a change and am using your control script. (The changes I made to your code are the INA parameters and the fan control parameters, although the latter are now inconsequential as I have set the target temperature to 25. The fan does not like PWM and it has the fancy lights ;-) My alternative INA parameter values are the outcome of a rough power input measurement and balance calculation I made months ago; I found current values were much too high.)

frtz13 commented 3 years ago

yes, I did a "factory reset" some time ago, with fw 9, followed by a discharge/recharge cycle of the batteries. at that time, still running the old script reading 255 bytes at each cycle, the time before freezing changed from "a couple of hours" before reset to "a couple of days" after the reset. Since, I had to take out the batteries several times to get the fw out of the frozen status. then I just changed the script, while the fw was in an unfrozen status, and it is running since.

ArjenR49 commented 3 years ago

This morning I increased the value of sampling interval from 2 minutes to 10 minutes. Now there hasn't been any error message for 1,5 hrs, whereas before it would occur every 10-15 minutes, and sometimes after just a few minutes.

Before increasing the sampling interval, I had established that when the blue leds go off, my reporting script would end with the error as discussed, and concluded there may be a connection.

frtz13 commented 3 years ago

bad news. the UPS froze again, after more than 3 weeks. In my script, I now added a little 20ms delay between register readings. I'll test this first before publishing it.

ArjenR49 commented 3 years ago

Mine is on f/w 10, which apparently blocks i2c bus use by the pi at intervals determined by the ups. It's at about 15000 minutes of accumulated charging time now. Diminishing i2c bus traffic only seems to serve to postpone the next freezing event. Blocking the pi from accessing the bus is quite drastic. The f/w really should be developed further to work despite the pi control script using the bus when it needs to.

Arjen (On The Road)

Op zo 26 sep. 2021 11:07 schreef frtz13 @.***>:

bad news. the UPS froze again, after more than 3 weeks. In my script, I now added a little 20ms delay between register readings. I'll test this first before publishing it.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/frtz13/UPSPlus_mqtt/issues/23#issuecomment-927264751, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANCBKNQ5W2JAK2QK4ZGSURDUD3PDXANCNFSM5EBXJIAA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

hellresistor commented 2 years ago

I am getting that too. Yesterday have done a new installation to test .. and perform the firmware update v10.

fanShutDownUps.py: [get_UPS_status_and_check_battery_voltage] Error getting data from UPS: [UPSPLus.init] Error reading UPS registers: [Errno 121] Remote I/O error

My system:

root@FIFUrpi:~/upsplus# uname -a
Linux FIFUrpi 5.10.0-9-arm64 #1 SMP Debian 5.10.70-1 (2021-09-30) aarch64 GNU/Linux

config.txt file:

cat /boot/firmware/config.txt
# Switch the CPU from ARMv7 into ARMv8 (aarch64) mode
arm_64bit=1

core_freq=
enable_uart=1
upstream_kernel=1

dtparam=i2c_arm=on
dtoverlay=i2c-rtc,ds1307,addr=0x68
dtparam=i2c_vc=on
dtoverlay=dwc2,dr_mode=host

kernel=vmlinuz-5.10.0-9-arm64
# For details on the initramfs directive, see
# https://www.raspberrypi.org/forums/viewtopic.php?f=63&t=10532
initramfs initrd.img-5.10.0-9-arm64

maybe missing something like "i2c_spi ?"

ArjenR49 commented 2 years ago

I can't reliably recall how I upgraded my Pi4B server with the UPS_plus, but it has been on bullseye for some time now:

pi@RPI-HUB:~ $ uname -a Linux RPI-HUB 5.10.63-v7l+ #1459 SMP Wed Oct 6 16:41:57 BST 2021 armv7l GNU/Linux pi@RPI-HUB:~ $ cat /etc/os-release PRETTY_NAME="Raspbian GNU/Linux 11 (bullseye)" NAME="Raspbian GNU/Linux" VERSION_ID="11" VERSION="11 (bullseye)" VERSION_CODENAME=bullseye ID=raspbian ID_LIKE=debian HOME_URL="http://www.raspbian.org/" SUPPORT_URL="http://www.raspbian.org/RaspbianForums" BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs" pi@RPI-HUB:~ $

pi@RPI-HUB:~ $ !1646 /home/pi/UPSPlus_MQTT/UPS_report_for_UPSPlus_mqtt.py ------------------ 12-12-2021 12:56:37 ---------------------------------

*** Data from INA219 at 0x40: Raspberry Pi supply voltage: 4,916 V Raspberry Pi current consumption: 1,190 A Raspberry Pi power consumption: 6,840 W

*** Data from INA219 at 0x45: Battery voltage: 4,212 V Battery current (charging): 0,037 A Power supplied to the batteries: 0,084 W

Remainder of report is based on data collected by the UPS f/w and read from memory at 0x17

UPS board MCU voltage: 3,296 V (0x01-0x02) Voltage at Pi GPIO header pins: 4,948 V (0x03-0x04) USB type C port input voltage: 8,902 V (0x07-0x08) Micro USB port input voltage: 0,000 V (0x09-0x0A)

Battery temperature (estimate): 37°C (0x0B-0x0C) Automatic detection of battery type: yes (0x2A) Batteries fully charged at (UPS/learned value): 4,311 V (0x0D-0x0E) Current voltage at battery terminals: 4,236 V (0x05-0x06) Discharge limit for use by the control script: 3,400 V (0x11-0x12) Batteries fully discharged at (UPS/learned value): 3,400 V (0x0F-0x10) Remaining battery capacity (estimate): 99 % (0x13-0x14) Battery sampling ('blue LEDs off') interval: 10 min (0x15-0x16)

Current power state: normal (0x17)

External power is connected to the USB type C input.

Should the external power be interrupted long enough to cause the battery voltage to drop below 3,4 V (+ 0,2 V as a safety margin), an appropriate control script should halt the Pi, after which the UPS may eventually power the Pi down (& possibly restart it upon return of external power) depending on the content of 0x18 & 0x1A.

Current values of the UPS power control registers: 0x18=0 / 0x19=1 / 0x1A=0 Explanation: 0x18 - Power off timer (no restart) - not set. 0x19 - Automatic restart upon return of external power: yes 0x1A - Power off timer (with restart) - not set.

Accumulated running time: 82099 min (0x1C-0x1F) Accumulated charging time: 83266 min (0x20-0x23) Current up time: 25054 min (0x24-0x27) F/W version:10 Serial Number: 003B001D-55425717-20353230

I probably upgraded from Buster by doing full-upgrade as I had no confidence in me reinstalling Home Assistant at all, and all the other work involved in reinstalling made me just try full-upgrade, I think. The UPS firmware has been v.10 for many months now, already before the upgrade to Bullseye. The server has been running fine after the upgrade. Nothing new. The UPS plus F/W does what it does, blocking the I2C bus from time to time. I have had the battery sampling period set at 10 minutes since noticing said behaviour of f/w v. 10. I have experimentally changed the INA219 parameters from the original values which gave odd results by measuring power and current to the UPS/Pi combo in different situations.

The Pi also runs motioneye with a local camera and that has some problems with bullseye (auto-brightness not ok). As I cannot now physically access the server for many months, I try not to break it, and cannot install any new firmware, if a new version would become available. Before I left, I tested that the UPS does what it needs to do, shut the server down in a controlled manner when AC fails, and starting it up after AC returns.

I recently tried a new installation of Bullseye on a Pi Zero with different UPS (PiVoyager by Omzlo; nice product!). The 64-bit Bullseye wasn't compatible. I2C of the PiVoyager didn't work with the 32-bit Bullseye, so I feel lucky UPS_plus continued to work OK with (32-bit) Bullseye on my Pi4B server.

frtz13 commented 2 years ago

@hellresistor : did you check with i2cdetect ? on the other hand, with fw 10, you will get this I/O error from time to time. I guess this is when the fw does some work and does not want to be disturbed.

hellresistor commented 2 years ago

I recently tried a new installation of Bullseye on a Pi Zero with different UPS (PiVoyager by Omzlo; nice product!). The 64-bit Bullseye wasn't compatible. I2C of the PiVoyager didn't work with the 32-bit Bullseye, so I feel lucky UPS_plus continued to work OK with (32-bit) Bullseye on my Pi4B server.

I am using RPI4 with bullseye 64bits with RTC

@hellresistor : did you check with i2cdetect ?

Yes! well i will assume that to.

frtz13 commented 2 years ago

how often do you get this error? I get it a couple of times per day. I also set the UPS sampling interval to 10min, so it happens less frequently.

hellresistor commented 2 years ago

ok. i am running the script after new system installation since 3 hours and get like 2 messages..

frtz13 commented 2 years ago

mentioned it in readme.