David00 / rpi-power-monitor

Power Monitor (for Raspberry Pi)
https://david00.github.io/rpi-power-monitor/
GNU General Public License v3.0

Grafana password frequency and SD Card fail #36

Closed bobstanl closed 8 months ago

bobstanl commented 3 years ago

Hi David!

Grafana (v7.1.3) is bugging me much more with password requests than I remember when first installing my system last fall.

Today, in the middle of a session of examining some data, it interrupted me and demanded a password again!

Now, I use chrome to store the password, so it is a few clicks, but I would like to reduce the frequency, or possibly go password free on Grafana.

Since the pi is behind a router NAT, it is not like the world has direct access to the pi. "ShieldsUp" shows my ports are in stealth, but that is the extent of my network security knowledge.

In stumbling about on Grafana, I found nothing in "preferences", but some mysterious variables in Server Admin->settings->auth. The comment is "These system settings are defined in grafana.ini or custom.ini (or overridden in ENV variables). To change these you currently need to restart grafana."

Have you accepted the defaults or can you suggest changes that might be less annoying?

Thanks.

David00 commented 3 years ago

Hi bobstanl, I suspect that your Grafana container might be restarting. Can you share the output of docker ps -a and also uptime? These two commands will tell me if docker has restarted any of the containers since the last boot.

This project uses the default Grafana settings, and those .ini files are used to tweak them. See this question on stack overflow about not requiring a login to view dashboards: https://stackoverflow.com/a/51173858/6711085

bobstanl commented 3 years ago

If the docker resetting is causing more frequent password requests, you have nailed it! I knew I had a problem with the pi mysteriously quitting. Here is a 30 day screenshot. It shows a PG&E shutdown on about 08/31, but then I had a mysterious shutdown on 09/06. No power outages I am aware of, but could have been a glitch.

Now, here are the commands you asked for:

    @.***:~ $ docker ps -a
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
    5b0a94d033be        influxdb            "/entrypoint.sh infl…"   13 months ago       Up 9 days           0.0.0.0:8086->8086/tcp   influx
    a4961cc24745        grafana/grafana     "/run.sh"                13 months ago       Up 9 days           0.0.0.0:3000->3000/tcp

    @.:~ $ uptime
     12:55:24 up 1 day, 12:50,  1 user,  load average: 7.05, 6.65,
    @.:~ $ uptime
     12:59:17 up 1 day, 12:54,  1 user,  load average: 5.60, 6.09,
    @.***:~ $ date
    Thu 16 Sep 13:08:19 PDT 2021

This is confusing. I power rebooted the pi, after the mysterious shutdown, on 09/14 after about 22:00.

- Why does the docker think it was up 9 days instead of 2? I guess the uptime indicates I had a reset and boot 24 + 12 hours ago, i.e. my reboot on 09/14. I found a linux utility, "tuptime", that might help since uptime only goes to the last boot. May have to install that on the pi.
- Do you know of any other way to debug mysterious reboots?

Thanks for help!

David00 commented 3 years ago

That's strange - I was expecting to see the containers have a status with a smaller uptime than the actual Pi's uptime. I don't think I've ever seen that before.

The screenshot didn't come through because the response gets sent to GitHub's issue tracker. You can attach a screenshot (and reply) directly to the issue here:

https://github.com/David00/rpi-power-monitor/issues/36

I'm not sure of any specific ways to debug reboots but a quick Google provided this link, which appears to be helpful: https://geekflare.com/check-linux-reboot-reason/

bobstanl commented 3 years ago

Here is the screenshot mentioned in the previous comment:

[Image: pi_shutdowns_Screenshot from 2021-09-16 12-49-55]

bobstanl commented 3 years ago

This issue is drifting from Grafana password to debugging pi shutdowns. I will need to do some further research to see if there is a better tool than tuptime in case this happens again. You did give a good lead for avoiding password. If you have any comments on debugging shutdowns, I would appreciate them. Otherwise, this issue is probably complete.

taintedkernel commented 3 years ago

This is just a guess, but if it was an unclean shutdown I could envision a bug with docker getting confused about the start time of the container. This Server Fault question might help shed some light on the actual docker events that happened: https://serverfault.com/questions/909265/how-to-check-the-history-of-docker-container-restarts

And regarding shutdowns, I'd look at the log files first. journalctl and/or /var/log/syslog would be the places I'd start with.

bobstanl commented 3 years ago

Thanks for the two suggestions.

First, the "docker events" did not work for me. Here is one of my attempts. I appeared to get a hang, after each attempt, and had to ctl-c out. They seem to follow the examples but may not be correct:

    pi@raspberrypi:~ $ docker events --filter event=restart --since='2021-08-28'
    ^Z
    [1]+ Stopped docker events --filter event=restart --since='2021-08-28'
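
A possible explanation for the apparent hang: `docker events` keeps streaming live events until it is interrupted, unless an end time is given with `--until`. The sketch below only builds and prints such a command. It assumes Python 3 and the Docker CLI on the Pi; the helper name `docker_events_cmd` and the end date are illustrative and not from the thread.

```python
def docker_events_cmd(since, until, event="restart"):
    """Build a `docker events` command that returns once `until` is reached."""
    return [
        "docker", "events",
        "--filter", f"event={event}",
        "--since", since,
        "--until", until,  # without this, the command waits for new events
    ]

if __name__ == "__main__":
    cmd = docker_events_cmd("2021-08-28", "2021-09-17")
    # Print the command line; it could also be passed to subprocess.run(cmd).
    print(" ".join(cmd))
```

Docker only keeps a limited history of events, so restarts from weeks ago may not show up even with a bounded query.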

As for the syslog, I had never looked at one before so it took some time to learn they are one day long and a number of previous days are retained:

    -rw-r----- 1 root adm 103869 Sep 16 16:55 syslog
    -rw-r----- 1 root adm 120642 Sep 16 00:00 syslog.1
    -rw-r----- 1 root adm 29791 Sep 15 00:05 syslog.2.gz
    -rw-r----- 1 root adm 17941 Sep 6 00:00 syslog.3.gz
    -rw-r----- 1 root adm 12085 Sep 5 00:00 syslog.4.gz
    -rw-r----- 1 root adm 13232 Sep 4 00:00 syslog.5.gz
    -rw-r----- 1 root adm 11525 Sep 3 00:00 syslog.6.gz
    -rw-r----- 1 root adm 12127 Sep 2 00:00 syslog.7.gz

The syslog.2, dated Sep 15, actually seems to have my mysterious shutdown on Sep 6. The file date indicates the previous days capture.

Here are two snips from syslog.2 and the way I interpret it.

First, I am getting a lot of "Failed to write data to Influx. Reason: b'{"error":"timeout"}\n'" events throughout all the logs. It does not appear to affect the data presented by Grafana, but my RPi3 is definitely overworked.

So, the first part of the snip is several lines of this error, followed by what I believe is my power reset on Sep 14, around midnite before the 15th. I believe rpi linux grabs a stored time at boot until it can get an NTP update. So, the start of my power reboot is "Sep 6 14:09:42", where the last reading before the mysterious shutdown was at "Sep 6 14:15:31", a later time! I then skip many lines of the reboot process and go to the end of the syslog file where it gets the timesyncd, decides it is after midnite and proceeds to create a new syslog file.

Snips from syslog.2.gz:

    Sep 6 14:06:17 raspberrypi python3.7[1022]: 2021-09-06 14:06:17 : Failed to write data to Influx. Reason: b'{"error":"timeout"}\n'
    Sep 6 14:06:44 raspberrypi python3.7[1022]: 2021-09-06 14:06:44 : Failed to write data to Influx. Reason: b'{"error":"timeout"}\n'
    Sep 6 14:07:08 raspberrypi python3.7[1022]: 2021-09-06 14:07:08 : Failed to write data to Influx. Reason: b'{"error":"timeout"}\n'
    Sep 6 14:07:34 raspberrypi python3.7[1022]: 2021-09-06 14:07:34 : Failed to write data to Influx. Reason: b'{"error":"timeout"}\n'
    Sep 6 14:10:27 raspberrypi python3.7[1022]: 2021-09-06 14:10:27 : Failed to write data to Influx. Reason: b'{"error":"timeout"}\n'
    Sep 6 14:15:31 raspberrypi python3.7[1022]: 2021-09-06 14:15:31 : Failed to write data to Influx. Reason: b'{"error":"timeout"}\n'
    Sep 6 14:09:42 raspberrypi systemd-modules-load[107]: Inserted module 'i2c_dev'
    Sep 6 14:09:42 raspberrypi fake-hwclock[109]: Mon 6 Sep 20:17:02 UTC 2021
    Sep 6 14:09:42 raspberrypi systemd-fsck[131]: e2fsck 1.44.5 (15-Dec-2018)
    Sep 6 14:09:42 raspberrypi systemd[1]: Started udev Coldplug all Devices.
    Sep 6 14:09:42 raspberrypi systemd[1]: Starting Helper to synchronize boot up for ifupdown...
    Sep 6 14:09:42 raspberrypi systemd[1]: Started Helper to synchronize boot up for ifupdown.
    Sep 6 14:09:42 raspberrypi systemd-fsck[131]: rootfs: clean, 69561/1899328 files, 1970600/7725184 blocks
    Sep 6 14:09:42 raspberrypi systemd[1]: Started File System Check on Root Device.
    Sep 6 14:09:42 raspberrypi systemd[1]: Starting Remount Root and Kernel File Systems...
    Sep 6 14:09:42 raspberrypi systemd[1]: Started Set the console keyboard layout.

...SKIP MANY BOOTUP LINES...

    Sep 6 14:10:05 raspberrypi systemd[1]: Started Update UTMP about System Runlevel Changes.
    Sep 6 14:10:05 raspberrypi systemd[1]: Startup finished in 6.391s (kernel) + 29.124s (userspace) = 35.516s.
    Sep 6 14:10:09 raspberrypi dhcpcd[557]: vethbbc77be: probing for an IPv4LL address
    Sep 6 14:10:09 raspberrypi dhcpcd[557]: vethdfb0bbf: probing for an IPv4LL address
    Sep 6 14:10:10 raspberrypi systemd[1]: systemd-fsckd.service: Succeeded.
    Sep 15 00:05:28 raspberrypi systemd-timesyncd[330]: Synchronized to time server for the first time 195.85.215.215:123 (2.debian.pool.ntp.org).
    Sep 15 00:05:28 raspberrypi systemd[1]: Starting Rotate log files...
    Sep 15 00:05:28 raspberrypi systemd[1]: Starting Daily man-db regeneration...
    Sep 15 00:05:28 raspberrypi systemd[1]: Starting Daily apt download activities...

The next file in series, syslog.1, continues booting and starting power-monitor.

If anyone is interested, I could post the two files, syslog.2.gz and syslog.1.gz.

So, I still have no idea why the pi shut down on Sep 6. It hasn't done that before. I did buy an RPi 4 with intention of replacing this RPi3 but am using it on a different project. If I get another mysterious halt, I think replacing the pi is my next step.

taintedkernel commented 3 years ago

Interesting about the docker events issue. Can't say I have any ideas on that one.

You are correct about how syslog works, although the specifics on log rotation can vary depending upon system configuration, Linux distribution, etc. Also correct about timestamps jumping backwards, this can frequently happen with NTP. I imagine the kernel will default to pulling the time from the BIOS during boot until updated otherwise, which might have not been kept up to date - last I recall the hardware clock is completely separate from the internal OS clock at least on x86 systems.

And regarding the log content - to confirm, there's no log entries between the last entry on the 6th and the shutdown and reboot on the 15th? If so, this is rather unusual to have such a large gap, although not impossible. What is more likely is that the Pi hung just after 14:15 on the 6th, per the last error on failure to write to influx and large gap of missing data on the grafana dashboard. There are no logged errors obviously, but I've seen lots of systems with hardware issues exhibit this same exact behavior.

Trying on a new Pi I think is a great troubleshooting step, it's good to have one available even if temporarily. The RPi 4 is a fairly decent upgrade from a 3 as well. I run rpi-power-monitor on a v4 quite effectively, although I leverage existing deployments of influx and grafana on my kubernetes cluster.

jmadden91 commented 2 years ago

Hi David,

Somewhat related to this. I'd like to modify the default grafana settings to remove the auth login screen as I use a reverse proxy with authelia in front for authentication. Could you tell me where the docker config files would be stored in the powermon os raspbian image?

Thanks

David00 commented 2 years ago

Hey @jmadden91, there are a couple ways to go about this, but to answer your specific question about the config files:

The custom Pi OS image has a docker-compose.yml file in /boot/docker-compose/mydockerfolder.

However, the Grafana service definition does not map any local volumes to the Grafana container, and this is where a couple different ways to disable the Grafana authentication come into play.

The best way (IMO) would be to spawn a new image from your existing Grafana container, then recreate your Grafana container (this time with a volume mapped). But, this is definitely the more complicated way. See this SO question/answer if you want to try this.

The easier way to get it done as quick as possible would be to connect to the container, edit the file from inside the container, and restart Grafana.

I am working on addressing the better way to do this in some future software updates (which will include pre-provisioning Grafana), so for the time being, here's the easier way to do this:

(@bobstanl - you may want to follow these steps too to disable your Grafana authentication)

  1. Get a root shell inside the Grafana container (assuming grafana is the name of your existing container)

    docker exec -it --user 0 grafana sh
  2. Use vi to edit the config file at /usr/share/grafana/conf/defaults.ini

    vi conf/defaults.ini
  3. Press / in Vi to search and type in the following text: auth.anon

    Set enabled = true in the [auth.anonymous] config section. (If you aren't familiar with Vi, press i to switch to edit mode, and then you can navigate/type normally).

  4. Press Esc to leave edit mode, then press : followed by wq! to save and close the file.

  5. Finally, restart your Grafana container:

    docker restart grafana
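
As a possible alternative to editing defaults.ini inside the container, Grafana also accepts settings through environment variables named `GF_<SECTION>_<KEY>`, which is what the "overridden in ENV variables" remark at the top of this thread refers to. The sketch below only derives the variable name for the setting changed in step 3. The helper `grafana_env_name` is a hypothetical name, and using the variable assumes the Grafana container is recreated with it set.

```python
def grafana_env_name(section, key):
    """Map an ini section and key to Grafana's GF_<SECTION>_<KEY> variable name."""
    parts = [section, key]
    # Upper-case each part and turn dots and dashes into underscores.
    cleaned = [p.upper().replace(".", "_").replace("-", "_") for p in parts]
    return "GF_" + "_".join(cleaned)

if __name__ == "__main__":
    name = grafana_env_name("auth.anonymous", "enabled")
    # Shown as a `docker run` flag; docker-compose would use `environment:`.
    print(f"-e {name}=true")
```

Because the value lives in the container definition rather than in a file inside the container, it should survive the container being recreated, which an in-container edit would not.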

bobstanl commented 2 years ago

Latest report: SD Card totally dead, lost a years worth of power data!

This thread started with a problem of puzzling resets that caused me to log in to grafana more often than usual.

The earlier resets were likely caused by power outages, which we in the PG&E forest suffer a lot. But now, it could also have been a deteriorating sd card.

The SD card will not boot, nor will it respond to every attempt to read it. Below, I will describe the symptoms and how I tried to read the card. Then I will describe some changes I am considering for my next version.

On Thursday, I noticed a gap in the grafana plot data from earlier in the day. I logged in and then did a sudo reboot. It reset that time, but in checking things, it appeared influx was not responding. When I did the "docker ps -a" test, grafana was running but influxdb had Exited (2) shortly after the reboot.

So, I attempted another sudo reboot, and got:

    Failed to open initctl fifo: No such device or address
    Failed to talk to init daemon.

Ehhh!!!??? Can't reboot? So, pulled the plug to do a poweroff but now the pi would not boot at all. Later, I did try the pi with another sd card and it will boot, so the fault is with the card.

Next day, I removed the sd card and attempted to examine it. I used two different USB sd card readers, checked them with other sd cards to be sure the readers worked, and tried both ubuntu and windows. The sd card would not respond. On Linux, lsusb showed the usb reader, but not the card. It is toast and I never backed up the power data from over a years worth of operation! Woe is me!

???- Any other ideas on getting the data from a non-responding sd card? (It was a Sandisk 32GB, HC-I C4.)

FUTURE REVISIONS:

-As I indicated earlier, I will change out the RPi3B with a RPi4B-2GB.

POWER OUTAGE PROTECTION

Since this is a big problem where I live, I want a battery backed UPS that will keep the pi on just long enough to detect the outage and then do a controlled shutdown. No use having a power-monitor system running when the power is off! It then needs to automatically boot up when power is restored.

Omzlo Pivoyager

On a different project, I have been using an Omzlo Pivoyager on a RPi4B. It has had many problems and I do not recommend it on a RPi4. It has a wimpy MCP73871 Battery Charge IC. This would not allow the pi to boot with only USB 5V: it powered up for a few seconds, then did a momentary cutout that rebooted the pi. With a battery connected, the system would operate normally. The RTC ran fast, gaining about 12 seconds a day. Finally, the voyager is now continuously reporting a fault on the MCP73871 after only a few weeks of operation. On the other hand, Omzlo has excellent documentation, schematic, and firmware available, which is why I picked it in the first place.

Raspberry Pi UPS HAT

I just purchased a Raspberry Pi UPS HAT, but have not received it yet: https://www.pishop.us/product/raspberry-pi-ups-hat/

It appears to have what I need to gracefully shut down the RPi4 after a power outage. Mechanically, it will take some "futzing" to mount with David's PCA, but with some 40pin extender sockets, I think it will go.

???- Does anyone know where I could find a schematic for the Raspberry Pi UPS Hat by Buyapi.ca?

Other Pi battery backups that I looked at and rejected for various reasons, including cost, are: MakerHawk, Pisugar S pro, PiJuice Hat and Kuman.

If the Pi UPS HAT is a problem, my next choice will be the Geekworm Raspberry Pi X728 V2.1. It costs a lot more, plus you have to buy two 18650 batteries.

Software changes for UPS: The UPS will require changes to the main "while True" loop in David's run_main code.

On the Pi UPS HAT, I probably will only need:

Kill Pi power

    import subprocess

    # Needs root. Passing an argument list, so shell=True is not needed.
    result = subprocess.run(['shutdown', '--poweroff', 'now'])

???- Does anyone see a problem with this power down technique? I am assuming the docker and so on will shutdown without trashing files.
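
One way the outage detection around that shutdown call might look, as a rough sketch only. How the UPS HAT reports loss of mains power is not described in this thread, so `power_lost` is a hypothetical callable supplied by the caller, and the three-poll confirmation is an arbitrary choice to ride through brief glitches.

```python
import subprocess
import time

def wait_for_outage(power_lost, confirm_checks=3, interval_s=1.0, sleep=time.sleep):
    """Return once power_lost() has been true for confirm_checks polls in a row."""
    consecutive = 0
    while consecutive < confirm_checks:
        # Any reading that shows power present resets the count.
        consecutive = consecutive + 1 if power_lost() else 0
        if consecutive < confirm_checks:
            sleep(interval_s)

def shutdown_now(run=subprocess.run):
    """Request a clean power-off; the calling process needs root privileges."""
    return run(["shutdown", "--poweroff", "now"], check=False)

if __name__ == "__main__":
    # Simulated readings (hypothetical): one glitch, then a sustained outage.
    readings = iter([False, True, False, True, True, True])
    wait_for_outage(lambda: next(readings), sleep=lambda seconds: None)
    print("outage confirmed; a real script would call shutdown_now() here")
```

Since `wait_for_outage` blocks, it would probably fit better in its own small script or service than inside the main sampling loop.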

POWER-MONITOR DATA BACKUP

In the past, if I wanted a snapshot of the pi, I would just use Win32 Disk Imager to clone the whole sd card to a hard drive. Nice, simple gui, good feedback, but it takes forever. Then I zipped it. Linux "dd" scares me, feedback poor and you can really screw things up if you misunderstand it.

Now, since I hate losing all that data, I need some way to backup at least the influx on a more regular basis.

My research has found, that if influxdb is operating, there is a backup and restore function that might be useful. The overview documentation is here: https://docs.influxdata.com/influxdb/v1.8/administration/backup_and_restore/

This thread has a nice discussion and even includes some docker exec examples of the influxd commands: https://stackoverflow.com/questions/56596533/influxdb-move-only-one-database-of-many-from-one-server-instance-to-another/56652014

It would be good to do this from a remote machine onto its hard drive.

???- Does anyone know of a script to do an influx backup remotely?

???- Any other ideas to prevent such a loss of collected data?
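
A minimal sketch of what a remote backup could look like, using the `-host` option of `influxd backup` from the InfluxDB 1.8 documentation linked above. It assumes `influxd` 1.8 is installed on the backup machine and that the Pi's InfluxDB backup port (`bind-address`, 8088 by default and normally bound to localhost) has been made reachable, which the container setup in this thread does not do out of the box. The IP address, database name, and destination directory are placeholders.

```python
def remote_backup_cmd(pi_host, database, dest_dir, rpc_port=8088):
    """Build an `influxd backup` command that pulls a portable backup from pi_host."""
    return [
        "influxd", "backup", "-portable",
        "-database", database,
        "-host", f"{pi_host}:{rpc_port}",
        dest_dir,
    ]

if __name__ == "__main__":
    # Placeholder values; substitute the Pi's address and a real destination.
    cmd = remote_backup_cmd("192.168.1.50", "power_monitor", "/backups/powermon")
    # Print the command; a scheduled job could run it with subprocess.run(cmd).
    print(" ".join(cmd))
```

Run from cron or a systemd timer on the remote machine, this would keep a copy of the data off the SD card. Restoring a backup now and then is the only real check that it works.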

taintedkernel commented 2 years ago

I actually had a similar problem with an SD card being corrupted in the power-monitor due to unclean shutdowns. I used ddrescue successfully, mainly just to recover the calibration values since it took some time to set up initially. This was also why I moved my influxdb deployment to a different system. But if your card is so shot that it won't even be detected, I don't think you will have much luck there.

This is a known issue with rPi's in general though - even without any power interruptions or unclean shutdowns, SD cards are not rated for nearly the amount of write cycles as a typical SSD, even moreso with cheap SD cards. I have been working to eliminate as many idle writes as possible on my PiHole and other Pi's for this reason.

Regarding the UPS, I also struggled to find a good option. Hit similar issues with insufficient power delivery on a lot of devices out there (not all of them are rated for the increased rPi v4 power usage). I don't have a good solution in place currently, but I did find a 10000mAh battery with multiple voltages that allows for simultaneous charging while providing power. Soldered in a 15W buck converter and it works pretty well, although it does not do auto-shutdown. The capacity of the battery though is sufficient to run the Pi for quite a long time from my calculations, so I just manually shut down if I have an extended outage. I want to have a better solution in place though, thanks for pointing those projects out. On the surface the Geekworm one looks promising.

With your shutdown procedure, it looks like it should work just fine. As long as the code runs as root I don't see any obvious issue, but I would test it of course :) Docker and all applications should shutdown cleanly without issue.

And with influx backups, the functionality seems good; specifically the fact you can natively send it to a remote host. I don't know of a backup script offhand, but it looks like influx does most of the heavy lifting. You'd only need some effort to tie into your docker and ideally also a way to get notified of a backup failure. Also testing restoring from backup from time to time is always smart!

jmadden91 commented 2 years ago

@David00 Perfect mate, thanks for the detailed reply

David00 commented 2 years ago

@bobstanl I'm really sorry to hear you lost your monitoring data! Those rolling grid outages must be a pain to deal with.

The Pi-specific UPS solutions out there all look really neat, but unfortunately I have no experience with any of them. I am interested in trying one out though, because my power monitor Pi is not power-protected. Of the two you linked, I'd probably go with the Geekworm board. However, if you have the space around your Pi, I'd also recommend looking at the APC BE425M as a general battery backup solution.

As for the storage reliability - the version 5 Linux kernels support USB boot on Raspberry Pi's. However, all the ones I've tested have a bug with the underlying SPI driver that cuts the sample rate in half, which drastically impacts my project. So, we've been stuck on v4 kernels which don't support USB boot. It's been awhile since I have tested any of the latest v5 kernels, so perhaps the issue with SPI has been fixed, which would allow you to run your Pi from a USB flash drive or a NVMe disk with a USB to NVMe adapter.

You can also send the data to a remote InfluxDB server without the need for running InfluxDB on the Pi itself. Just change the host to the IP address of your remote InfluxDB server in this line of config.py.

I am working on some major changes to the software to make it easier to setup and use, so I will look into incorporating InfluxDB backups with the new changes. (If you've ever ordered hardware from me via my shop, you'll receive an announcement email about these changes once I'm ready to release it).

bobstanl commented 2 years ago

Change directory for influxdb?

@David00 1. Since there are issues in booting from USB, can we do an end-run and redirect the influxdb storage location to an external USB drive? Then, we continue to boot linux from the sd card.

??? Other than initial loading, would influx still make a lot of writes to the sd card if the data location were an external USB directory? In other words, would the USB drive now take all the "wear and tear" instead of the sd card?

??? How to redirect data storage on influx? Is it just changing "/opt/influxdb" in the following:

    docker run -d --restart always --name influx -p 8086:8086 -v /opt/influxdb:/var/lib/influxdb influxdb:1.8.3

??? Would backing it up simply be copying that directory? (i.e. replacement for "/opt/influxdb") Or are there good reasons to use the "influxd backup" commands?

Thanks for all your help!

David00 commented 2 years ago

Yes, you can change the directory of the local InfluxDB data on the Pi by changing the directory in the -v /opt/influxdb part. Just make sure to keep the :/var/lib/influxdb part since this specifies the directory inside the container itself.

This could add a layer of complexity to the InfluxDB container starting up... if your USB flash drive happens to not be mounted (like on boot, for example), the container may fail to start.
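
One way to guard against the unmounted-drive case is to check the mount before starting the container. This is a sketch under assumptions: the mount point `/mnt/usb-influx` is hypothetical, the volume path is assumed to be the mount point itself, and the container is assumed to be started by a script rather than by Docker's restart policy.

```python
import os

def safe_to_start(data_dir, ismount=os.path.ismount):
    """True only if data_dir is a mount point, i.e. the USB drive is attached."""
    return ismount(data_dir)

if __name__ == "__main__":
    data_dir = "/mnt/usb-influx"  # hypothetical mount point for the USB drive
    if safe_to_start(data_dir):
        print("drive mounted; start the container, e.g. docker start influx")
    else:
        print("drive not mounted; leaving the container stopped")
```

With `--restart always`, Docker starts the container on its own at boot, so a check like this would only help if that policy were changed and a boot script started the container instead.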

This should alleviate some wear and tear, but I don't think it will be as beneficial as protecting the Pi from unclean shut downs.

As for backups - I'd suggest sticking to the method in the documentation. I'm sure there are valid reasons for the method they recommend.

David00 commented 2 years ago

Ironically enough, I came back from a weekend holiday getaway to find an unresponsive power monitor Pi. After some troubleshooting and failed fsck's, I determined that my microSD card has failed. It's stuck in a perma-read-only mode, so at least I should be able to recover the data. This card (a Samsung Evo 32GB) lasted a bit over a year.

When I get my Pi back up and running, I'll move the storage location for Influx to an external USB drive and provide some guidance on that process.

richie256 commented 2 years ago

If I can just add my little contribution to the discussion, I believe the best cards to buy are the ones labeled High Endurance, which are designed for high I/O uses like dashcams... or the Raspberry Pi :)

I've looked up the Samsung Evo 32GB and it doesn't mention that.

Regards.

David00 commented 2 years ago

Thanks for the suggestion @richie256. I've ordered one of the SanDisk endurance cards so I'll give that a go. I should have automated InfluxDB backups added to the project long before the new card expires!

bobstanl commented 2 years ago

Hi David

You are probably already ahead of me, but here are my experiments at Influx backup

~Basic command

    backup -portable -database power_monitor /tmp/powermonsnapshot

~Execute in docker

    pi@powermon:~ $ docker exec -it influx influxd backup -portable -database power_monitor /tmp/powermonsnapshot
    2021/11/24 06:22:50 backing up metastore to /tmp/powermonsnapshot/meta.00
    2021/11/24 06:22:50 backing up db=power_monitor
    2021/11/24 06:22:50 backing up db=power_monitor rp=autogen shard=3 to /tmp/powermonsnapshot/power_monitor.autogen.00003.00 since 0001-01-01T00:00:00Z
    2021/11/24 06:22:57 backing up db=power_monitor rp=autogen shard=8 to /tmp/powermonsnapshot/power_monitor.autogen.00008.00 since 0001-01-01T00:00:00Z
    2021/11/24 06:23:02 backup complete:
    2021/11/24 06:23:02 /tmp/powermonsnapshot/20211124T062250Z.meta
    2021/11/24 06:23:02 /tmp/powermonsnapshot/20211124T062250Z.s3.tar.gz
    2021/11/24 06:23:02 /tmp/powermonsnapshot/20211124T062250Z.s8.tar.gz
    2021/11/24 06:23:02 /tmp/powermonsnapshot/20211124T062250Z.manifest
    pi@powermon:~ $

~Oops it is invisible to Linux OS

    pi@powermon:~ $ sudo du -sh /tmp/powermonsnapshot
    du: cannot access '/tmp/powermonsnapshot': No such file or directory

~So copy from docker to OS

    pi@powermon:~ $ docker cp influx:/tmp/powermonsnapshot /tmp/powermonsnapshot

~ I had just started my new sd card

    pi@powermon:~ $ sudo du -sh /tmp/powermonsnapshot
    71M /tmp/powermonsnapshot
    pi@powermon:~ $

~Here is what the directory contains

    pi@powermon:~ $ cd /tmp/powermonsnapshot
    pi@powermon:/tmp/powermonsnapshot $ ll
    total 72412
    drwx------ 2 pi pi 4096 Nov 24 06:23 ./
    drwxrwxrwt 9 root root 4096 Nov 24 06:33 ../
    -rw------- 1 pi pi 495 Nov 24 06:23 20211124T062250Z.manifest
    -rw-r--r-- 1 pi pi 429 Nov 24 06:22 20211124T062250Z.meta
    -rw------- 1 pi pi 47167916 Nov 24 06:22 20211124T062250Z.s3.tar.gz
    -rw------- 1 pi pi 26962181 Nov 24 06:23 20211124T062250Z.s8.tar.gz
    pi@powermon:/tmp/powermonsnapshot $

~Now need to move to external computer or drive from Pi
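
The two manual steps in this comment could be scripted roughly as follows. This is a sketch, not a tested backup tool: the container name, database, and paths are taken from the transcript, the helper name `backup_commands` is made up, and the `-it` flags are dropped on the assumption that the script runs unattended (for example from cron) where no terminal is attached.

```python
def backup_commands(container, database, in_container_dir, host_dir):
    """Return the backup and copy commands from the transcript as argument lists."""
    return [
        # Step 1: write a portable backup inside the container.
        ["docker", "exec", container, "influxd", "backup", "-portable",
         "-database", database, in_container_dir],
        # Step 2: copy the backup directory out to the Pi's own filesystem.
        ["docker", "cp", f"{container}:{in_container_dir}", host_dir],
    ]

if __name__ == "__main__":
    for cmd in backup_commands("influx", "power_monitor",
                               "/tmp/powermonsnapshot", "/tmp/powermonsnapshot"):
        # Print each command; subprocess.run(cmd, check=True) would execute it.
        print(" ".join(cmd))
```

A third step, such as `scp` or `rsync` to another machine, would still be needed to get the files off the Pi.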

taintedkernel commented 2 years ago

Funnily enough, I also seem to be having card issues with a different rPi running HASS. During debugging I did find an interesting product though, while definitely an additional cost over the Pi itself, it eliminates this issue by leveraging an M.2 SSD instead of an SD card.

From what I can tell you would need the case + expansion board (linked below) plus a small SSD. These types of rPi devices are new to me, so I'm going to give it a shot - figured I'd mention in case someone else was interested.

https://www.amazon.com/Argon-Raspberry-Support-B-Key-Compatible/dp/B08MJ3CSW7

https://www.amazon.com/dp/B08MHYWJCP/

David00 commented 2 years ago

@taintedkernel - You can do the same with a USB flash drive, but I don't know how much more reliable that might be over the microSD card. The SSD is definitely the better option.

However, if you're looking to run the entire operating system from the drive, you'd have to use USB boot, which I recall reading somewhere is only supported in v5 linux kernels. Unfortunately, there's an issue in the v5 kernels that essentially cuts the sample rate for my project in half, which reduces the accuracy of the power calculations. My custom OS image currently uses a late v4 kernel, which doesn't support USB boot, but gives us the better sample rates.

I have tested v5 kernels from 5.10.1 up through 5.10.31, and the latest kernel is 5.10.82. So, the problem might have been fixed in the later versions, and if so, that would allow us to use USB boot.

If anyone would like to try testing the speed with the latest v5 kernel, just issue the command sudo rpi-update c827259e4adb63d1dd36e21d51dcd4243d0c1255 followed by sudo reboot 0. Then, generate a debug plot with the power monitor software and the sample rate will be displayed near the bottom of the plot. The correct sample rate should be around 30 kSPS, and about half that if the kernel has the sample-rate problem.

taintedkernel commented 2 years ago

@David00 Ah yes, you did mention that issue above and it seems this device does rely upon USB booting. I can try updating, but what would be the revision I would use to revert back to in case the issue still persists?

David00 commented 2 years ago

The known working version is 4.19.118, which you can revert to with this command: sudo rpi-update e1050e94821a70b2e4c72b318d6c6c968552e9a2

taintedkernel commented 2 years ago

Unfortunately no luck on that kernel, I was getting ~14 KSPS. The revert process worked fine and now back to around 31. I'm pretty unfamiliar with the rPi internals, but just taking a wild guess - maybe something with the default kernel config on the newer versions is causing the issue? I would (hopefully) think a notable regression in performance on a new kernel series would have been caught in testing?

David00 commented 2 years ago

Thanks for trying! There's an open issue about it on the Raspberry Pi linux repository, and there hasn't been any action on it since I suggested using v4.19.118 as a workaround back in April: https://github.com/raspberrypi/linux/issues/3381

taintedkernel commented 2 years ago

Anytime! Thanks for pointing out the issue, following it myself now as well.

5ft24dave commented 2 years ago

My pi runs on PoE, and my network switch etc. is on a 1000VA UPS. I am running NUT on the pi as well to monitor the UPS, and it will gracefully shut down the pi when battery life is under 20% with no AC. My SD card is only for boot as well, and I am running an M.2 drive as the root partition in a USB 3.1 to NVMe adapter.

bobstanl commented 1 year ago

Hi David! I am in trouble again and it looks suspiciously like another SD failure. My grafana stopped displaying data, and I found out influx had exited mysteriously. In investigating that, during reboots, docker will no longer start! (But, unlike last sd failure, it still boots into OS...) Can forward my notes on investigation, if you are interested, but I think I will attempt to implement your latest since I am two years behind on my install. So, NO DOCKER anymore! Wow, that should make it lots easier to back up my data! (BTW, I have no idea of how to recover the data from this new failed SD card if I can't start docker, maybe that's another thread...)

Main question now is: Did someone fix the need for an old RPi OS version? i.e. 4.19.118, no 5.0 or later?

If fixed, can I USB3 boot off SSD instead of damn SD cards? Life would be much better!

David00 commented 1 year ago

Hey @bobstanl! That's unfortunate!! What microSD card were you using this time? It might be helpful to start tracking cards that seem to be no good for this project. On another note, I'm running the SanDisk High Endurance cards on two different systems for over a year now and so far, so good.

That's good that you can boot the card still - and also you won't need to start Docker to get the data off the card. The data should be in /opt/influxdb, so if you create an archive of this entire directory, it will be easy to get the data off and into your next instance.

Try: sudo tar -cvf /home/pi/influx_backup.tar /opt/influxdb/

This will put everything from /opt/influxdb into a single tar file at /home/pi/ called influx_backup.tar. Then, you can copy this file out with SCP or by mounting a flash drive.


Regarding the updates to the project:

If you're right on the threshold of starting from scratch, would you be interested in testing out the upcoming v0.3.0 release and going through the new documentation? I can provide both of them if so.

bobstanl commented 1 year ago

I would be happy to try the latest code.

But first, your suggestion for backing up my influx data floors me (will explain below) but also doesn't work. The reason it doesn't work may be related to my sd card change that prevents docker from starting. Here is my result when I tried to tar:

pi@powermon:~ $ sudo tar -cvf /home/pi/influx_backup.tar /opt/influxdb/
tar: /home/pi/influx_backup.tar: Cannot open: Read-only file system
tar: Error is not recoverable: exiting now
pi@powermon:~ $

Below is a snippet of a text file that I put my debug experiments and comments in. I'd attach this file as a zip, but hate to add the whole thing to this thread. Anyway, why do I get such a protection error in attempting to start PM on this "bad" SD card? Is it related?

pi@powermon:~ $ python3 ~/rpi-power-monitor/power-monitor.py terminal
2023-02-02 08:06:45 : Could not create a backup of config.py file.
Traceback (most recent call last):
  File "/home/pi/rpi-power-monitor/power-monitor.py", line 607, in <module>
    os.makedirs('data/samples/')
  File "/usr/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: 'data'

Total aside! Not related to present issue. Why I was amazed at just copying /opt/influxdb/! My concept was that I had to backup INSIDE influxdb and further, I had to do it INSIDE Docker! Here are my instructions to my self on how to "backup" influxdb data:

To backup all (both databases) use:
pi@RPi4library:~ $ docker exec -it influx influxd backup -portable /tmp/RPi4library
Copy from docker to OS:
pi@RPi4library:~ $ docker cp influx:/tmp/RPi4library /tmp/RPi4library

Move to ssd, say with Filezilla

Restore
Maybe, to get from OS to docker:
rms@raspberrypi:~ $ docker cp /tmp/RPi4library influx:/tmp/RPi4library
Then for both db's:
rms@raspberrypi:~ $ docker exec -it influx influxd restore -portable /tmp/RPi4library

David00 commented 1 year ago

The read-only messages are a common indication that the card has failed. The good news is that you can still get to it to read from it. The bad news is that it will be a little bit more tricky to get the data off.

You'll have to get another microSD card or USB flash drive to boot your Pi from it. Then take your "bad" microSD and connect it to the Pi via a USB to microSD converter.

Find out which device the "bad" microSD card is showing up as with sudo fdisk -l | grep "/dev/sd*". It should show something like sda1 or sdb1.

Whatever the "/dev/sd#" device turns out to be for the microSD card, use it in the following commands (replace `/dev/sd#` with your actual device letter and number).

mkdir ~/sdcard
sudo mount /dev/sd# ~/sdcard

Then you should be able to copy stuff out of the microSD card, prefixing all the paths with ~/sdcard. So:

tar -cvf /home/pi/influx_backup.tar ~/sdcard/opt/influxdb/

Haven't tested this, but it should work in theory.
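
If the card throws read errors partway through, a plain tar run reports them in passing and it is hard to tell afterwards what was skipped. As a rough alternative, the sketch below copies whatever is still readable into a fresh directory and returns the paths that failed. The function name `copy_readable` and the example paths are made up for illustration; this is not part of the project, only a minimal sketch using the Python standard library.

```python
import os
import shutil


def copy_readable(src_dir, dest_dir):
    """Copy every readable file from src_dir to dest_dir.

    Returns a list of source paths that could not be read or listed.
    """
    failed = []

    def note_walk_error(err):
        # os.walk calls this when a directory itself cannot be listed.
        failed.append(err.filename)

    for root, _dirs, files in os.walk(src_dir, onerror=note_walk_error):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dest_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            try:
                shutil.copy2(src, dst)
            except OSError:
                # Typical for a failing card: "Input/output error".
                failed.append(src)
                if os.path.exists(dst):
                    os.remove(dst)  # drop the partial copy
    return failed


if __name__ == "__main__":
    # Hypothetical paths: the mounted bad card and a rescue directory.
    source = os.path.expanduser("~/sdcard/opt/influxdb")
    if os.path.isdir(source):
        for path in copy_readable(source, "/home/pi/influx_rescue"):
            print("could not read:", path)
```

The returned list gives a record of exactly which shards or WAL files were lost, which helps when deciding later whether an older backup needs to be merged in.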

Your examples for docker backups look right to me.

bobstanl commented 1 year ago

Pulled the "bad" uSD card and attached with USB adapter to my Ubuntu laptop. uSD Card is labelled "Unirex 16GB"

It shows up as media/bob/rootfs and boot, i.e. normal. Other things are abnormal. Here are the Properties for "rootfs" "(some contents are unreadable)" Screenshot from 2023-02-03 19-31-05

This looks bad when I try to open /opt/influxdb/ on rootfs: Screenshot from 2023-02-03 19-30-20

Also, since "Disks" utility can sometimes be helpful, I tried to execute it but it crashed without opening: Screenshot from 2023-02-03 19-32-23

Above, taintedkernel commented on Nov 13, 2021 said " I used ddrescue successfully, "

Think I should try ddrescue?

David00 commented 1 year ago

I've been able to use the mount tactic I described above to rescue data from read-only cards. I'm not sure what Ubuntu's normal behavior is nor am I familiar with ddrescue.

bobstanl commented 1 year ago

Ok will try on a RPi tomorrow.

bobstanl commented 1 year ago

Copied results below of tar operation after mounting drive per your instructions on an RPi (used a RPi3, the PM RPi4 is still mounted in outside box.)

There were a number of errors so I did not get all of the data, I guess. Do you think it is worth trying to start new influx with this data?

Unfortunately, I did not do my Jan backup, (figures...Murphy at work). My last backup is from July, so that is default.

pi@weatherman:~ $ sudo tar -cvf /home/pi/influx_backup.tar ~/sdcard/opt/influxdb/ tar: Removing leading `/' from member names /home/pi/sdcard/opt/influxdb/ /home/pi/sdcard/opt/influxdb/meta/ /home/pi/sdcard/opt/influxdb/meta/meta.db /home/pi/sdcard/opt/influxdb/wal/ /home/pi/sdcard/opt/influxdb/wal/_internal/ /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/ /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/494/ /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/499/ /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/499/_00046.wal /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/499/_00041.wal /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/499/_00043.wal /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/499/_00045.wal /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/499/_00042.wal /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/499/_00044.wal /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/498/ /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/495/ /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/496/ /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/491/ /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/497/ /home/pi/sdcard/opt/influxdb/wal/_internal/monitor/493/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/32/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/472/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/256/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/144/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/392/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/368/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/376/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/424/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/64/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/248/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/8/ 
/home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/72/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/151/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/176/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/312/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/264/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/16/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/168/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/464/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/288/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/88/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/352/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/272/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/344/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/40/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/104/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/80/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/216/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/296/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/3/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/96/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/456/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/432/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/192/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/384/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/448/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/208/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/48/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/492/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/304/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/360/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/280/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/484/ /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/484/_08885.wal 
/home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/484/_08904.wal /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/484/_08886.wal tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/484/_08884.wal: Cannot stat: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/416: Cannot stat: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/480/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/24/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/240/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/136/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/200/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/112/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/120/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/184/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/160/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/224/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/408/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/328/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/336/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/400/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/232/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/500/: Cannot savedir: Input/output error tar: 
/home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/440/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/56/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/128/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/wal/power_monitor/autogen/320/: Cannot savedir: Input/output error /home/pi/sdcard/opt/influxdb/data/ tar: /home/pi/sdcard/opt/influxdb/data/_internal/: Cannot savedir: Input/output error tar: /home/pi/sdcard/opt/influxdb/data/power_monitor/: Cannot savedir: Input/output error tar: Exiting with failure status due to previous errors pi@weatherman:~ $

5ft24dave commented 1 year ago

is there a direct upgrade route to 0.2.0 from the original 0.1.0 without having to reload from scratch?

David00 commented 1 year ago

Copied results below of tar operation after mounting drive per your instructions on an RPi (used a RPi3, the PM RPi4 is still mounted in outside box.)

There were a number of errors so I did not get all of the data, I guess. Do you think it is worth trying to start new influx with this data?

Unfortunately, I did not do my Jan backup, (figures...Murphy at work). My last backup is from July, so that is default.

That's not looking good. I have some notes on recovering data from my failed cards, and I did the following. No promises this will work though, as it seems there are many ways for the card to fail.

As long as the bad card is recognized by the system and shows up in the output of "sudo fdisk -l", you can try to clone it to a working card with the following steps. Out of curiosity, you can first run a file system check with the command below (note that my bad card was listed as /dev/sda when I connected it to a working Pi via the USB to microSD adapter):

sudo fsck /dev/sda

For me, the output of this showed "bad magic number in super-block", and it gave me a command to run to try to repair it. The repair never worked for me, so I moved on to cloning the card with Win32 Disk Imager. You could probably use dd to do this too, but I wanted to do it off the Pi on a faster machine. Note that the clone is a full image of the entire disk, so it will take up some space on your PC. You can shrink the image using PiShrink.

After you've shrunk the cloned image, you can write it back to a working card, and then connect the card to a working Pi (as an external drive, not to boot from it) with your microSD to USB converter. Then, you can try to repair the file system on the new card with:

sudo fdisk -l                  # make sure the card is recognized - mine was still at /dev/sda
sudo fsck /dev/sda
sudo fsck -f /dev/sda1         # Try to repair each partition listed on the device
sudo fsck -f /dev/sda2

Once you do that, you can very likely just boot the Pi from the newly cloned and repaired microSD card. If booting from it still doesn't work for some reason, you should be able to extract the data as described earlier.

David00 commented 1 year ago

is there a direct upgrade route to 0.2.0 from the original 0.1.0 without having to reload from scratch?

@5ft24dave, I've opened a new issue here to discuss this. Short answer: yes. Long answer: will be in the new issue in a few minutes.

David00 commented 1 year ago

@bobstanl, I just released the first beta for v0.3.0! Let's move the discussion about v0.3.0 here:

https://github.com/David00/rpi-power-monitor/discussions/88

On the subject of this issue (SD cards failing).... If you're able to get the data off your microSD card and onto another microSD card, see the big write-up in the comments on issue #87. It includes my InfluxDB migration utility and instructions to update your Pi and this project from v0.1.0.

bobstanl commented 1 year ago

Hi David

Followed your suggestions to copy the bad uSD to a new uSD and the old system is booting and grafana is responding. That's the good news.

BUT there were some disturbing messages when doing the tar operation. I think these snippets mean some data was lost on the new card:

/home/pi/sdcard/opt/influxdb/data/power_monitor/autogen/56/
/home/pi/sdcard/opt/influxdb/data/power_monitor/autogen/56/fields.idx
/home/pi/sdcard/opt/influxdb/data/power_monitor/autogen/56/000000007-000000002.tsm.bad
/home/pi/sdcard/opt/influxdb/data/power_monitor/autogen/128/
/home/pi/sdcard/opt/influxdb/data/power_monitor/autogen/128/000000007-000000002.tsm
/home/pi/sdcard/opt/influxdb/data/power_monitor/autogen/128/fields.idx
/home/pi/sdcard/opt/influxdb/data/power_monitor/autogen/320/
/home/pi/sdcard/opt/influxdb/data/power_monitor/autogen/320/000000004-000000002.tsm
/home/pi/sdcard/opt/influxdb/data/power_monitor/autogen/320/fields.idx

I think the ".bad" files might indicate lost data.

My two basic questions now are:

  • How do I find out what is missing?
  • Can I use my July backup to replace? Or would that make things worse by duplicating any "good" existing data?

Missing data: I don't know how to find out what is still there.

I can throw darts at the problem by checking one day at a time in grafana over the past year and usually, I see a good record.

Can you think of a better way?
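
One option that avoids eyeballing Grafana a day at a time is to pull a coarse list of timestamps that do have data (for example, one entry per day or hour from an aggregate query) and let a few lines of code report where consecutive entries are too far apart. The sketch below covers only the gap-finding step. The function name `find_gaps` and the sample dates are invented for illustration, and how the timestamps are fetched from InfluxDB is left open.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_step):
    """Return (last_seen, next_seen) pairs where data is missing.

    A gap is reported whenever two consecutive timestamps are further
    apart than max_step.
    """
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_step:
            gaps.append((earlier, later))
    return gaps


if __name__ == "__main__":
    # Made-up example: days that returned data, with one hole.
    days_with_data = [
        datetime(2022, 1, 1),
        datetime(2022, 1, 2),
        datetime(2022, 1, 3),
        datetime(2022, 1, 10),
        datetime(2022, 1, 11),
    ]
    for start, end in find_gaps(days_with_data, timedelta(days=1)):
        print("no data between", start.date(), "and", end.date())
```

Keeping the input coarse (daily or hourly) matters on a Pi, because asking for raw samples over a whole year is exactly the kind of query that exhausts memory.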

I will research what influxdb does if importing a backup with duplicate records.

Also, I have just started looking at the comments on your influxdb "migration script". Can it handle data sets with gaps?

bobstanl commented 1 year ago

Duplicate data import: OK, I know the answer, so I CAN import my old backup without duplicating data points.

Duplicate data points: For points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new field sets. For any matching field keys, InfluxDB uses the field value of the new point. For example:
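
The rule quoted above can be modelled in a few lines. This is only a toy model of the described behaviour, not InfluxDB code: the `write_point` function, the in-memory `store` dictionary, and the field names (`power`, `current`, `pf`) are all hypothetical.

```python
def write_point(store, measurement, tags, timestamp, fields):
    """Write a point into a dict-based store using the duplicate rule:
    same measurement + tag set + timestamp means the field sets are
    merged, and the newer value wins for matching field keys.
    """
    key = (measurement, tuple(sorted(tags.items())), timestamp)
    merged = dict(store.get(key, {}))  # start from the existing fields
    merged.update(fields)              # new values override old ones
    store[key] = merged
    return merged


if __name__ == "__main__":
    store = {}
    # Existing point (hypothetical field names).
    write_point(store, "home_load", {}, 1675324800, {"power": 100.0, "current": 0.8})
    # A "duplicate" arriving later, for example from a restored backup.
    result = write_point(store, "home_load", {}, 1675324800, {"power": 120.0, "pf": 0.95})
    print(result)
    print(len(store))  # still a single point
```

In practical terms, writing a backup on top of live data should not create doubled points: where both copies hold a value for the same field at the same timestamp, the copy written last replaces the earlier one.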

Still trying to find out what's missing, now by displaying a months worth at a time, looking for gaps.

Ran into the Bad Gateway again. Am ssh'd into RPi and found influx had exited docker. A few minutes before, I had stopped and disabled power-monitor, since the RPi4 is on the bench. That should not stop influx, should it? If not, this install has deeper problems...

Sorry, sorry, I am getting too wordy.

Problem with displaying large data sets, like a year or even a month is that I am getting "out of memory errors" and crashing the influxdb container image. (Did I say that correctly? I am looking at docker logs and trying to understand the errors in 16.5MB text file from the command: docker logs a4bef288e909 &> influxdblog1.txt which docker ps -a identified as the influxdb container. I am SO far out of my depth!)

I have a similar system running, collecting weather data, copying the power-monitor techniques. It can show a years worth of data without crashing... Hmm, maybe it is because the "powermon" RPi4 is 2GB and the weather RPi4 is 4GB. I would think influx would have a better way of dealing with asking for too much data, would it not?

David00 commented 1 year ago

My two basic questions now are: -How do I find out what is missing? -Can I use my July backup to replace? Or would that make things worse by duplicating any "good" existing data?

  • I tried to look at all of last year in grafana, thinking I might see gaps. I got "Influxdb Error : Bad Gateway"

This is very likely due to a lack of memory. Grafana is essentially asking for a couple hundred million points from Influx for that time frame (based on the raw data samples). It's really not possible on the Pi, which is where continuous queries come into play. In my new version 0.3.0, continuous queries get created so that your data gets downsampled into 5 minute intervals, which significantly reduces the amount of points over long query intervals, making them possible to do on a Pi. Also, the migration script will conduct the downsampling on your old data set, enabling you to look at it from a long-term perspective.
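
To make the effect of downsampling concrete, here is a small sketch that groups raw samples into 5 minute buckets and keeps one averaged value per bucket. It is not the project's continuous query: the function name `downsample`, the choice of a plain mean, and the one-sample-per-second test data are assumptions for illustration only.

```python
from collections import defaultdict


def downsample(points, bucket_seconds=300):
    """Reduce (unix_seconds, value) samples to one mean per time bucket.

    Returns a list of (bucket_start, mean_value) sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % bucket_seconds  # floor to the bucket boundary
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


if __name__ == "__main__":
    # Assumed rate of one sample per second over 10 minutes (600 samples).
    raw = [(second, 100.0) for second in range(600)]
    reduced = downsample(raw)
    print(len(raw), "raw points ->", len(reduced), "downsampled points")
```

With these assumed numbers, 600 raw points collapse to 2, and the same ratio applied over a year is what turns an impossible query into a manageable one on a Pi.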

Still trying to find out what's missing, now by displaying a months worth at a time, looking for gaps.

I think if you first take a backup of your Influx database, then import your old backup into the existing power_monitor database, and let Influx resolve duplicates, you can see how it looks.

Ran into the Bad Gateway again. Am ssh'd into RPi and found influx had exited docker. A few minutes before, I had stopped and disabled power-monitor, since the RPi4 is on the bench. That should not stop influx, should it? If not, this install has deeper problems...

Nope, but the querying over long intervals will cause the OS to kill Influx for trying to consume too much memory.

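So, I think your best bet is to:

  • Backup your existing (now restored) data
  • Import your old backup on top of your existing data
  • Run the db_migrate.py script to downsample your data
  • Upgrade to v0.2.0, then v0.3.0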

See this link for the backup, v0.2.0 upgrade, and db_migrate instructions:

https://github.com/David00/rpi-power-monitor/issues/87#issuecomment-1418203226

See the v0.3.0 release page for instructions on upgrading to v0.3.0 after you have done the upgrade to v0.2.0:

https://github.com/David00/rpi-power-monitor/releases/tag/v0.3.0-beta

The upgrade to v0.2.0 is quite involved, just because the whole environment changed. The upgrade to v0.3.0 is relatively quick and easy.

bobstanl commented 1 year ago

David said: So, I think your best bet is to:

  • Backup your existing (now restored) data
  • Import your old backup on top of your existing data
  • Run the db_migrate.py script to downsample your data
  • Upgrade to v0.2.0, then v0.3.0

See this link for the backup, v0.2.0 upgrade, and db_migrate instructions:

bobstanl: Have just completed the #87 Influx Backup Procedure, copied the tar.gz to another machine, filezilla.

Edit: As I prepare to import my July backup, I am running short of space on the 16GB drive. That Jul 22 backup is about 2GB compressed, and I will need to uncompress it and put a copy in the docker "area"(?) For example:

Copy backup to /tmp/powermon_220706
docker cp /tmp/powermon_220706 influx:/tmp/powermon_220706

However, the backup you called for is still in the "area":

pi@powermon:~ $ sudo du -h /opt/influxdb/
2.0G /opt/influxdb/backups/powermon_migrate
...
5.4G /opt/influxdb/ #full size with backup and original data

Plus, I have the tar.gz:

pi@powermon:~ $ du -h powermon_migrate.tar.gz
2.0G powermon_migrate.tar.gz

So, I need to start deleting duplicate data to import the Jul 22 backup. end edit

PS Really love the RPi Imager ability to setup username, password, wifi, etc before burning image! One thing puzzled me, 32 bit or 64? I went conservative, not knowing about SW compatibility with 64. Also, saw one strange post that 64bit OS would run hotter!

David00 commented 1 year ago
  • Is the backup used in the next procedure or just for data safety?

  • If I import my old backup data now, per above, do you suggest that I backup again? For safety, no, right? backup used in following, yes...

There's not much point in backing up again after doing the initial Influx backup procedure in #87. The initial backup will contain all of your data at the best point possible, prior to downsampling.

If you import your old data on top of your new data, and it looks better in your opinion, I would grab a new backup just in case, and use that backup in place of the backup taken in the previous step. Just make sure to name it the same so it can be used in the db_migrate.py script.

  • Since I desperately want to get away from uSD and on to my USB-SSD with a new 32bit lite on it, is "#87 Upgrade and Data Restoration Procedure" the correct point to install on new drive? Or should I continue on new uSD to do 0.1 to 0.2 and then copy stuff over?

I think you have two options in your personal deployment:

I would personally import your old data onto your existing microSD card, see if it improves your suspected missing data situation, and then generate a new backup via steps 3 through 5 of the Influx Backup Procedure in this comment: https://github.com/David00/rpi-power-monitor/issues/87#issuecomment-1418203226

Then, once you export the new backup, you can use it as the source for the db_migrate.py script on a fresh v0.2.0 install on your SSD, starting specifically in this comment, step 3 of the Upgrade and Data Restoration Procedure: https://github.com/David00/rpi-power-monitor/issues/87#issuecomment-1418208519

PS Really love the RPi Imager ability to setup username, password, wifi, etc before burning image! One thing puzzled me, 32 bit or 64? I went conservative, not knowing about SW compatibility with 64. Also, saw one strange post that 64bit OS would run hotter!

I have not yet seen that ability in RPI Imager! I must be lacking an update :) But I do know that Raspberry Pi OS has changed the way the user credentials are setup in the latest images. I'd recommend the 32 bit image. There's nothing really in this project that needs the 64 bit image. I only use the 32 bit image in my custom builds for full Pi lineup compatibility.

PS - thanks for your patience in all of this upgrade craziness. I feel like this is the infancy of a future "click a button to upgrade" functionality, which is where I'd like to get to!

bobstanl commented 1 year ago

Thank YOU! Quick hint on pre-configuring the Image to be written: Select a Raspbian image: Screenshot from 2023-02-07 23-22-59

Then use gear symbol:

Screenshot from 2023-02-07 23-23-21

How RPi Imager configures the image to be installed: after you checkmark and fill out the "Advanced Options", a new file, "firstrun.sh", is placed in the /boot of the uSD card (or other drive) that you are putting the image on. I checked later and firstrun.sh was deleted after the first run.

Here is a copy of firstrun.sh. I changed my wifi data, passwords, etc. but left the change of "pi" to "rms".

#!/bin/bash

set +e

CURRENT_HOSTNAME=`cat /etc/hostname | tr -d " \t\n\r"`
echo powermon >/etc/hostname
sed -i "s/127.0.1.1.*$CURRENT_HOSTNAME/127.0.1.1\tpowermon/g" /etc/hosts
FIRSTUSER=`getent passwd 1000 | cut -d: -f1`
FIRSTUSERHOME=`getent passwd 1000 | cut -d: -f6`
if [ -f /usr/lib/userconf-pi/userconf ]; then
   /usr/lib/userconf-pi/userconf 'rms' '$5$XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
else
   echo "$FIRSTUSER:"'$5$XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' | chpasswd -e
   if [ "$FIRSTUSER" != "rms" ]; then
      usermod -l "rms" "$FIRSTUSER"
      usermod -m -d "/home/rms" "rms"
      groupmod -n "rms" "$FIRSTUSER"
      if grep -q "^autologin-user=" /etc/lightdm/lightdm.conf ; then
         sed /etc/lightdm/lightdm.conf -i -e "s/^autologin-user=.*/autologin-user=rms/"
      fi
      if [ -f /etc/systemd/system/getty@tty1.service.d/autologin.conf ]; then
         sed /etc/systemd/system/getty@tty1.service.d/autologin.conf -i -e "s/$FIRSTUSER/rms/"
      fi
      if [ -f /etc/sudoers.d/010_pi-nopasswd ]; then
         sed -i "s/^$FIRSTUSER /rms /" /etc/sudoers.d/010_pi-nopasswd
      fi
   fi
fi
systemctl enable ssh
cat >/etc/wpa_supplicant/wpa_supplicant.conf <<'WPAEOF'
country=US
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
ap_scan=1

update_config=1
network={
   ssid="MySSID"
   psk=MyEncodedPasswordXXXXXXXXXX
}

WPAEOF
chmod 600 /etc/wpa_supplicant/wpa_supplicant.conf
rfkill unblock wifi
for filename in /var/lib/systemd/rfkill/*:wlan ; do
   echo 0 > $filename
done
rm -f /etc/localtime
echo "America/Los_Angeles" >/etc/timezone
dpkg-reconfigure -f noninteractive tzdata
cat >/etc/default/keyboard <<'KBEOF'
XKBMODEL="pc105"
XKBLAYOUT="us"
XKBVARIANT=""
XKBOPTIONS=""

KBEOF
dpkg-reconfigure -f noninteractive keyboard-configuration
rm -f /boot/firstrun.sh
sed -i 's| systemd.run.*||g' /boot/cmdline.txt
exit 0

bobstanl commented 1 year ago

Influxdb "restore" into existing database did not work.

Well, it looks like I am missing Jan 22 as well as Jan 23 and much of Dec 23. I "threw darts" asking for one week at a time in grafana. Obviously not complete, but I did find those dates missing data. The backup restore appears to be useless for integrating into an existing database. I just tried the standard restore and it complained "database may already exist". I found the following page, which claims to have a technique for combining the two, but it did not work for me. Here is the website: https://docs.influxdata.com/influxdb/v1.8/administration/backup_and_restore/#restore-data-to-an-existing-database

Here are my various attempts: pi@powermon:~ $ docker container inspect -f '{{ .Mounts }}' influx [{bind /opt/influxdb /var/lib/influxdb rw true rprivate}] pi@powermon:~ $ docker cp /tmp/powermon_220706 influx:/tmp/powermon_220706 pi@powermon:~ $ docker exec -it influx influxd restore -portable /tmp/powermon_220706 2023/02/08 23:10:22 error updating meta: DB metadata not changed. database may already exist restore: DB metadata not changed. database may already exist pi@powermon:~ $

pi@powermon:~ $ docker exec -it influx influxd restore -portable -db power_monitor -newdb power_monitor-tmp /tmp/powermon_220706 2023/02/08 23:51:24 Restoring shard 56 live from backup 20220707T062446Z.s56.tar.gz 2023/02/08 23:51:39 Restoring shard 64 live from backup 20220707T062446Z.s64.tar.gz 2023/02/08 23:51:54 Restoring shard 96 live from backup 20220707T062446Z.s96.tar.gz 2023/02/08 23:51:54 Restoring shard 151 live from backup 20220707T062446Z.s151.tar.gz 2023/02/08 23:52:09 Restoring shard 224 live from backup 20220707T062446Z.s224.tar.gz 2023/02/08 23:52:16 Restoring shard 72 live from backup 20220707T062446Z.s72.tar.gz 2023/02/08 23:52:29 Restoring shard 240 live from backup 20220707T062446Z.s240.tar.gz 2023/02/08 23:52:34 Restoring shard 40 live from backup 20220707T062446Z.s40.tar.gz 2023/02/08 23:52:52 Restoring shard 104 live from backup 20220707T062446Z.s104.tar.gz 2023/02/08 23:53:10 Restoring shard 168 live from backup 20220707T062446Z.s168.tar.gz 2023/02/08 23:53:26 Restoring shard 200 live from backup 20220707T062446Z.s200.tar.gz 2023/02/08 23:53:33 Restoring shard 3 live from backup 20220707T062446Z.s3.tar.gz 2023/02/08 23:53:41 Restoring shard 120 live from backup 20220707T062446Z.s120.tar.gz 2023/02/08 23:53:59 Restoring shard 144 live from backup 20220707T062446Z.s144.tar.gz 2023/02/08 23:54:14 Restoring shard 176 live from backup 20220707T062446Z.s176.tar.gz 2023/02/08 23:54:33 Restoring shard 208 live from backup 20220707T062446Z.s208.tar.gz 2023/02/08 23:54:35 Restoring shard 32 live from backup 20220707T062446Z.s32.tar.gz 2023/02/08 23:54:50 Restoring shard 88 live from backup 20220707T062446Z.s88.tar.gz 2023/02/08 23:54:50 Restoring shard 216 live from backup 20220707T062446Z.s216.tar.gz 2023/02/08 23:54:56 Restoring shard 264 live from backup 20220707T062446Z.s264.tar.gz 2023/02/08 23:55:01 Restoring shard 24 live from backup 20220707T062446Z.s24.tar.gz 2023/02/08 23:55:12 Restoring shard 48 live from backup 20220707T062446Z.s48.tar.gz 
2023/02/08 23:55:31 Restoring shard 128 live from backup 20220707T062446Z.s128.tar.gz 2023/02/08 23:55:50 Restoring shard 136 live from backup 20220707T062446Z.s136.tar.gz 2023/02/08 23:56:05 Restoring shard 192 live from backup 20220707T062446Z.s192.tar.gz 2023/02/08 23:56:23 Restoring shard 16 live from backup 20220707T062446Z.s16.tar.gz 2023/02/08 23:56:40 Restoring shard 112 live from backup 20220707T062446Z.s112.tar.gz 2023/02/08 23:56:56 Restoring shard 232 live from backup 20220707T062446Z.s232.tar.gz 2023/02/08 23:57:01 Restoring shard 248 live from backup 20220707T062446Z.s248.tar.gz 2023/02/08 23:57:06 Restoring shard 256 live from backup 20220707T062446Z.s256.tar.gz 2023/02/08 23:57:16 Restoring shard 80 live from backup 20220707T062446Z.s80.tar.gz 2023/02/08 23:57:16 Restoring shard 160 live from backup 20220707T062446Z.s160.tar.gz 2023/02/08 23:57:33 Restoring shard 184 live from backup 20220707T062446Z.s184.tar.gz pi@powermon:~ $ docker exec -it influx influx Connected to http://localhost:8086 version 1.8.3 InfluxDB shell version: 1.8.3

> SHOW DATABASES
name: databases
name
----
_internal
power_monitor
power_monitor-tmp
> SELECT * INTO "power_monitor".autogen.:MEASUREMENT FROM "power_monitor-tmp".autogen./.*/ GROUP BY *
name: result
time written
---- -------
0    0
> use power_monitor
Using database power_monitor
> show MEASUREMENT
ERR: error parsing query: found EOF, expected EXACT, CARDINALITY at line 1, char 18
> SHOW measurements
name: measurements
name
----
home_load
net
raw_cts
solar
voltages
> exit
pi@powermon:~ $ docker exec -it influx influx
Connected to http://localhost:8086 version 1.8.3
InfluxDB shell version: 1.8.3
> SELECT * INTO "power_monitor".autogen.:measurements FROM "power_monitor-tmp".autogen./.*/ GROUP BY *
ERR: error parsing query: found MEASUREMENTS, expected MEASUREMENT at line 1, char 40
Warning: It is possible this error is due to not setting a database.
Please set a database with the command "use <database>".
> use power_monitor
Using database power_monitor
> SELECT * INTO "power_monitor".autogen.:measurements FROM "power_monitor-tmp".autogen./.*/ GROUP BY *
ERR: error parsing query: found MEASUREMENTS, expected MEASUREMENT at line 1, char 40
> SELECT * INTO "power_monitor".autogen.:measurement FROM "power_monitor-tmp".autogen./.*/ GROUP BY *
name: result
time written
---- -------
0    0
> exit
pi@powermon:~ $

David00 commented 1 year ago

Thanks for the info on the Pi OS Imager settings! I'll add that to the new documentation site soon!

Interesting, it looks like you were able to combine the results from the tmp db into your main db with that last line:

SELECT * INTO "power_monitor".autogen.:measurement FROM "power_monitor-tmp".autogen./.*/ GROUP BY *

How did it turn out?

bobstanl commented 1 year ago

I don't think it did anything, as far as I can tell. Both times it came back with time = 0 and written = 0

> SELECT * INTO "power_monitor".autogen.:measurement FROM "power_monitor-tmp".autogen./.*/ GROUP BY *
name: result
time written
---- -------
0    0

Still a chance that I made a mistake, but I can't find it.