MichaIng / DietPi

Lightweight justice for your single-board computer!
https://dietpi.com/
GNU General Public License v2.0

Raspberry Pi 4 | OOM reaper kicks in during external disk writes #4622

Open Myth0ne opened 3 years ago

Myth0ne commented 3 years ago

ADMIN EDIT

Workaround

G_CONFIG_INJECT 'arm_64bit=' 'arm_64bit=1' /boot/config.txt
reboot

Note that this only applies to the 32-bit images, as the 64-bit image, of course, uses the 64-bit kernel by default.


I never had any issues with running out of memory until after I performed the upgrade. Now, shortly after boot, when I check my dmesg log for errors or alerts, I get the following out-of-memory messages.

I am running a 2 GB RPi 4 with a 2 GB swap file. As mentioned, I never had any memory leaks or memory issues before this upgrade.

https://imgur.com/yJhCQcB
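
A minimal sketch of how the out-of-memory lines could be pulled out of the kernel log as text rather than a screenshot. The function name `oom_lines` is made up for illustration, and the patterns are assumptions based on common kernel message wording, so they may need adjusting for a given kernel version.

```shell
# Filter kernel log text (read from stdin) down to OOM-killer related lines.
# Assumption: the kernel uses wording such as "invoked oom-killer",
# "Out of memory: Killed process ..." or "oom_reaper: reaped process ...".
oom_lines() {
    grep -Ei 'out of memory|oom-kill|oom_reaper|killed process'
}

# Usage on a live system (reading dmesg typically needs root):
#   dmesg | oom_lines
```

The matching lines usually name the process that was killed, which helps to see whether the same process is hit first on every boot.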

Joulinar commented 3 years ago

Could you check what the memory usage looks like, using the following?

free -m
htop

Usually DietPi by itself is not able to fully utilize your system. It would have to be an installed application.

Myth0ne commented 3 years ago

You can see my free -m at the end of the picture.

htop is saying 353 MB/1.59 GB of RAM with 1.59 MB of swap used. However, a bunch of the processes got killed, and they keep getting killed each time I reboot the Pi now. I haven't installed anything new, I just ran the update. It's a set-and-forget device. Probably should've left it as is, in hindsight.

Joulinar commented 3 years ago

Do you have a backup from before the update, so that you could go back to the previous version? That way we could check whether it is caused by an apt package update.

Myth0ne commented 3 years ago

I do not. I only have a backup from a while back, which I'll use if I have to.

Is there a way to run a command when it boots to monitor and see if it is an app that is leaking memory? Though it seems to kill a whole heap off at once. I may also check for a firmware update as well.

Joulinar commented 3 years ago

Well, you could log in right after boot and check htop. You might be able to catch something.

But your system seems to be a strange one. Before processes get killed, you would need to use up 4 GB of memory: 2 GB RAM + 2 GB swap.

Not sure if it would be helpful, but you could try to raise swap to 4GB.
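
As a rough sketch, the RAM + swap total mentioned above can be read from `/proc/meminfo` instead of being estimated. The helper name `total_mem_mib` is hypothetical; it only assumes the usual `MemTotal:` and `SwapTotal:` lines, which the kernel reports in kB.

```shell
# Sum MemTotal and SwapTotal (both reported in kB) from meminfo-formatted
# text on stdin and print the result in MiB, rounded down.
total_mem_mib() {
    awk '/^(MemTotal|SwapTotal):/ { kb += $2 } END { print int(kb / 1024) }'
}

# Usage on a live system:
#   total_mem_mib < /proc/meminfo
```

Note that MemTotal is what the kernel sees after the GPU memory split and other reservations, so on a 2 GB board it will come out lower than 2048 MiB.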

Myth0ne commented 3 years ago

I just tried running the following command, as it wouldn't let me set it any higher.

/boot/dietpi/func/dietpi-set_swapfile 1024 zram

Before, I was using /boot/dietpi/func/dietpi-set_swapfile 2048

I didn't use the zram argument at the end before. What does it do? I'll monitor it again and see how we go.

Joulinar commented 3 years ago

zram is a compressed swap device kept in RAM: https://en.m.wikipedia.org/wiki/Zram

What is the error message if you try to raise to 4GB swap?

Myth0ne commented 3 years ago

Failed - Insufficient RAM size for desired zram-swap-file

I have a 16gb sd card so plenty of space available.

Myth0ne commented 3 years ago

Still getting the problem. Might try without zram and try 4096. Not liking the chances though

Joulinar commented 3 years ago

Please don't use zram, as it will try to place the compressed device in your physical RAM.

Myth0ne commented 3 years ago

I have 9.31 GB available, but neither 2048 nor 4096 works. Both give me the same error message.

Myth0ne commented 3 years ago

Should I make the changes for AUTO_SETUP_SWAPFILE_SIZE & AUTO_SETUP_SWAPFILE_LOCATION under /boot/dietpi.txt?

Joulinar commented 3 years ago

But you are not using zram anymore? Settings in dietpi.txt will not have any effect, as they are applied on first boot only. That's why they are named AUTO_SETUP.

Myth0ne commented 3 years ago

No, that's right. I have it pointed to /var/swap now, with a 4 GB swap file. Will monitor and see results shortly.

Joulinar commented 3 years ago

If you are able to log in via SSH, you could open htop in a first session and keep it running. Meanwhile, in a second SSH session, you could try to start the failed services.

Myth0ne commented 3 years ago

Yeah I think this time it killed it again. Strangely enough I noticed I had an undervoltage detected as well. Which is weird as I am using a samsung s20+ charger which provides it plenty of juice. I wonder if the cord is starting to go I may try and swap cables.
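
A small sketch for checking the under-voltage state directly, assuming the firmware tool `vcgencmd get_throttled` is available and prints a line like `throttled=0x50005`. To my understanding, bit 0 means under-voltage is detected right now and bit 16 means it has occurred since boot. The function name `undervoltage_status` is made up for illustration.

```shell
# Decode the two under-voltage bits from a "throttled=0x..." line.
# Bit 0: under-voltage right now; bit 16: under-voltage since boot.
undervoltage_status() {
    value=${1#throttled=}
    now=$(( ${value} & 1 ))
    past=$(( (${value} >> 16) & 1 ))
    echo "now=$now since_boot=$past"
}

# Usage on a Raspberry Pi:
#   undervoltage_status "$(vcgencmd get_throttled)"
```

Running this before and after swapping the cable or power supply would show whether the under-voltage flag still gets set.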

Myth0ne commented 3 years ago

I noticed a couple of times Mono was the first process to get killed off. Was there any apt update to that package? Is it possible to apt install the previous package on the last version of dietpi to test???

Myth0ne commented 3 years ago

Different power source didn't change anything. Can we change the mono version? I think that is our culprit.

Myth0ne commented 3 years ago

@Joulinar This can be closed now. Looks like there was an issue with the hdd and can confirm it appears to be working without calling the oom reaper now.

FYI fix is here - https://github.com/raspberrypi/linux/issues/3210#issuecomment-716510916

Myth0ne commented 3 years ago

Ok, so it looks like it is fine but randomly goes offline after like 60 mins. I've tried changing network cables and will report back

Joulinar commented 3 years ago

Sorry for not answering but it was already 3am in central Europe.

Did you run the 32-bit version? And now you switched to the 64-bit kernel? This is far from an ideal solution, as you have some kind of mixed system. Your packages and userland will stay 32-bit while you force the kernel into 64-bit mode.

You should consider using a native 64-bit image instead.

Furthermore, why did it not happen before? Probably the latest Raspberry Pi OS kernel introduced something. You could check by going back to the previous version.
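
The mixed setup described above (64-bit kernel with 32-bit userland) can be detected roughly like this. This is only a sketch: `classify_bitness` is a hypothetical helper, and it assumes an ARM board where `uname -m` reports `aarch64` for a 64-bit kernel and something like `armv7l` for a 32-bit one.

```shell
# Classify a system from the kernel machine name (uname -m) and the
# userland word size in bits (getconf LONG_BIT).
# Assumption: ARM only, anything other than aarch64/arm64 counts as 32-bit.
classify_bitness() {
    case "$1" in
        aarch64|arm64) kernel=64 ;;
        *)             kernel=32 ;;
    esac
    if [ "$kernel" = "$2" ]; then
        echo "native ${kernel}-bit"
    else
        echo "mixed: ${kernel}-bit kernel, ${2}-bit userland"
    fi
}

# Usage on a live system:
#   classify_bitness "$(uname -m)" "$(getconf LONG_BIT)"
```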

Myth0ne commented 3 years ago

Not a clue.. What's the best way to try and check? Thanks for the replies I really appreciate it!

Joulinar commented 3 years ago

You could use rpi-update to downgrade the kernel. But first you would need to switch back to the 32-bit kernel.

https://github.com/Hexxeh/rpi-update

You would also need to look into the under-voltage messages. Is your HDD self-powered or just powered via the USB connection?

Myth0ne commented 3 years ago

It is powered via USB, but I have run this config for about 18 months now.

I commented out the 64-bit setting in config.txt and rebooted. So far so good, I just switched the USB ports around and swapped the network cable. It has made it past 5 mins of uptime so far. I'll see if it cuts out again after 60 mins or so.

Joulinar commented 3 years ago

Usually we recommend powering external USB disks separately, as this fixes similar issues in most cases. I guess @MichaIng could explain the technical background better than I 🙂

Myth0ne commented 3 years ago

Yeah, if it was up to me I would run that setup. However, I have a soldered portable WD Elements 4 TB drive. Once this is on its last legs I want to get an externally powered one for this use case.

Myth0ne commented 3 years ago

Yeah, I might change cables as I think this one may be faulty. It seems to just fall off the network randomly now after an hour or two.

Myth0ne commented 3 years ago

Thanks for the info @Joulinar regarding the rpi-update as I probably haven't done one in a very long time. Perhaps this may help my issue as well.

Joulinar commented 3 years ago

rpi-update will enable you to update your RPi kernel to a testing version as well as to downgrade the kernel. It will not update any other apt packages.

Myth0ne commented 3 years ago

I've run the update to update to the latest stable kernel so we will continue to see what happens 😄

Myth0ne commented 3 years ago

Yeah, I think I am happy for this to be closed and say the fix will be from installing rpi-update. Perhaps I was running some old original kernel on the rpi4. Now it's been running for 10-11hrs with no dramas. Thanks for the tip.

Joulinar commented 3 years ago

What kernel version are you running at the moment?

uname -a

Myth0ne commented 3 years ago

Linux DietPi 5.10.52-v7l+ #1441 SMP Tue Aug 3 18:11:56 BST 2021 armv7l GNU/Linux

Joulinar commented 3 years ago

That's the latest stable one, which was released yesterday.

Myth0ne commented 3 years ago

Great, well we are approaching 24h stable now so I think this has sorted it out. Thanks again for your help.

Joulinar commented 3 years ago

Well, kernel development is done by the RPi devs, not by us. DietPi does not have its own kernel.

Myth0ne commented 3 years ago

Not a problem. Just realised I can close this myself so will do so. Thanks again.

MichaIng commented 3 years ago

I finally had a look into this issue. Very interesting indeed, and it seems the RPi engineers don't have a clue either. Using the 64-bit kernel seems to be a workaround, but it is anything but ideal on a 32-bit system/image.

The GPU memory split seems to have an impact. Probably something in this external disk I/O process can only allocate from a first "physical" memory region, which is then also used for GPU memory. Otherwise I have no idea how it can have any significant effect on an 8 GiB RPi 4 with additional swap space. @josh3003 how much GPU memory do you have assigned?

grep gpu_mem /boot/config.txt

Generally we should reduce our auto-applied values based on software installs, as the 5.10 kernel and the new vcsm-cma driver seem to use it much less. Also with the KMS/fKMS video driver (Kodi, Amiberry, ..., or manually enabled) it seems to be used much less.

And the memory cgroup seems to have an effect in another case, which is quite strange, as enabling it allows Docker to "limit" the containers' memory usage, not to increase it or so, AFAIK 🤔. However, @josh3003 do you have Docker installed or the related cgroups enabled?

cat /boot/cmdline.txt

I'll mark this issue as open, as it doesn't seem to be that rare and is interesting enough to track.

Myth0ne commented 3 years ago

Hey, so no, I can confirm I do not have Docker installed.

The two results you are after are below.

gpu_mem_256=192
gpu_mem_512=320
gpu_mem_1024=320

console=tty1 root=PARTUUID=6c586e13-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait quiet net.ifnames=0
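
A minimal sketch for checking a kernel command line for the memory cgroup, instead of reading it by eye. `memory_cgroup_enabled` is a hypothetical helper, and it assumes the flags in question are `cgroup_enable=memory` and `cgroup_memory=1` given as separate space-delimited words. It does not handle comma-separated forms.

```shell
# Report whether a kernel command line (passed as one string) enables
# the memory cgroup.
# Assumption: the flags appear as cgroup_enable=memory or cgroup_memory=1.
memory_cgroup_enabled() {
    case " $1 " in
        *" cgroup_enable=memory "*|*" cgroup_memory=1 "*) echo yes ;;
        *) echo no ;;
    esac
}

# Usage on a live system:
#   memory_cgroup_enabled "$(cat /boot/cmdline.txt)"
```

For the command line posted above, this reports that the memory cgroup is not enabled via the command line.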

MichaIng commented 3 years ago

In case of Kodi and Jellyfin installs, 320 MiB was set and I just reduced this to 256 MiB after reading through some threads. Even less should be fine on RPi 4. If you use Kodi or Jellyfin, it would be awesome if you could try 128 MiB and see if high-res videos still play back as well as before:

/boot/dietpi/func/dietpi-set_hardware gpumemsplit 128

It seems to help with the OOM reaper on the 32-bit kernel as well, but it only lowers the chance/risk without solving the underlying issue, as far as I understand.
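
To see which of the `gpu_mem_*` lines actually applies on a given board, something like the following sketch could be used. It assumes, as I understand the Raspberry Pi config.txt documentation, that `gpu_mem_256` and `gpu_mem_512` apply to 256 MB and 512 MB boards and `gpu_mem_1024` to boards with 1 GB or more. The helper name `effective_gpu_mem` is made up, and it ignores a plain `gpu_mem=` line and any `[section]` filters.

```shell
# Print the gpu_mem_* value that would apply for a given total RAM size
# (in MiB, first argument), reading config.txt-style text from stdin.
# Prints nothing if the matching key is not set.
effective_gpu_mem() {
    if   [ "$1" -ge 1024 ]; then key=gpu_mem_1024
    elif [ "$1" -ge 512 ];  then key=gpu_mem_512
    else                         key=gpu_mem_256
    fi
    # If the key is set more than once, use the last occurrence.
    sed -n "s/^${key}=//p" | tail -n 1
}

# Usage on a 2 GB RPi 4:
#   effective_gpu_mem 2048 < /boot/config.txt
```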

Myth0ne commented 3 years ago

So should I set the 64 bit like I did before?? I may have had Kodi originally installed but no longer use it.

Joulinar commented 3 years ago

no need to set the 64bit flag for the kernel

MichaIng commented 3 years ago

I may have had Kodi originally installed but no longer use it.

Ah okay, then you may reduce it further, e.g. to 64 MiB or to 16 MiB if it is headless anyway.

no need to set the 64bit flag for the kernel

Well, as far as I could see in the RPi kernel thread, only the 64-bit flag prevents the OOM killer reliably. In one case reducing GPU memory also prevented it; in other cases it only reduced the frequency of OOM events. It somehow makes sense when seeing it as a limited physical RAM space (a fixed space at the beginning or so, regardless of how much memory is actually available), which is further reduced by increasing GPU memory but remains limited even without GPU memory. But that's all just a shot in the dark 😄.

Myth0ne commented 3 years ago

16MiB definitely crashed the pi twice now. So I am going back to 128MiB.

MichaIng commented 3 years ago

As long as you don't use a desktop or other GUI application, reducing the GPU memory cannot make it worse. But obviously it doesn't help much either :(. We'll need to contribute to the discussion on the RPi repo and try to help the RPi engineers debug it. At least a workaround with the 64-bit kernel exists.

Myth0ne commented 3 years ago

Not a problem. For the sake of it, I have restored the following values

gpu_mem_256=192
gpu_mem_512=320
gpu_mem_1024=320

as well as removing arm_64bit=1 from /boot/config.txt

This is just for the sake of reliability, going back to what I had beforehand, and I haven't had a crash since.

MichaIng commented 3 years ago

From all I have read, more than 256 MiB of GPU memory is never required, which is why I just reduced our auto-applied values. I think whether the OOM killer is triggered or not depends on the current filesystem cache usage and, most importantly, the size of the file that is currently being written. Check out the thread on the RPi repo, where some scripts are provided to trigger it manually, e.g. by writing 4 GiB files.
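
For reference, a generic stand-in for such a test (not the scripts from the RPi thread) could look like this. It simply writes a large file of zeros with `dd` and flushes it to disk. The function name `write_test_file` and the mount point in the usage comment are hypothetical.

```shell
# Write the given number of MiB of zeros (second argument) to a file
# (first argument) and flush the data to disk before returning.
write_test_file() {
    dd if=/dev/zero of="$1" bs=1M count="$2" conv=fsync 2>/dev/null
}

# Usage (hypothetical mount point, adjust to your external drive):
#   write_test_file /mnt/external/oom-test.bin 4096
#   dmesg | grep -i 'out of memory'
#   rm /mnt/external/oom-test.bin
```

This is only meant for a test system, since triggering the OOM killer on purpose will kill running processes.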