Webconverger / webc

Webconverger's curated chroot from which updates originate
https://webconverger.org/upgrade/

Kernel panic / hang when creating files in tmpfs #118

Open · kbarek opened this issue 11 years ago

kbarek commented 11 years ago

To reproduce the problem:

1) Boot Webconverger (tested with 16.0 and 16.0-1-g42a8fb1) in debug mode.
2) Run:

dd if=/dev/zero of=/home/webc/testfile1 bs=512k count=100  # works OK
dd if=/dev/zero of=/home/webc/testfile2 bs=512k count=100  # works OK
dd if=/dev/zero of=/home/webc/testfile3 bs=512k count=100  # hang / kernel panic ("attempting to kill init")

There should be more than enough room to create the last file (in my case 132MB available after testfile2 was written).
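One likely reason the third file fails even though `df` still reports free space: `df` only shows the room left under this tmpfs mount's own size limit, while the file data itself sits in RAM (counted as `Shmem` in `/proc/meminfo`) and cannot be paged out when there is no swap. The script below is a minimal sketch, assuming a Linux `/proc/meminfo` and a `df` that supports `-P`; it repeats the reproduction and logs both numbers after each step. The script name `tmpfs-oom-check.sh` and its function names are hypothetical. Given a directory argument it writes the same 50 MiB test files as above, so it may trigger the same hang.

```shell
#!/bin/sh
# Sketch (hypothetical tmpfs-oom-check.sh): log tmpfs free space next to
# free RAM while repeating the dd reproduction from this issue.

# Print one /proc/meminfo field in kB, e.g. "meminfo_kb MemFree".
meminfo_kb() {
  awk -v field="$1:" '$1 == field { print $2 }' /proc/meminfo
}

# Print the space (in kB) that df reports as available on the fs holding $1.
df_avail_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# One log line: df's view of the tmpfs vs. the kernel's view of RAM.
# Shmem includes the pages held by tmpfs files.
report() {
  echo "$2: df avail=$(df_avail_kb "$1") kB," \
    "MemFree=$(meminfo_kb MemFree) kB, Shmem=$(meminfo_kb Shmem) kB"
}

# Only run the reproduction when a target directory is given,
# e.g. "sh tmpfs-oom-check.sh /home/webc".
if [ $# -ge 1 ]; then
  dir=$1
  report "$dir" "before"
  for n in 1 2 3; do
    # 100 x 512 KiB = 50 MiB per file, as in the original report
    dd if=/dev/zero of="$dir/testfile$n" bs=512k count=100 2>/dev/null
    report "$dir" "after testfile$n"
  done
fi
```

If `Shmem` keeps growing by about 50 MiB per file while `MemFree` approaches zero, the failure is RAM exhaustion rather than the tmpfs running out of its own space.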

This bug was initially discovered because of a Java applet that wrote to the /home/webc/.java directory and caused a kernel panic.

After mounting a new tmpfs at /home/webc/.java, the problem disappeared. I'm not sure why the Java applet caused a problem, since it only wrote about 0.3 MB to the filesystem, but at least the commands above reproduce it.
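Along the lines of that workaround, a dedicated tmpfs can also be given an explicit `size=` limit, which bounds how much RAM files in that directory can take: once the limit is reached, writes fail with "No space left on device" instead. This is a minimal sketch, assuming it runs as root before the browser starts and that `mountpoint` (from util-linux or BusyBox) is available; the function name `mount_bounded_tmpfs` and the example values are hypothetical, not taken from Webconverger.

```shell
#!/bin/sh
# Sketch: mount a size-limited tmpfs over a directory (run as root).
# Usage: mount_bounded_tmpfs DIR SIZE UID GID
# SIZE uses tmpfs units such as 64m; all example values are hypothetical.
mount_bounded_tmpfs() {
  dir=$1 size=$2 uid=$3 gid=$4
  # Refuse to stack a second mount on top of an existing one.
  if mountpoint -q "$dir"; then
    echo "$dir is already a mount point" >&2
    return 1
  fi
  # mode=0700 keeps the directory private to its owner.
  mkdir -p "$dir" &&
    mount -t tmpfs -o "size=$size,mode=0700,uid=$uid,gid=$gid" tmpfs "$dir"
}
```

For example, `mount_bounded_tmpfs /home/webc/.java 64m 1000 1000` would cap that directory at 64 MiB, assuming the webc user and group have ID 1000. The cap only limits this one directory; it does not stop other tmpfs mounts or processes from exhausting RAM.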

matthijskooijman commented 11 years ago

Just as another data point: I was recently debugging a Flash video player application that filled up /tmp with a downloaded video file. When /tmp was full, the Flash player simply stopped playing the video; nothing crashed. In that case RAM was not exhausted yet, since tmpfs defaults to a size limit of half the physical RAM, and the other half was enough for the rest of the system. This might be different on machines with less physical RAM (< 2 GB).

Alternatively, the difference might be that filling up /home or / (which I think is a different tmpfs from /tmp) does break the system?
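One way to look into that question: each tmpfs mount gets its own size limit (by default half of physical RAM), so with several tmpfs mounts the limits together can add up to more than the RAM that exists, and `df` on any single mount says nothing about the others. A minimal sketch, assuming Linux `/proc/mounts` and `/proc/meminfo`; the function names are hypothetical, and mount points containing spaces (escaped in `/proc/mounts`) or bind mounts (counted twice) are not handled.

```shell
#!/bin/sh
# Sketch: compare the combined size limits of all tmpfs mounts with
# physical RAM. Each tmpfs has its own limit, so the sum can exceed RAM.

# Sum of the size limits (in kB) of every tmpfs mount in /proc/mounts.
tmpfs_total_kb() {
  awk '$3 == "tmpfs" { print $2 }' /proc/mounts |
    while read -r mp; do
      # Column 2 of "df -Pk" is the total size in 1024-byte blocks.
      df -Pk "$mp" 2>/dev/null | awk 'NR == 2 { print $2 }'
    done |
    awk '{ sum += $1 } END { print sum + 0 }'
}

# Physical RAM in kB.
mem_total_kb() {
  awk '$1 == "MemTotal:" { print $2 }' /proc/meminfo
}

echo "tmpfs limits: $(tmpfs_total_kb) kB, MemTotal: $(mem_total_kb) kB"
```

If the first number is well above the second, filling one tmpfs up to its `df` limit can still run the whole machine out of memory.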

kbarek commented 11 years ago

With some more testing, I have discovered that this is triggered by an out-of-memory condition caused by copying data into the tmpfs. The OOM killer then starts killing processes (chosen by its badness heuristic, not at random) and at some point probably kills an essential task.

The system I've tested with has no more than 512 MB of memory, and that, combined with having no swap, probably forces this to happen. This might also explain why the bug was triggered more easily while running Java applets (a memory hog).
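The OOM killer picks the process with the highest badness score, which the kernel exposes as `/proc/PID/oom_score` (mostly driven by memory use and adjusted by `oom_adj`/`oom_score_adj`). A minimal sketch, assuming a Linux `/proc` with `oom_score` and `comm` entries, that lists the likeliest victims on a running system; the function name `oom_ranking` is hypothetical.

```shell
#!/bin/sh
# Sketch: list processes by OOM badness score, likeliest victim first.
# Output columns: score pid name
# Usage: oom_ranking [N]   (N defaults to 10)
oom_ranking() {
  for d in /proc/[0-9]*; do
    # Processes can exit mid-loop, so skip any we can no longer read.
    score=$(cat "$d/oom_score" 2>/dev/null) || continue
    name=$(cat "$d/comm" 2>/dev/null) || continue
    echo "$score ${d#/proc/} $name"
  done | sort -rn | head -n "${1:-10}"
}

oom_ranking
```

Running this while a Java applet is active should show whether the applet, the browser or some system process is at the top of the list.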

To avoid essential processes getting killed, I can run something like this before the X environment starts:

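for i in /proc/[0-9]*
do
  # Kernel threads have an empty cmdline; only exempt real processes.
  # tr strips the NUL separators so the shell doesn't warn about them.
  if [ -n "$(cat "$i/cmdline" 2>/dev/null | tr -d '\0')" ]
  then
    # -17 (OOM_DISABLE) exempts the process from the OOM killer; newer
    # kernels also offer oom_score_adj, where -1000 does the same.
    # Errors are ignored for processes that exit before we reach them.
    { echo -17 > "$i/oom_adj"; } 2>/dev/null
  fi
done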

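This will exempt all processes running at that point from OOM killing. Note that the setting is inherited across fork(), so anything those processes start later (for example the script that launches X) is exempt too unless it resets its own value; the goal is to leave the OOM killer only the X server, firefox, java or whatever other plugin processes may be running. If killing these isn't enough to free memory, we probably have more serious problems that need fixing anyway, and the end result is the same in either case (a kernel panic).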

kaihendry commented 11 years ago

This is interesting, since I'm trying to get to the bottom of the OOM problem in https://github.com/Webconverger/webc/issues/83 and I'm not getting very far.

It would be good if we could limit the killing to the firefox process, which is controlled by a loop so it can restart; killing it should hopefully free the memory taken by Java, Flash and other plugins.
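A minimal sketch of that idea, assuming a kernel with `/proc/PID/oom_score_adj` (Linux 2.6.36 and later; `oom_adj` is the older interface) and a POSIX shell: the restart loop raises each new browser process's adjustment to the maximum, 1000, so the OOM killer prefers it. Raising the value needs no special privileges, and because it is inherited across fork(), plugin processes started by the browser get it as well. It also overrides a -17/-1000 exemption the browser would otherwise inherit from an exempted parent. The helper name `run_as_preferred_victim` is hypothetical, not part of Webconverger.

```shell
#!/bin/sh
# Sketch: start a command with the highest OOM badness adjustment (1000),
# so the OOM killer picks it (and its children) before anything else.
# Usage: run_as_preferred_victim COMMAND [ARGS...]
run_as_preferred_victim() {
  (
    # Inside this subshell /proc/self is the subshell itself; exec keeps
    # the same process, so the command runs with the raised value.
    { echo 1000 > /proc/self/oom_score_adj; } 2>/dev/null
    exec "$@"
  )
}
```

In the browser's restart loop this could be used as `while :; do run_as_preferred_victim firefox; sleep 1; done`; the actual command in Webconverger's loop may differ.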

I think the experts hang out at http://linux-mm.org/LinuxMM.