NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova

Fix kernel hang when snapshot cleaner thread is not stopped properly #156

Open ahuja-gautam opened 3 months ago

ahuja-gautam commented 3 months ago

When nova is unmounted, nova_save_snapshots() stops the snapshot cleaner kthread with kthread_stop(). If the cleaner kthread calls schedule() while kthread_stop() is already blocked in wait_for_completion(), the kthread goes to sleep forever waiting for a wakeup that never arrives, and the unmount hangs.

https://github.com/NVSL/linux-nova/blob/976a4d1f3d5282863b23aa834e02012167be6ee2/fs/nova/snapshot.c#L1301-L1306

https://github.com/NVSL/linux-nova/blob/976a4d1f3d5282863b23aa834e02012167be6ee2/fs/nova/snapshot.c#L1319-L1326

Reproduction:

  1. Mount a fresh nova instance using the 'mount -t NOVA -o init' command

  2. Unmount nova

  3. Remount nova at the same mount point

  4. Repeat steps 2 and 3 in a tight loop until the kernel hangs. In our experiments, the hang reproduced within 40 to 480 seconds, with an average of 254 seconds.

We wrote a script and helper C program to reproduce the bug (Makefile and driver.c).
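For readers without access to those files, the reproduction steps above can be sketched in C roughly as follows. This is not the authors' driver.c; it is a minimal illustration that assumes the file system type string is "NOVA" (as in the mount command in step 1) and that the remount in step 3 omits the init option. The function name remount_cycles and any device or mount-point paths passed to it are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: mount once with "init", then unmount and remount
 * `cycles` times. Returns the number of completed unmount/remount cycles,
 * or -1 if any mount(2) or umount(2) call fails. If the kernel hangs, the
 * call that triggered the hang simply never returns.
 */
static long remount_cycles(const char *dev, const char *dir, long cycles)
{
    long done;

    /* Step 1: fresh instance, equivalent to `mount -t NOVA -o init`. */
    if (mount(dev, dir, "NOVA", 0, "init") != 0) {
        fprintf(stderr, "initial mount: %s\n", strerror(errno));
        return -1;
    }

    for (done = 0; done < cycles; done++) {
        /* Step 2: unmount, which stops the snapshot cleaner kthread. */
        if (umount(dir) != 0) {
            fprintf(stderr, "umount (cycle %ld): %s\n", done, strerror(errno));
            return -1;
        }
        /* Step 3: remount at the same mount point (assumption: no "init"). */
        if (mount(dev, dir, "NOVA", 0, NULL) != 0) {
            fprintf(stderr, "remount (cycle %ld): %s\n", done, strerror(errno));
            return -1;
        }
    }
    return done;
}
```

A caller running as root might invoke it as remount_cycles("/dev/pmem0", "/mnt/nova", 1000000), where both paths are placeholders for the local setup.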

Fix: In the try-sleeping loop, the kthread is not scheduled out if kthread_should_stop() evaluates to true.

prepare_to_wait(&sbi->snapshot_cleaner_wait, &wait, TASK_INTERRUPTIBLE);
if (!kthread_should_stop())
    schedule();
finish_wait(&sbi->snapshot_cleaner_wait, &wait);

This fix follows standard practice in other Linux file systems such as UBIFS and NFS.
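The idea behind the fix, re-checking the stop flag after preparing to sleep and before actually sleeping, can be illustrated with a userspace analogue. The sketch below is not kernel code and not part of the patch: it uses a pthread mutex and condition variable in place of the kernel wait queue, and all names (struct cleaner, cleaner_try_sleeping, cleaner_start, cleaner_stop) are hypothetical. In the kernel, the ordering comes from prepare_to_wait() setting the task state before the check; here the mutex plays that role.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the snapshot cleaner state. */
struct cleaner {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    pthread_t       thread;
    bool            should_stop;
};

/* Called with c->lock held. Sleep only if no stop request is pending. */
static void cleaner_try_sleeping(struct cleaner *c)
{
    if (!c->should_stop)                 /* analogue of !kthread_should_stop() */
        pthread_cond_wait(&c->wake, &c->lock);
}

static void *cleaner_main(void *arg)
{
    struct cleaner *c = arg;

    pthread_mutex_lock(&c->lock);
    for (;;) {
        cleaner_try_sleeping(c);
        if (c->should_stop)
            break;
        /* Cleaning work would go here. */
    }
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

static int cleaner_start(struct cleaner *c)
{
    c->should_stop = false;
    if (pthread_mutex_init(&c->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&c->wake, NULL) != 0)
        return -1;
    return pthread_create(&c->thread, NULL, cleaner_main, c) == 0 ? 0 : -1;
}

/* Analogue of kthread_stop(): set the flag, wake the thread once, wait. */
static int cleaner_stop(struct cleaner *c)
{
    int rc;

    pthread_mutex_lock(&c->lock);
    c->should_stop = true;
    pthread_cond_signal(&c->wake);
    pthread_mutex_unlock(&c->lock);

    rc = pthread_join(c->thread, NULL);
    pthread_cond_destroy(&c->wake);
    pthread_mutex_destroy(&c->lock);
    return rc == 0 ? 0 : -1;
}
```

Without the check in cleaner_try_sleeping(), a stop request that arrives before the thread first waits would be missed, and cleaner_stop() would block in pthread_join() indefinitely, which mirrors the hang described in this issue.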

The linked patch fixes this bug. We ran the same scripts for 10 million iterations over 17 hours, and the bug did not trigger. The bug was discovered using Metis, a new tool that finds file system bugs using model checking.

Signed-off-by: Gautam Ahuja <gaahuja@cs.stonybrook.edu>
Signed-off-by: Yifei Liu <yifeliu@cs.stonybrook.edu>
Signed-off-by: Erez Zadok <ezk@cs.stonybrook.edu>