NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova

File metadata/contents differ between mounts after modifying using multiple file descriptors #132

Closed: hayley-leblanc closed this issue 2 years ago

hayley-leblanc commented 2 years ago

Hi Andiry,

I have encountered an issue where NOVA seems to report different file contents and metadata after running a workload that involves modifying the same file via multiple file descriptors. I'm running NOVA on a QEMU/KVM virtual machine with 1 CPU and 128MB emulated persistent memory. NOVA is built as a loadable kernel module and I'm using it in its default configuration.

I am using this program: test.zip. The program creates a file, file0, and opens two file descriptors for it. It writes to file0 using both file descriptors, then truncates and writes again using only the first file descriptor.

This issue is not related to a crash - it shows up during regular execution.

Here are the steps I use to reproduce this issue:

  1. Mount a fresh instance of NOVA on /mnt/pmem (mount -t NOVA -o init /dev/pmem0 /mnt/pmem)

  2. Run the attached test.cpp program

  3. The output of stat /mnt/pmem/file0 is:

    File: /mnt/pmem/file0
    Size: 25          Blocks: 8          IO Block: 4096   regular file
    Device: 10301h/66305d   Inode: 33          Links: 1
    Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2021-11-30 18:44:18.000000000 +0000
    Modify: 2021-11-30 18:44:18.000000000 +0000
    Change: 2021-11-30 18:44:18.000000000 +0000
    Birth: -

    And the output from cat /mnt/pmem/file0 is just aa.

  4. Unmount and remount NOVA.

  5. stat /mnt/pmem/file0 now gives:

    File: /mnt/pmem/file0
    Size: 1           Blocks: 0          IO Block: 4096   regular file
    Device: 10301h/66305d   Inode: 33          Links: 1
    Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2021-11-30 18:44:18.000000000 +0000
    Modify: 2021-11-30 18:44:18.000000000 +0000
    Change: 2021-11-30 18:44:18.000000000 +0000
    Birth: -

    And cat /mnt/pmem/file0 just outputs a.

I haven't looked into the root cause of this issue, although I'm wondering if it could be related to #117 since it manifests in a similar way?

Andiry commented 2 years ago

Here is the inode log after remount:

[ 1963.313088] nova: set attr entry @ 0x9139000: epoch 0, trans 0, invalid 0, mode 33261, size 0, atime 1638517082, mtime 1638517082, ctime 1638517082
[ 1963.313091] nova: file write entry @ 0x9139038: epoch 0, trans 4, pgoff 0, pages 1, blocknr 37178, reassigned 0, updating 0, invalid count 0, size 25, mtime 1638517082
[ 1963.313092] nova: file write entry @ 0x9139078: epoch 0, trans 2, pgoff 1, pages 7, blocknr 37179, reassigned 1, updating 0, invalid count 7, size 32768, mtime 1638517082
[ 1963.313094] nova: set attr entry @ 0x91390b8: epoch 0, trans 3, invalid 0, mode 33261, size 1, atime 1638517082, mtime 1638517082, ctime 1638517082
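The raw mode 33261 in these entries is the same value that stat reports as a regular file with permissions 0755 (33261 is 0100755 in octal). Below is a minimal sketch that decodes it with the standard <sys/stat.h> macros. The helper names is_regular, perm_bits and describe_mode are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* True if the raw mode value from the log describes a regular file. */
static int is_regular(unsigned int mode) { return S_ISREG(mode); }

/* Permission bits (including setuid/setgid/sticky) of the raw mode value. */
static unsigned int perm_bits(unsigned int mode) { return mode & 07777; }

/* Print a raw "mode N" value from a NOVA log line in octal form. */
static void describe_mode(unsigned int mode)
{
    printf("mode %u = 0%o: %s, permissions 0%o\n", mode, mode,
           is_regular(mode) ? "regular file" : "not a regular file",
           perm_bits(mode));
}
```

For example, describe_mode(33261) prints "mode 33261 = 0100755: regular file, permissions 0755", which agrees with the Access line from stat.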

Note the order of the trans values: 0 -> 4 -> 2 -> 3. Here is what happens:

trans 0: the open of fd2, which specifies O_TRUNC, truncates the file to 0.
trans 1: write(fd, buf, 24): the size is 24 and the offset of fd is 24.
trans 2: write(fd2, buf, 32768): it overwrites the first page written by trans 1 and allocates another 7 pages.
trans 3: ftruncate(fd, 1) truncates the file to 1. Note that ftruncate does not change the file offset, so the offset of fd is still 24.
trans 4: write(fd, buf, 1): it overwrites the first page written by trans 1 and sets the size to 25. The log entry of trans 1 now becomes trans 4.
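The syscall sequence described in this comment can be sketched in C as follows. This is a hypothetical reconstruction, not the attached test.cpp: the function name run_issue132_sequence, the 0755 creation mode, and the 'a' fill byte are assumptions. The function returns the final st_size as seen through fd, or -1 on error.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static off_t run_issue132_sequence(const char *path)
{
    static char buf[32768];
    struct stat st;
    off_t result = -1;

    memset(buf, 'a', sizeof buf);                 /* assumed fill byte */

    int fd = open(path, O_CREAT | O_RDWR, 0755);  /* first descriptor */
    if (fd < 0)
        return -1;
    int fd2 = open(path, O_RDWR | O_TRUNC);       /* trans 0: size -> 0 */
    if (fd2 < 0) {
        close(fd);
        return -1;
    }

    if (write(fd, buf, 24) != 24)                 /* trans 1: fd offset -> 24 */
        goto out;
    if (write(fd2, buf, 32768) != 32768)          /* trans 2: size -> 32768 */
        goto out;
    if (ftruncate(fd, 1) != 0)                    /* trans 3: size -> 1 */
        goto out;
    if (lseek(fd, 0, SEEK_CUR) != 24)             /* ftruncate kept fd at 24 */
        goto out;
    if (write(fd, buf, 1) != 1)                   /* trans 4: size -> 25 */
        goto out;

    if (fstat(fd, &st) == 0)
        result = st.st_size;
out:
    close(fd2);
    close(fd);
    return result;
}
```

On a correctly behaving file system the result should be 25, and it should stay 25 after an unmount and remount. The bug reported here shows up as a size of 1 after remount. With the assumed 'a' fill byte, the file holds an 'a' at offset 0, a 23-byte hole that reads as zeros, and an 'a' at offset 24. This is consistent with cat printing just aa.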

During recovery, NOVA replays the log in entry order, so the last entry it replays is trans 3, which sets the file size to 1. It seems that with ftruncate(), the ordering of log entries becomes complicated. We need a better way to handle this.
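Here is a simplified model of replaying the inode log strictly in entry order. It uses only the trans and size values from the four entries in the dump. The struct log_entry type, the issue132_log array and replay_in_log_order are hypothetical illustrations, not NOVA's actual recovery code or entry layout. The model also assumes that each entry's size simply overwrites the running size.

```c
#include <stddef.h>

/* Hypothetical stand-in for an inode log entry: only trans and size. */
struct log_entry {
    unsigned long long trans;  /* transaction id recorded in the entry */
    unsigned long long size;   /* file size recorded in the entry */
};

/* Entry-order replay: whichever entry comes last decides the size. */
static unsigned long long replay_in_log_order(const struct log_entry *log,
                                              size_t n)
{
    unsigned long long size = 0;
    for (size_t i = 0; i < n; i++)
        size = log[i].size;
    return size;
}

/* The four entries from the dump, in log order (trans 0, 4, 2, 3). */
static const struct log_entry issue132_log[] = {
    { 0, 0 },      /* set attr: open of fd2 with O_TRUNC */
    { 4, 25 },     /* file write: entry of trans 1, now trans 4 */
    { 2, 32768 },  /* file write: write through fd2 */
    { 3, 1 },      /* set attr: ftruncate(fd, 1) */
};
```

replay_in_log_order(issue132_log, 4) yields 1. This matches the size that stat reports after remount.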

Andiry commented 2 years ago

So, when replaying the log, we should keep track of the transaction number and file size. In this case, we should trust trans 4's file size (25), not trans 3's size (1). Formatting a fix...
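One way to express the idea of tracking the transaction number together with the size during replay is to accept a size only from an entry whose trans is at least the highest trans seen so far. This is a hedged sketch with hypothetical names (struct log_entry, replay_by_trans, issue132_log). It is not the actual patch that fixed this issue.

```c
#include <stddef.h>

/* Hypothetical stand-in for an inode log entry: only trans and size. */
struct log_entry {
    unsigned long long trans;  /* transaction id recorded in the entry */
    unsigned long long size;   /* file size recorded in the entry */
};

/* Replay in entry order, but only let an entry with a trans at least as
 * new as the newest one seen so far update the file size. */
static unsigned long long replay_by_trans(const struct log_entry *log, size_t n)
{
    unsigned long long size = 0, latest = 0;
    int seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (!seen || log[i].trans >= latest) {
            latest = log[i].trans;
            size = log[i].size;
            seen = 1;
        }
    }
    return size;
}

/* The four entries from the dump, in log order (trans 0, 4, 2, 3). */
static const struct log_entry issue132_log[] = {
    { 0, 0 }, { 4, 25 }, { 2, 32768 }, { 3, 1 },
};
```

On the log from this issue it returns 25 (from trans 4) and ignores trans 2 and trans 3. Those entries appear later in the log but are older. On a log whose trans values already increase in entry order, it behaves exactly like plain in-order replay.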

Andiry commented 2 years ago

Fixed.