NVSL / PMFS-new

Porting PMFS to the latest Linux kernel
15 stars 25 forks source link

Possible crash consistency bug with write() #8

Open hayley-leblanc opened 2 years ago

hayley-leblanc commented 2 years ago

Hi,

I'm reporting what I believe to be a crash consistency bug in PMFS. Suppose we run the following workload on an empty PMFS file system mounted at /mnt/pmem:

creat /mnt/pmem/foo
write 1 byte to foo
write 8 bytes to foo

When performing the second write with pmfs_xip_file_write(), the if statement at line 380 of xip.c will be entered because the second write is modifying the same block as the first write, and PMFS will use pmfs_file_write_fast() to avoid using a transaction. After writing the data, this function atomically updates foo's size and time fields, and finally flushes the first cacheline of foo's inode using pmfs_flush_buffer(). However, this flush does not include a fence, so these updates to the inode may be reordered with writes from the beginning of the subsequent system call. This can result in losing the data from the second write, even though the write() call is completed.

I'm not 100% sure that this would be considered a bug in PMFS, but it can result in data loss and only requires the addition of a store fence to fix.

iaoing commented 9 months ago

I suspect this is not a bug. After pmfs_file_write_fast(), PMFS unlocks the inode by inode_unlock(inode);, which implies an implicit fence. Moreover, there will be a context switch after the write, which also implies a fence. Thus, I suspect it is unnecessary to add a fence at the end of pmfs_file_write_fast(). https://github.com/NVSL/PMFS-new/blob/8913fbf51a57e99a11197d5f26a3c1ce3c718c0f/xip.c#L430-L434

Memory barriers in kernel: https://www.kernel.org/doc/Documentation/memory-barriers.txt