NVSL / linux-nova

NOVA is a log-structured file system designed for byte-addressable non-volatile memories, developed at the University of California, San Diego.
http://nvsl.ucsd.edu/index.php?path=projects/nova
Other
421 stars 118 forks source link

memcpy_to_pmem_nocache does not persist unaligned bytes #105

Open pcworld opened 3 years ago

pcworld commented 3 years ago

NOVA's function memcpy_to_pmem_nocache is commonly used in NOVA, and is defined as follows: https://github.com/NVSL/linux-nova/blob/587e25223ee415ccd1a7a831d95f7ccfa51b0259/fs/nova/nova.h#L259-L267

The naming seems to imply that this function guarantees the cache to be bypassed and thus the data to be persisted without need of explicit flushing. And indeed, this assumption is followed in at least one place of NOVA, as shown later.

Following the definition of __copy_from_user_inatomic_nocache (on x86_64) to __copy_user_nocache, we find the following documentation: https://github.com/NVSL/linux-nova/blob/587e25223ee415ccd1a7a831d95f7ccfa51b0259/arch/x86/lib/copy_user_64.S#L196-L205

Following the assembly code, we find that:

Some places in NOVA use memcpy_to_pmem_nocache but still perform explicit flushing, e.g. through the nova_update_entry_csum function, which implicitly flushes the buffer. These cases could be considered performance bugs, but do not affect crash consistency.

However, at least one place exists in which use of memcpy_to_pmem_nocache currently leads to a violation of crash consistency guarantees (I have not thoroughly looked for other cases so there might be more): nova_cow_file_write calls do_nova_cow_file_write which calls memcpy_to_pmem_nocache to copy file contents to persistent memory and does not perform further flushing.

Consider this shell script:

mount -t NOVA -o init /dev/pmem0 /mnt
echo HelloWorld > /mnt/myfile
sync

The string HelloWor (8 bytes) is copied using movnti, however the rest of the bytes l, d, \n are copied using a temporal mov instruction. Thus, the last three bytes are not guaranteed to be persisted. If the system crashes after the sync system call succeeded but the last three bytes have not yet been persisted, recovering the file system and reading /mnt/myfile will only yield HelloWor\0\0\0 as the file content, which violates the guarantees of the sync system call.

Suggested fix

In NOVA's memcpy_to_pmem_nocache, after the call to __copy_from_user_inatomic_nocache: If the function arguments do not fulfill the alignment requirements outlined above, explicitly call nova_flush_buffer with the bytes at the beginning and/or the end of the buffer.

Andiry commented 3 years ago

Thanks for the findings. Care to send a PR?

pcworld commented 2 years ago

I'm short on time, but there is the following workaround (it performs unnecessary flushes, but is at least correct hopefully):

--- a/fs/nova/nova.h
+++ b/fs/nova/nova.h
@@ -263,6 +263,9 @@ static inline int memcpy_to_pmem_nocache(void *dst, const void *src,

        ret = __copy_from_user_inatomic_nocache(dst, src, size);

+       nova_flush_buffer(dst, size, 1);
+
        return ret;
 }

A better fix would be to only call nova_flush_buffer on beginning/end where alignment does not match. The code for that should be simple, but needs to be carefully written and tested to be completely sure it works in all cases, which is why I haven't submitted a PR yet.

Andiry commented 2 years ago

I think the proposed fix is too heavy. I think for most of the time we do meet the alignment requirements, right? Perhaps performing nova_flush_buffer here with a check of alignments?

pcworld commented 2 years ago

I think for most of the time we do meet the alignment requirements, right?

I have also observed other cases where this bug hits: Symlink creation (target name), and also renaming files to long file names (longer than cache line size) or creating new files with long names.

I have found that PMFS already specifically handles this issue: https://github.com/linux-pmfs/pmfs/blob/28aaf606ff076080a72174e9f5a7be13979a8b97/fs/pmfs/xip.c#L81-L84 Perhaps pmfs_flush_edge_cachelines can just be reused in NOVA.