LLNL / metall

Persistent memory allocator for data-centric analytics
Other
53 stars 13 forks source link

Intention behind `fsync_recursive` #335

Open liss-h opened 1 week ago

liss-h commented 1 week ago

Hi, there is this function fsync_recursive which walks up the directory tree until it reaches the root and fsyncs everything in between. https://github.com/LLNL/metall/blob/1f6c027680ea89bf92bd9a5255e006c044aa078f/include/metall/detail/file.hpp#L72-L85

This behaviour causes issues on SELinux systems when metall is used as part of a system service. The reason being that SELinux will not allow system services to read files that are not theirs.

If you assume that your data is in /var/local/service/data this causes issues as soon as metall tries to fsync /var/local as it will not have access to it with a properly configured policy (because there is no reason for it to have access there).

So I was wondering what the intention of that function is: Why does it sync all the way towards the root directory? Would it be enough to only fsync up to the datastore root?

Example strace:

open("/var/local/service/data", O_RDONLY|O_LARGEFILE) = 4
 > /usr/local/bin/service(sccp+0x19) [0x147e45f]
 > /usr/local/bin/service(open+0x5e) [0x1469c36]
 > /usr/local/bin/service(metall::mtlldetail::fsync_recursive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x115) [0x14f9035]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_test_file_space_free(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x84) [0x12f6084]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void*, bool)+0x5ae) [0x12fcb0e]
 > /usr/local/bin/service(metall::kernel::manager_kernel<unsigned int, 2097152ul>::priv_open(char const*, bool, unsigned long)+0x633) [0x12fba13]
 > /usr/local/bin/service(metall::basic_manager<unsigned int, 2097152ul>::basic_manager(metall::open_only_t, char const*)+0x5e) [0x12fb37e]
 > /usr/local/bin/service(metall_manager* open_impl<metall::open_only_t{}>(char const*)+0x27) [0x12e7057]
fsync(4)                                = 0
 > /usr/local/bin/service(sccp+0x19) [0x147e45f]
 > /usr/local/bin/service(fsync+0x1c) [0x1482e7c]
 > /usr/local/bin/service(metall::mtlldetail::fsync_recursive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x123) [0x14f9043]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_test_file_space_free(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x84) [0x12f6084]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void*, bool)+0x5ae) [0x12fcb0e]
 > /usr/local/bin/service(metall::kernel::manager_kernel<unsigned int, 2097152ul>::priv_open(char const*, bool, unsigned long)+0x633) [0x12fba13]
 > /usr/local/bin/service(metall::basic_manager<unsigned int, 2097152ul>::basic_manager(metall::open_only_t, char const*)+0x5e) [0x12fb37e]
 > /usr/local/bin/service(metall_manager* open_impl<metall::open_only_t{}>(char const*)+0x27) [0x12e7057]
--
open("/var/local/service", O_RDONLY|O_LARGEFILE) = 4
 > /usr/local/bin/service(sccp+0x19) [0x147e45f]
 > /usr/local/bin/service(open+0x5e) [0x1469c36]
 > /usr/local/bin/service(metall::mtlldetail::fsync_recursive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x115) [0x14f9035]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_test_file_space_free(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x84) [0x12f6084]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void*, bool)+0x5ae) [0x12fcb0e]
 > /usr/local/bin/service(metall::kernel::manager_kernel<unsigned int, 2097152ul>::priv_open(char const*, bool, unsigned long)+0x633) [0x12fba13]
 > /usr/local/bin/service(metall::basic_manager<unsigned int, 2097152ul>::basic_manager(metall::open_only_t, char const*)+0x5e) [0x12fb37e]
 > /usr/local/bin/service(metall_manager* open_impl<metall::open_only_t{}>(char const*)+0x27) [0x12e7057]
fsync(4)                                = 0
 > /usr/local/bin/service(sccp+0x19) [0x147e45f]
 > /usr/local/bin/service(fsync+0x1c) [0x1482e7c]
 > /usr/local/bin/service(metall::mtlldetail::fsync_recursive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x123) [0x14f9043]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_test_file_space_free(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x84) [0x12f6084]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void*, bool)+0x5ae) [0x12fcb0e]
 > /usr/local/bin/service(metall::kernel::manager_kernel<unsigned int, 2097152ul>::priv_open(char const*, bool, unsigned long)+0x633) [0x12fba13]
 > /usr/local/bin/service(metall::basic_manager<unsigned int, 2097152ul>::basic_manager(metall::open_only_t, char const*)+0x5e) [0x12fb37e]
 > /usr/local/bin/service(metall_manager* open_impl<metall::open_only_t{}>(char const*)+0x27) [0x12e7057]
--
open("/var/local", O_RDONLY|O_LARGEFILE) = 4
 > /usr/local/bin/service(sccp+0x19) [0x147e45f]
 > /usr/local/bin/service(open+0x5e) [0x1469c36]
 > /usr/local/bin/service(metall::mtlldetail::fsync_recursive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x115) [0x14f9035]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_test_file_space_free(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x84) [0x12f6084]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void*, bool)+0x5ae) [0x12fcb0e]
 > /usr/local/bin/service(metall::kernel::manager_kernel<unsigned int, 2097152ul>::priv_open(char const*, bool, unsigned long)+0x633) [0x12fba13]
 > /usr/local/bin/service(metall::basic_manager<unsigned int, 2097152ul>::basic_manager(metall::open_only_t, char const*)+0x5e) [0x12fb37e]
 > /usr/local/bin/service(metall_manager* open_impl<metall::open_only_t{}>(char const*)+0x27) [0x12e7057]
fsync(4)                                = 0
 > /usr/local/bin/service(sccp+0x19) [0x147e45f]
 > /usr/local/bin/service(fsync+0x1c) [0x1482e7c]
 > /usr/local/bin/service(metall::mtlldetail::fsync_recursive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x123) [0x14f9043]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_test_file_space_free(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x84) [0x12f6084]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void*, bool)+0x5ae) [0x12fcb0e]
 > /usr/local/bin/service(metall::kernel::manager_kernel<unsigned int, 2097152ul>::priv_open(char const*, bool, unsigned long)+0x633) [0x12fba13]
 > /usr/local/bin/service(metall::basic_manager<unsigned int, 2097152ul>::basic_manager(metall::open_only_t, char const*)+0x5e) [0x12fb37e]
 > /usr/local/bin/service(metall_manager* open_impl<metall::open_only_t{}>(char const*)+0x27) [0x12e7057]
--
open("/var", O_RDONLY|O_LARGEFILE)      = 4
 > /usr/local/bin/service(sccp+0x19) [0x147e45f]
 > /usr/local/bin/service(open+0x5e) [0x1469c36]
 > /usr/local/bin/service(metall::mtlldetail::fsync_recursive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x115) [0x14f9035]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_test_file_space_free(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x84) [0x12f6084]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void*, bool)+0x5ae) [0x12fcb0e]
 > /usr/local/bin/service(metall::kernel::manager_kernel<unsigned int, 2097152ul>::priv_open(char const*, bool, unsigned long)+0x633) [0x12fba13]
 > /usr/local/bin/service(metall::basic_manager<unsigned int, 2097152ul>::basic_manager(metall::open_only_t, char const*)+0x5e) [0x12fb37e]
 > /usr/local/bin/service(metall_manager* open_impl<metall::open_only_t{}>(char const*)+0x27) [0x12e7057]
fsync(4)                                = 0
 > /usr/local/bin/service(sccp+0x19) [0x147e45f]
 > /usr/local/bin/service(fsync+0x1c) [0x1482e7c]
 > /usr/local/bin/service(metall::mtlldetail::fsync_recursive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x123) [0x14f9043]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_test_file_space_free(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x84) [0x12f6084]
 > /usr/local/bin/service(metall::kernel::mmap_segment_storage<long, unsigned long>::priv_open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void*, bool)+0x5ae) [0x12fcb0e]
 > /usr/local/bin/service(metall::kernel::manager_kernel<unsigned int, 2097152ul>::priv_open(char const*, bool, unsigned long)+0x633) [0x12fba13]
 > /usr/local/bin/service(metall::basic_manager<unsigned int, 2097152ul>::basic_manager(metall::open_only_t, char const*)+0x5e) [0x12fb37e]
 > /usr/local/bin/service(metall_manager* open_impl<metall::open_only_t{}>(char const*)+0x27) [0x12e7057]
liss-h commented 1 week ago

Somewhat related: std::filesystem::canonical does a readlink on every part of the path /var, /var/local, ... (thats news to me but apparently it does that)

This will also get denied be SELinux

KIwabuchi commented 1 week ago

So I was wondering what the intention of that function is: Why does it sync all the way towards the root directory? Would it be enough to only fsync up to the datastore root?

If I remember correctly, I found the idea in another file system-related library that was implemented by a filesystem expert. It is just for making sure to flush the changes to storage. But it is a very cautious strategy.

I think it is okay not to call fsync in parent directories.

Somewhat related: std::filesystem::canonical does a readlink on every part of the path /var, /var/local, ... (thats news to me but apparently it does that)

It sounds like we shouldn't use the fsync_recursive function on SELinux. Let's turn it off on the environment. Do you know how to detect SELinux in C++ code? If it is not easy, we could turn off the feature in any environment.

liss-h commented 1 week ago

It sounds like we shouldn't use the fsync_recursive function on SELinux. Let's turn it off on the environment. Do you know how to detect SELinux in C++ code? If it is not easy, we could turn off the feature in any environment.

I don't think its possible, at least not in a robust way. Do you think it would be enough to fsync up to the datastore base directory?

KIwabuchi commented 5 days ago

I did some investigation. Calling fsync all up to the root may be overkill. I changed the code to fsync only a parent directory. The related PR has been merged to the develop: https://github.com/LLNL/metall/pull/183. Let me know if you see any issues still or if the PR introduces a new one.