Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

std::experimental::filesystem::file_time_type::min() exceeds values supported by the underlying file system #34963

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR35990
Status CONFIRMED
Importance P normal
Reported by Volodymyr Sapsai (vsapsai@apple.com)
Reported on 2018-01-17 12:18:34 -0800
Last modified on 2018-01-17 13:34:24 -0800
Version unspecified
Hardware PC All
CC eric@efcs.ca, llvm-bugs@lists.llvm.org, mclow.lists@gmail.com, zilla@kayari.org
Fixed by commit(s)
Attachments testing.c (1098 bytes, text/x-csrc)
Blocks
Blocked by
See also

Created attachment 19698 Code to test with low-level API.

std::experimental::filesystem::file_time_type::min() exceeds values supported by the underlying file system. When you set file modification time with filesystem::last_write_time(path, time) with min value and read it back with filesystem::last_write_time(path), received value can be significantly different from the supplied one.

Steps to reproduce:

  1. Call auto min_time = std::experimental::filesystem::file_time_type::min(); std::experimental::filesystem::last_write_time(path, min_time);
  2. Call auto current_time = std::experimental::filesystem::last_write_time(path);
  3. Reboot machine you are running the test on.
  4. Call auto current_time2 = std::experimental::filesystem::last_write_time(path);

Expected result: min_time, current_time, current_time2 should be equal, at least to the second.

Actual result: Results depend on the underlying file system.

On macOS with APFS min_time != current_time, current_time == current_time2. Inequality is detected by the test std/experimental/filesystem/fs.op.funcs/fs.op.last_write_time/last_write_time.pass.cpp

On Ubuntu 16.04.3 with ext3 min_time == current_time, current_time != current_time2.

Details: For writing and reading file modification timestamp we are using utimensat and stat, respectively. In my testing filesystem::file_time_type::min() was -2^63 microseconds which can be expressed with struct timespec which has 64-bit tv_sec and 64-bit tv_nsec. But file systems I tested don't use 128 bits to store timestamps on the disk. One common approach seems to be using 128-bit value while inode metadata is still in memory. That allows the test last_write_time.pass.cpp to pass on ext3 and probably ext4. But according to POSIX.1-2008 http://pubs.opengroup.org/onlinepubs/9699919799/ that is not conformant

Upon assignment, file timestamps are immediately converted to the resolution of the file system by truncation (i.e., the recorded time can be older than the actual time). For example, if the file system resolution is 1 microsecond, then a conforming stat() must always return an st_mtim.tv_nsec that is a multiple of 1000. Some older implementations returned higher-resolution timestamps while the inode information was cached, and then spontaneously truncated the tv_nsec fields when they were stored to and retrieved from disk, but this behavior does not conform.

  • Section 11. Headers, <sys/stat.h>

Regardless of the POSIX standard, I think the developers expect that values in file_time_type::min()/max() range are safe to use and won't change on their own.

I have attached testing.c to test utimensat and stat directly and the results are On APFS: Will update file mtime to tv_sec = -9223372036854, tv_nsec = 0 File mtime now is tv_sec = -9223372036, tv_nsec = -854775808

On ext3: Will update file mtime to tv_sec = -9223372036854, tv_nsec = 0 File mtime now is tv_sec = -9223372036854, tv_nsec = 0

after rebooting File mtime now is tv_sec = 2217714954, tv_nsec = 0

Quuxplusone commented 6 years ago

Attached testing.c (1098 bytes, text/x-csrc): Code to test with low-level API.

Quuxplusone commented 6 years ago

This is a know issue, which I've spent a lot of time on and it's not clear how to fix it.

Quuxplusone commented 6 years ago

Also, the fact that the time changes after rebooting seems like an issue with the kernel/filesystem. Not libc++. If the time is successfully set, and can be retrieved then it seems like that should work. Regardless of reboot behavior.

Quuxplusone commented 6 years ago
For Darwin I tried

    typedef chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds>  file_time_type;

But it caused static assertions during the build and I didn't check what is the
correct way of expressing this approach and if it actually fixes the issue.

I agree the difference between in-memory and on-disk timestamp representation
is the kernel/filesystem quirk. But I think it is useful for std::filesystem to
abstract these peculiarities instead of propagating them to its clients.
Quuxplusone commented 6 years ago

It's not even clear that it's possible to know the correct answer to these problems at build time. Since the FS could change at any point.

For that reason, I'm actually strongly in favor of using the largest possible file_time_type representation. It's much better for us to be able to handle all valid FS timestamps, as opposed to the opposite, where we can't represent all of the times represented by the FS.

Quuxplusone commented 6 years ago
(In reply to Volodymyr Sapsai from comment #0)
> Regardless of the POSIX standard, I think the developers expect that values
> in file_time_type::min()/max() range are safe to use and won't change on
> their own.

I don't think that's a reasonable expectation, and nothing in the C++ standard
supports that expectation.

The integer type used as the file_time_type representation might be able to
represent -2^63 microseconds but if the physical disks connected to the machine
don't allow that value to be written out and read back in (whether immediately
or after a reboot) then there's nothing the C++ library can do about it.

And if the kernel allows a value to be written, without reporting an error, but
the value isn't there after a reboot (or after it's removed from the disk
cache) then there's nothing the C++ library can do. Obviously the C++ library
isn't going to force a refresh of the entire disk cache just to ensure a file
mtime value is what the program expects.

All the library can do is try to write the value using futimens and report if
that succeeds or not.
Quuxplusone commented 6 years ago
(In reply to Eric Fiselier from comment #4)
> It's not even clear that it's possible to know the correct answer to these
> problems at build time. Since the FS could change at any point.

Right, and there's only one file_time_type for the whole C++ program, but it
gets used for all the filesystems connected to the machine, which could support
very different resolutions and very different min/max values.

So in general the range of valid values for the file_time_type can't be the
same as the range of value values for "the filesystem", because there could be
several different filesystems mounted on a machine.