borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

borg2: --files-cache=ctime,inode,size (windows) #7193

Open ThomasWaldmann opened 1 year ago

ThomasWaldmann commented 1 year ago

borg usually defaults to --files-cache=ctime,inode,size.

On Linux, this is the most reliable way to determine that a file has not changed since we stored information about it in the files cache. That is because ctime is the "inode change time", which only the kernel sets, so nothing can be "faked" there from userspace (for example, by rolling back a file's timestamp to hide a change).
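A small Python sketch of why ctime is the safer key on Linux. It assumes a POSIX system where st_ctime is the inode change time. The helper name `rollback_demo` is made up for illustration. The sketch edits a file in place without changing its size, rolls mtime back with `os.utime`, and then checks that ctime still moved:

```python
import os
import tempfile
import time


def rollback_demo() -> tuple[bool, bool]:
    """Return (mtime_restored, ctime_changed) after a hidden in-place edit.

    Assumes POSIX semantics, where st_ctime is the inode change time (e.g. Linux).
    """
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"original")
        before = os.stat(path)
        time.sleep(0.05)  # let the kernel's coarse timestamp clock advance
        with open(path, "r+b") as f:
            f.write(b"modified")  # same length, so size (and inode) stay the same
        # Userspace can roll mtime back to hide the edit...
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
        after = os.stat(path)
        # ...but the kernel bumped ctime on both the write and the utime call.
        return (after.st_mtime_ns == before.st_mtime_ns,
                after.st_ctime_ns != before.st_ctime_ns)


print(rollback_demo())
```

On Linux, both values should be True. An mtime,size check alone would be fooled here, but a ctime,size check would not.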

On Windows, however, the ctime attribute is the file creation time. It does not change when the file is modified, so it is pretty useless for this purpose.

Because of that, test_file_status_cs_cache_mode is currently skipped on Windows, but that can't be the final solution.

On Windows, borg should rather use --files-cache=mtime,size (not sure about inode).
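A platform-dependent default could look roughly like the sketch below. `default_files_cache_mode` is a hypothetical helper, not existing borg code. Whether "inode" belongs in the Windows default is left open, exactly as in the issue:

```python
import sys


def default_files_cache_mode(platform: str = sys.platform) -> str:
    """Hypothetical helper: pick a --files-cache default per platform."""
    if platform == "win32":
        # st_ctime is the creation time on Windows, so use mtime instead.
        # Whether "inode" should also be part of this default is still undecided.
        return "mtime,size"
    # On POSIX, ctime is the kernel-controlled inode change time.
    return "ctime,inode,size"


print(default_files_cache_mode("win32"))
print(default_files_cache_mode("linux"))
```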

ThomasWaldmann commented 1 year ago

The usage of ctime could be a bug; this needs checking.

ThomasWaldmann commented 1 year ago

@RayyanAnsari btw, the skipped test is a rather special case, but the problem on Windows is likely much worse:

As borg uses ctime by default (== creation time on Windows), the files cache would not detect a modified file as modified if it still has the same size and the same inode (with the current --files-cache=ctime,size,inode default).

Thus, I guess any backup except the first one might miss data on Windows (in files that already existed but were updated since).
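The failure mode can be reproduced with a simplified model of the files-cache comparison. This is a sketch, not borg's actual code. `FileStat` and `unchanged` are hypothetical names, and the stat values are simulated to match Windows semantics, where st_ctime is the creation time:

```python
from typing import NamedTuple


class FileStat(NamedTuple):
    """Hypothetical subset of stat fields a files cache might key on."""
    size: int
    inode: int
    ctime_ns: int
    mtime_ns: int


FIELD_FOR_KEY = {"ctime": "ctime_ns", "mtime": "mtime_ns", "inode": "inode", "size": "size"}


def unchanged(cached: FileStat, current: FileStat, mode: str) -> bool:
    """Simplified check: the file counts as unchanged if all fields named in mode match."""
    return all(getattr(cached, FIELD_FOR_KEY[key]) == getattr(current, FIELD_FOR_KEY[key])
               for key in mode.split(","))


# Simulated Windows stats: the creation time ("ctime") does not move when the
# contents are rewritten in place with the same length; only mtime changes.
before = FileStat(size=4096, inode=42, ctime_ns=1_000, mtime_ns=1_000)
after = FileStat(size=4096, inode=42, ctime_ns=1_000, mtime_ns=2_000)

print(unchanged(before, after, "ctime,size,inode"))  # True: change missed, file skipped
print(unchanged(before, after, "mtime,size,inode"))  # False: change detected
```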

I guess we must use mtime instead of ctime on Windows by default, or do we have any other options?

Would it even make sense to reject ctime usage in --files-cache as invalid on Windows?
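If ctime were rejected on Windows, the check could live in the option parser. Below is a hedged sketch. `parse_files_cache_mode` is hypothetical. `KNOWN_KEYS` is an assumed subset of the accepted keys, and the real option may accept more:

```python
import sys

# Assumed subset of the keys accepted by --files-cache.
KNOWN_KEYS = frozenset({"ctime", "mtime", "inode", "size"})


def parse_files_cache_mode(value: str, platform: str = sys.platform) -> frozenset:
    """Hypothetical validator: split a --files-cache value and reject ctime on Windows."""
    keys = frozenset(k.strip() for k in value.split(",") if k.strip())
    unknown = keys - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown --files-cache key(s): {', '.join(sorted(unknown))}")
    if platform == "win32" and "ctime" in keys:
        # On Windows, st_ctime is the creation time and does not change on modification.
        raise ValueError("ctime is not a change time on Windows; use mtime instead")
    return keys


print(sorted(parse_files_cache_mode("ctime,inode,size", platform="linux")))
```

Rejecting the value outright is stricter than silently substituting mtime. It makes an existing command line fail loudly instead of quietly changing what gets compared.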

https://docs.python.org/3/library/os.html#os.stat_result

RayyanAnsari commented 1 year ago

We could either use mtime, or perhaps try to come up with a value for ctime using WinAPI functions, like Cygwin and libuv do.

see also: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_file_network_open_information

ThomasWaldmann commented 1 year ago

Interesting that there is also a ChangeTime. It would have been great to have that in stat_result.st_ctime and the creation time somewhere else...

I guess that, without that, using mtime by default would be simplest?

RayyanAnsari commented 1 year ago

Note to self: check https://github.com/python/cpython/pull/102149 and its future implications.

ThomasWaldmann commented 1 year ago

@RayyanAnsari Interesting. But they noticed that they can't just change the semantics of ctime, even if that would be prettier / more consistent in the end. That's what one gets if one does it wrong in the beginning...

RayyanAnsari commented 1 year ago

Commit description:

This deprecates st_ctime fields on Windows, with the intent to change them to contain the correct value in 3.14. For now, they should keep returning the creation time as they always have.

Python 3.14 seems quite a while away... We could implement ctime ourselves and drop our implementation once that arrives; it shouldn't be too hard with ctypes.
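A rough ctypes sketch of reading the real change time on Windows. It assumes the user-mode `GetFileInformationByHandleEx` call with the `FileBasicInfo` class (value 0), which fills a `FILE_BASIC_INFO` struct containing a `ChangeTime` field. `change_time_ns` and `filetime_to_ns` are hypothetical names. Filesystems other than NTFS may not maintain ChangeTime meaningfully. Opening via `os.open` also means directories are not handled here:

```python
import ctypes
import os
import sys

# FILETIME values count 100 ns ticks since 1601-01-01 UTC.
EPOCH_DIFF_TICKS = 116_444_736_000_000_000  # ticks from 1601-01-01 to 1970-01-01


def filetime_to_ns(ticks: int) -> int:
    """Convert Windows FILETIME ticks to nanoseconds since the Unix epoch."""
    return (ticks - EPOCH_DIFF_TICKS) * 100


class FILE_BASIC_INFO(ctypes.Structure):
    # Four LARGE_INTEGER timestamps followed by a DWORD of attributes.
    _fields_ = [
        ("CreationTime", ctypes.c_longlong),
        ("LastAccessTime", ctypes.c_longlong),
        ("LastWriteTime", ctypes.c_longlong),
        ("ChangeTime", ctypes.c_longlong),
        ("FileAttributes", ctypes.c_ulong),
    ]


def change_time_ns(path: str) -> int:
    """Hypothetical helper: return the file's ChangeTime in ns since 1970 (Windows only)."""
    if sys.platform != "win32":
        raise OSError("change_time_ns() is only implemented for Windows")
    import msvcrt  # Windows-only module

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    file_basic_info_class = 0  # FileBasicInfo in FILE_INFO_BY_HANDLE_CLASS
    info = FILE_BASIC_INFO()
    fd = os.open(path, os.O_RDONLY | os.O_BINARY)
    try:
        handle = msvcrt.get_osfhandle(fd)
        ok = kernel32.GetFileInformationByHandleEx(
            ctypes.c_void_p(handle), file_basic_info_class,
            ctypes.byref(info), ctypes.sizeof(info))
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())
    finally:
        os.close(fd)
    return filetime_to_ns(info.ChangeTime)
```

Keeping the whole workaround behind one function would make it easy to delete once Python's own st_ctime reports the change time.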

ThomasWaldmann commented 1 year ago

It also depends on the filesystem. NTFS can do it, but what about other filesystems?

It would be cool if we could implement that in a non-messy way (in just one place, e.g. in our own os_stat or borg_stat wrapper around os.stat). I thought about just patching it into stat_result, but that is read-only, and its implementation and properties also vary by OS.
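Because os.stat_result is read-only, a wrapper would return its own small record instead of patching it. Here is a sketch of such a `borg_stat` wrapper. The `BorgStat` type, the `change_time_ns` callback parameter and the mtime fallback are all assumptions for illustration, not borg code:

```python
import os
import sys
from typing import Callable, NamedTuple, Optional


class BorgStat(NamedTuple):
    """Hypothetical read-only record with only the fields the files cache needs."""
    st_ino: int
    st_size: int
    st_mtime_ns: int
    st_ctime_ns: int  # always "change time" here, never creation time


def borg_stat(path: str,
              change_time_ns: Optional[Callable[[str], int]] = None,
              platform: str = sys.platform) -> BorgStat:
    """Wrap os.stat so that st_ctime_ns has change-time semantics on every OS."""
    st = os.stat(path, follow_symlinks=False)
    if platform == "win32":
        # st_ctime is the creation time on Windows: ask a platform helper for the
        # real change time, or fall back to mtime if no helper is available.
        ctime_ns = change_time_ns(path) if change_time_ns is not None else st.st_mtime_ns
    else:
        ctime_ns = st.st_ctime_ns
    return BorgStat(st.st_ino, st.st_size, st.st_mtime_ns, ctime_ns)
```

All callers would then read `st_ctime_ns` from this one place, so a later switch to a native change time would only touch `borg_stat`.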

ThomasWaldmann commented 1 month ago

Related: I just changed the files cache so that it stores both ctime and mtime.