Open tmds opened 2 years ago
| Author: | tmds |
|---|---|
| Assignees: | - |
| Labels: | `area-System.IO` |
| Milestone: | Future |
cc @carlossanlop
How do you differentiate a regular file from a hard link?
Once the hard link is created the resulting path is no different from the path it was created from. Both paths now have a strong reference to the file.
When you `stat` a path, `st_nlink` contains the number of hard links. When there are multiple hard links, it will be higher than 1.
Paths to the same file have the same `st_ino`.
For example:
Create a file:
touch file
Create a hard link:
ln file file2
Both of these are valid paths for the file. They both register as regular files. Notice the `2` in the output of `ls`, which is the number of hard links.
$ ls -lah
total 0
drwxr-xr-x. 2 tmds tmds 80 Aug 23 13:36 .
drwxrwxrwt. 43 root root 1.4K Aug 23 13:22 ..
-rw-r--r--. 2 tmds tmds 0 Aug 23 13:36 file
-rw-r--r--. 2 tmds tmds 0 Aug 23 13:36 file2
They have the same inode number:
$ stat file file2
File: file
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 0,41 Inode: 13530 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 1000/ tmds) Gid: ( 1000/ tmds)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2022-08-23 13:36:05.514551062 +0200
Modify: 2022-08-23 13:36:05.514551062 +0200
Change: 2022-08-23 13:36:08.184538995 +0200
Birth: 2022-08-23 13:36:05.514551062 +0200
File: file2
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 0,41 Inode: 13530 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 1000/ tmds) Gid: ( 1000/ tmds)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2022-08-23 13:36:05.514551062 +0200
Modify: 2022-08-23 13:36:05.514551062 +0200
Change: 2022-08-23 13:36:08.184538995 +0200
Birth: 2022-08-23 13:36:05.514551062 +0200
Paths to the same file have the same `st_ino` and the same `st_dev`.
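As a rough illustration of the check described above, here is a minimal Python sketch that compares `st_dev` and `st_ino` and reads `st_nlink`. It is not the .NET implementation; the helper names `file_identity`, `is_same_file`, and `demo_identity` are made up for this example, and it assumes a file system that supports hard links.

```python
import os
import tempfile


def file_identity(path):
    """Return (st_dev, st_ino, st_nlink) for path without following symlinks."""
    st = os.lstat(path)
    return st.st_dev, st.st_ino, st.st_nlink


def is_same_file(path_a, path_b):
    """True when both paths refer to the same file (same device and inode)."""
    dev_a, ino_a, _ = file_identity(path_a)
    dev_b, ino_b, _ = file_identity(path_b)
    return (dev_a, ino_a) == (dev_b, ino_b)


def demo_identity():
    # Recreate the `touch file` / `ln file file2` example in a scratch directory.
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "file")
        second = os.path.join(tmp, "file2")
        other = os.path.join(tmp, "other")
        open(first, "w").close()
        os.link(first, second)  # hard link, like `ln file file2`
        open(other, "w").close()
        return (
            file_identity(first)[2],      # link count of "file"
            is_same_file(first, second),  # two hard links to one file
            is_same_file(first, other),   # an unrelated file
        )
```

With the layout from the example, `demo_identity()` should report a link count of 2, a match between `file` and `file2`, and no match against the unrelated file.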
For Win32, there is `FindFirstFileNameW`, but I don't know whether it works with all remote file systems (SMB, NFS, WSL2), and the results might be difficult to use if symbolic links to directories are involved. There is also `DWORD NumberOfLinks` in `FILE_STANDARD_INFO`, `LARGE_INTEGER FileId` in `FILE_ID_BOTH_DIR_INFO`, and `FILE_ID_128 FileId` in `FILE_ID_INFO` or `FILE_ID_EXTD_DIR_INFO`. Of these, `FILE_ID_128` appears to be supported on Windows Server only. On Windows client operating systems, you'd have to use some other way to check whether the files are in the same volume, but I don't know how to do that efficiently. Perhaps the volume check doesn't have to be efficient if you do it only when `DWORD NumberOfLinks` is greater than one and `LARGE_INTEGER FileId` already matches.
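The ordering suggested in the last sentence could look roughly like the Python sketch below. It only models the decision logic: `LinkInfo`, `probably_same_file`, and the `same_volume` callback are hypothetical names, and the fields merely stand in for `NumberOfLinks` and the 64-bit `FileId`. No Win32 call is made here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LinkInfo:
    """Hypothetical holder for values an implementation would read from Win32."""
    path: str
    number_of_links: int  # stands in for NumberOfLinks
    file_id: int          # stands in for the 64-bit FileId


def probably_same_file(
    a: LinkInfo,
    b: LinkInfo,
    same_volume: Callable[[str, str], bool],
) -> bool:
    # Cheap checks first: a file with a single link has no other path.
    if a.number_of_links <= 1 or b.number_of_links <= 1:
        return False
    if a.file_id != b.file_id:
        return False
    # Only now pay for the potentially slow volume comparison.
    return same_volume(a.path, b.path)
```

Because the volume comparison runs last, it is reached only for files that already have several links and matching IDs, which should be rare in most directory trees.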
Thanks @tmds. When I was implementing hardlinks, I didn't find information explaining st_nlink. I should've asked you directly.
Would you consider hardlinks a common enough scenario that we would have to fix this in 7? Or can this wait to be fixed in 8?
I've moved it to 8.0 as I don't believe that such scenarios should be common. Moreover, it sounds like we are going to need to perform some extra work to get it working. This might cause a minor perf regression.
Yes, 8 is fine.
Currently hard links to the same file get duplicated in the archive. Instead, when additional hard links to the same file are encountered, they should be stored as hard links to the first entry.
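One way to picture the requested behavior is a writer that remembers the first entry for each `(st_dev, st_ino)` pair and emits later paths as hard link entries. The Python sketch below only plans the entries and writes no archive; `plan_entries` and `demo_plan` are hypothetical names, and this is not how the .NET tar writer is implemented.

```python
import os
import stat
import tempfile


def plan_entries(paths):
    """Decide, per path, whether to store file data or a hard link entry.

    Returns tuples: ("file", name) or ("hardlink", name, first_entry_name).
    """
    first_seen = {}  # (st_dev, st_ino) -> name of the first entry for that file
    plan = []
    for name in paths:
        st = os.lstat(name)
        # Only regular files with more than one link can reappear under another path.
        multi = stat.S_ISREG(st.st_mode) and st.st_nlink > 1
        key = (st.st_dev, st.st_ino)
        if multi and key in first_seen:
            plan.append(("hardlink", name, first_seen[key]))
        else:
            if multi:
                first_seen[key] = name
            plan.append(("file", name))
    return plan


def demo_plan():
    # Same layout as the example above, plus one unrelated file.
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, n) for n in ("file", "file2", "other")]
        open(paths[0], "w").close()
        os.link(paths[0], paths[1])
        open(paths[2], "w").close()
        # Strip the scratch directory so the result is stable across runs.
        return [
            (entry[0],) + tuple(os.path.basename(p) for p in entry[1:])
            for entry in plan_entries(paths)
        ]
```

Skipping the dictionary for files with a link count of 1 keeps the bookkeeping small, since most files in a typical tree have a single link.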