Closed carlreinke closed 3 years ago
Requesting information about that link's target should always require opt-in.
We have a different opinion about this.
it seems to me that .GetTarget(bool returnFinalTarget = false) is a better way to address this use case, given that you then get the choice between the immediate and the ultimate target path.
If you don't care about the target path, that just adds noise. Consider how the following would look using GetTarget
.
FileInfo file = new FileInfo("/tmp/my-file", followLinks: true);
DateTime lastWriteTime = file.LastWriteTimeUtc;
while (true)
{
Thread.Sleep(1000);
file.Refresh();
DateTime writeTime = file.LastWriteTimeUtc;
if (writeTime != lastWriteTime)
{
Console.WriteLine("The file has changed");
lastWriteTime = writeTime;
}
}
I think the common case is that they're not interested in anything other than the name, and that it's just a value returned from Directory.EnumerateFiles (or DirectoryInfo.EnumerateFiles, whatever).
When someone does new FileInfo(path)
themselves, I don't think there's a good answer to what they might want... if it's the file size, then they probably want the contents. If it's Exists that's a tricky question for symlinks (the link itself exists, but the destination might not, which were you asking about?).
I'm totally cool with a constructor parameter allowing a link (or all links, or N links, whatever) to be followed, so that people who feel strongly about expressions-vs-statements can accomplish their goals, but it just wouldn't be the default. (Plus, I would consider it too weird of a breaking change, particularly while there's still the netstandard2.0 concept floating around)
If it's Exists that's a tricky question for symlinks (the link itself exists, but the destination might not, which were you asking about?).
It is already in the Exists
. Ex.:
https://github.com/dotnet/runtime/blob/b0c7f044dacd417b2fcc2699bd396f1d949baa33/src/libraries/System.IO.FileSystem/src/System/IO/FileSystem.Exists.Unix.cs#L42-L50
I'm totally cool with a constructor parameter allowing a link
My concern was duplication of functionality, but I can now see that providing such constructors would be a helpful - and better-performing - alternative to a subsequent .Get*Target()
method call.
Judging by the example use case, only providing a bool
of returnFinalTarget
- i.e. optional resolution to the ultimate target - may be sufficient.
Also: What should happen if the target doesn't exist? Exception?
Given the above and @iSazonov's findings, conclusively determining the existence of a link target would require one of the following:
new FileInfo("foo", returnFinalTarget: true).Exists
or (especially with a given FileSystemInfo
instance):
new FileInfo("foo").GetTarget(returnFinalTarget: true)?.Exists ?? false
Also note that the current .Exists
behavior on symlinks with missing targets may be surprising on Unix: Symlinks aren't typed on Unix, and in the absence of a target even a symlink that originally targeted a directory won't report as a directory. In other words: for a symlink with a missing target on Unix, new DirectoryInfo("broken-symlink").Exists
is always false
, despite the link's own existence. Conversely, new FileInfo("broken-symlink-meant-to-target-a-dir").Exists
is true
as well.
new DirectoryInfo("broken-symlink").Exists
That code is buggy even on Windows because it doesn't handle SAN drives correctly.
@bartonjs @mklement0
API proposal for the FileInfo(string fileName, bool followLink)
here https://github.com/dotnet/runtime/issues/52908.
This is a long post, but please bear with me; I know that the proposal above has already been termed final, but I think the ongoing discussion shows that refinement may be necessary, especially so as to guarantee useful platform-neutral abstractions that leave the door open for future, incremental enhancements; tip of the hat to @iSazonov for his documentation links re NTFS reparse points, which lifted some conceptual fog for me. Apologies if I'm missing previously discussed things.
I will use the term (file-system) link in this discussion to refer to the abstraction of interest: one user-facing file-system entry pointing to another, in a manner directly supported by the file-system (albeit possibly via third-party filters), resulting in transparent redirection when an entry's data is requested (a file's content, a directory's entries).
As such, a symbolic link IS an instance of a link, and, on Windows only, so are AppX reparse points (for Microsoft Store executables) and the - seemingly mostly legacy - junctions and mounted folders (volume mount points).
A hardlink, despite its name, is NOT an instance of a link in this sense. Hardlinks point to the same data (inode, in Unix terms), but do not point to each other.
.Target
property, but this was removed in PowerShell (Core) for performance reasons (see https://github.com/PowerShell/PowerShell/issues/15139). While finding a hardlink's peers may indeed be of interest - which is feasible only on Windows - surfacing this via a .Target
property to me was always conceptually problematic, given that these paths aren't each other's target.fsutil.exe hardlink
; e.g., fsutil hardlink list C:\Windows\explorer.exe
)While (at least initially) limiting the creation of links to symbolic links only, querying probably can and should be supported for all link types, which means:
On creation, baking the type of link being created into the method name - e.g. CreateSymbolicLink()
or CreateJunction()
(as already proposed in the "Future expansions to keep in mind" part of the proposal, which I had missed) - implicitly solves the open-ended enum
problem:
CreateSymbolicLink()
is all we'll ever need:
On querying, the all-types nature of the functionality should be reflected in the member names, which means:
GetLinkTargetInfo()
rather than GetSymbolicLinkTargetInfo()
GetTargetInfo()
Let's look at link-related use cases, how the existing proposal and related ones apply to them, and what may be missing:
Use cases:
I only care about a link's (ultimate, non-link) target:
I'm only interested in a link's (ultimate) target's data, not its metadata, i.e a file target's content and a directory target's entries:
I'm (also) interested in a link target's metadata, such as size, timestamps, ...
@tmds' proposal in #52908 would cover that - you simply pass a flag to a FileInfo
/ DirectoryInfo
constructor indicating that you want an object representing the (ultimate, non-link) target (in full or partially - see the discussion in the linked issue) rather than one representing the link.
Depending on the implementation to be decided on, this will also give you the name of the target and its path, though whether that path is canonical is TBD.
I want to know whether an item with a given path ultimately exists, i.e. in the case of a link I want to know whether the target exists:
The current Exists
property doesn't tell you that and has conceptual problems.
A possible solution - albeit technically a breaking change - is to add a TargetExists
property that for link FileSystemInfo
instances checks the target's existence and for non-link instances acts like Exists
; Exists
should then in the case of links only report the link's existence, irrespective of the existence of its target.
Complementarily the static Directory.Exists()
and File.Exists()
methods would have to support a new, optional bool checkTarget
parameter (or similar).
I want to ignore directory links during recursive enumeration of directory entries:
I want to work with links explicitly:
I want to create a link:
.CreateSymbolicLink()
(as in the proposal, albeit sans "As") allows me to create instances of the most flexible type of link supported across platforms, a symbolic link.
I want to obtain a link's target:
.Get[Link]TargetInfo(bool returnFinalTarget = false)
returns a link's immediate / ultimate target as a FileSystemInfo
instance, irrespective of the specific link type.
Again, it may make sense to name the method just .GetTargetInfo()
and have it exhibit desired-state logic: a non-link item would simply return itself.
returnFinalTarget = true
has two possible interpretations (though platform-native APIs may imply one or the other):
Find the ultimate, non-link target, irrespective of whether its path (via .FullName
) may contain link components.
Find the ultimate, non-link target and its canonical path, i.e. the target's full path with none of the components being links themselves.
I want to query a link's definition:
While querying the specific type of link may not be necessary, I may want to know whether a given path refers to a link in the abstract:
On Windows, .Attributes.HasFlag(FileAttributes.ReparsePoint)
is not sufficient for this, because while every link is a reparse point, the inverse is not true, as the help topic @iSazonov has pointed to shows. More specifically, links are a subset of reparse points called name surrogates, as reflected in a flag in the so-called reparse tag - see IsReparseTagNameSurrogate()
.
So, perhaps a Boolean .IsLink
property or method is called for, after all.
I want to know a link's target-path definition:
That is, irrespective of the existence of the target, I may want to examine the exact path (string) as stored in the link, notably to determine whether it is absolute or relative.
Letting, the .ToString()
value of a FileSystemInfo
instance returned by a .Get[Link]TargetInfo()
(with returnFinalTarget = false
(the default) only) reflect this information, as proposed by @ericstj, is an option.
While this has the advantage of avoiding additional new members, it is a bit obscure, so the alternative is to provide dedicated .GetLinkTargetPath()
methods.
I want to know whether an item with a given path ultimately exists, i.e. in the case of a link I want to know whether the target exists
It seems rarely scenario and we could use a code like IsSymbolicLink() && GetTarget() is not null
without complicating the API.
@iSazonov : It's more than half the use cases for FileInfo.Exists.
I want to know whether an item with a given path ultimately exists, i.e. in the case of a link I want to know whether the target exists: The current Exists property doesn't tell you
https://github.com/dotnet/runtime/issues/52908 Exists
with tells you this.
and has conceptual problems.
The problem mentioned is that new DirectoryInfo("broken-link").Exists
returns false
. I don't have a strong opinion about whether that should change or not.
Note that the APIs from this proposal will allow you to check IsSymbolicLink
in addition to Exists
, which will return true
for a broken link.
52908 Exists with tells you this.
It would tell you for a FileSystemInfo
instance you construct, but not for a preexisting one.
Note that IsSymbolicLink
is, as of this writing, no longer part of this proposal - though I do agree that it should be, as IsLink
, as argued above.
Yes, checking this member could be used to infer whether a given FileSystemInfo
instance is a non-existent path or a broken symlink.
However, just like the new FileSystemInfo("<path>", followLink: true)
constructor overload you propose in #52908 is a convenient (and faster) shortcut to new FileSystemInfo("<path>").Get[Link]Target(returnWithFinalTarget: true)
(at least in the only way that proposal makes sense to me - see discussion there), new FileSystemInfo("<path>").TargetExists
would be a shortcut to telling you whether [a link's] target exists, whereas .Exists
would then be free to reflect only the input path's existence, whether it is a link or not - which is conceptually much clearer.
@Jozkee and I discussed the additional comments we received after the main APIs were approved. The new requests/concerns make sense, so we would like to propose these additional APIs to address them:
namespace System.IO
{
public abstract partial class FileSystemInfo
{
public void CreateAsSymbolicLink(string pathToTarget); // Already approved
- public FileSystemInfo? GetSymbolicLinkTarget(bool returnFinalTarget = false); // Already approved
+ public FileSystemInfo? GetLinkTarget(bool returnFinalTarget = false); // Rename
+ public string GetLinkTargetPath();
}
public static class File
{
public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget); // Already approved
- public static FileSystemInfo? GetSymbolicLinkTarget(string linkPath, bool returnFinalTarget = false); // Already approved
+ public static FileSystemInfo? GetLinkTarget(string linkPath, bool returnFinalTarget = false); // Rename
}
public static class Directory
{
public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget); // Already approved
- public static FileSystemInfo? GetSymbolicLinkTarget(string linkPath, bool returnFinalTarget = false); // Already approved
+ public static FileSystemInfo? GetLinkTarget(string linkPath, bool returnFinalTarget = false); // Rename
}
}
This is so that the method can be reused to return all the different types of Windows reparse points (junctions, AppExecLinks, etc.), not just symbolic links.
Returns the path to which the current link points, whether it's relative or absolute. This prevents having to rely on ToString()
which is an obscure, unreliable workaround for when the link path is relative (it does not always return the relative path as initially expected).
Proposed as a method, instead of a property, because a property implies having a cached state, while a method implies collecting the latest info each time it's invoked. This is to resolve the concern where multiple p/invokes to a same path would result in retrieving information from a fast-changing filesystem.
A community member requested we reconsider changing the name of FileSystemInfo.CreateAsSymbolicLink
to FileSystemInfo.CreateSymbolicLink
. We already had a long discussion about this name in the previous video, in which we justified the name, but we thought it was important to mention this request.
We can also consider adding the static methods File.GetLinkTargetPath(string linkPath)
and Directory.GetLinkTargetPath(string linkPath)
now, or wait for a community member to request them.
We considered adding ReparseTag in order to offer the ability to check which type of link is being handled, but it seems that it is not required to satisfy this proposal.
public enum ReparseTag
{
None = -1, // Default, indicates the file is not a Reparse Point (zero is already reserved)
MountPoint = 0xA0000003, // Maps to IO_REPARSE_TAG_MOUNT_POINT, describes a Junction
Symlink = 0xA000000C, // Maps to IO_REPARSE_TAG_SYMLINK, the only value used by Unix
AppExecLink = 0x8000001B, // Maps to IO_REPARSE_TAG_APPEXECLINK
}
public abstract partial class FileSystemInfo {
public ReparseTag ReparseTag { get; } }
Having a bool IsLink
property was considered but we concluded that the following expression should achieve the same: info.GetLinkTarget()?.Exists ?? false
.
@terrajobst @bartonjs can we please revisit this?
@everyone-interested-in-the-thread, can you please review the API proposal above and let us know if you have additional feedback before we present it?
public enum ReparseTag
Will this block using of custom values like IO_REPARSE_TAG_NFS? Maybe replace the enum with public consts?
public void CreateAsSymbolicLink(string pathToTarget); // Already approved
If we would think about creating mount points would it be like public void CreateAsMountPoint(string pathToTarget)
? If so maybe it is better to rename to public void CreateAsLink(string pathToTarget)
so that in future we could add public void CreateAsLink(string pathToTarget, LinkType linkType = LinkType.SymbolicLink)
.
Thinking so we could rename public ReparseTag ReparseTag { get; } // Name can be discussed
to public LinkType LinkType { get; }
From my previous comment, we could want to check binary flags in LinkType for reparse points, In particular, there is 0x20000000 name surrogate flag. (It is checked in PowerShell IsReparsePointLikeSymlink() method.)
@carlossanlop I provide input with my Linux hat on. You should take from it anything you like. I don't know enough about Windows to say what the overall API should be.
FileSystemInfo
performs operations on a path
. It doesn't track the underlying file system entry that it represents. If we're making multiple calls against the path, we may not be retrieving information from the same file system entry. I consider this undesirable
The following properties are based on the stat
/lstat
system call:
public FileAttributes Attributes { get { throw null; } set { } }
public DateTime CreationTime { get { throw null; } set { } }
public DateTime CreationTimeUtc { get { throw null; } set { } }
public abstract bool Exists { get; }
public DateTime LastAccessTime { get { throw null; } set { } }
public DateTime LastAccessTimeUtc { get { throw null; } set { } }
public DateTime LastWriteTime { get { throw null; } set { } }
public DateTime LastWriteTimeUtc { get { throw null; } set { } }
The property that is proposed is based on the readlink
system call or the realpath
function:
public string LinkTargetPath { get; }
Because these are separate calls, they may have information that is not about the same entry. I want to mention it explicitly, this may something you are ok with.
If I were proposing something (for Linux), it would be:
class FileSystemInfo
{
public bool IsLink { get; } // always false when ctor `followLinks` is `true`, S_ISLNK when `followsLinks` is `false`
public string GetTargetPath(bool followLinks = false); // returns followLinks ? realpath(_path) : readlink(_path);
}
class FileInfo
{
// https://github.com/dotnet/runtime/issues/52908
public FileInfo(string fileName, bool followLinks) { } // bool switches between stat and lstat
}
class DirectoryInfo
{
// https://github.com/dotnet/runtime/issues/52908
public DirectoryInfo(string path, bool followLinks) { } // bool switches between stat and lstat
}
The API that is proposed to create links is:
public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget);
I don't know why it is duplicated it over File
and Directory
. On Linux there is no distinction, so just having it on File
is enough.
I'd make this return void
. I think in most cases, a user will not care about having this instance. If he wants it, he can just create it.
Will this block using of custom values like IO_REPARSE_TAG_NFS?
@iSazonov No, you can still expect values that are not listed in the enum.
Maybe replace the enum with public consts?
As per framework design guidelines, we should favor using an enum instead of static constants.
if we would think about creating mount points would it be like public void CreateAsMountPoint(string pathToTarget)? If so maybe it is better to rename to public void CreateAsLink(string pathToTarget) so that in future we could add public void CreateAsLink(string pathToTarget, LinkType linkType = LinkType.SymbolicLink).
Both options are feasible, IMO. I prefer adding methods rather than enum values for this particular case. Regarding having LinkType instead of ReparseTag, I think it scales better the latter given that links are a subset of reparse points.
FileSystemInfo performs operations on a path. It doesn't track the underlying file system entry that it represents. If we're making multiple calls against the path, we may not be retrieving information from the same file system entry. I consider this undesirable
@tmds the proposed API should cover for that: On Windows you can use ReparseTag in order to know that you are working with a link. On Unix yuo can use FileAttributes to query if it contains reparse point in order to know that you are working on a symlink. Once identified as link, you can use GetLinkTarget and work with the target.
The property that is proposed is based on the readlink system call or the realpath function
Not sure if this is a question, if so, on Unix, LinkTargetPath will be using readlink
or something alike that lets you find out if the link target path is relative or absolute as some folks expressed interest in this.
I don't know why it is duplicated it over File and Directory. On Linux there is no distinction, so just having it on File is enough.
First, there is distinction on Windows; second, as of today, a symlink pointing to a directory is created as a DirectoryInfo by Directory.Enumerate*
and you can use the DirectoryInfo ctor for a symlink pointing to a directory.
@tmds the proposed API should cover for that:
What I mean is: LinkTargetPath
and ReparseTag
can reflect information about a different file system entry than the other properties (like LastWriteTime
). That's because retrieving all these values requires multiple calls against the path. Between the calls, the underlying entry can change.
Not sure if this is a question
This is what the implementation would do on Linux.
First, there is distinction on Windows; second, as of today, a symlink pointing to a directory is created as a DirectoryInfo by Directory.Enumerate* and you can use the DirectoryInfo ctor for a symlink pointing to a directory.
ok. Windows requires you to have separate methods on File
and Directory
.
What I mean is: LinkTargetPath and ReparseTag can reflect information about a different file system entry than the other properties (like LastWriteTime). That's because retrieving all these values requires multiple calls against the path. Between the calls, the underlying entry can change.
@tmds I see what you mean. Do you know if there is a better way to get this information without falling into that problem?
Do you know if there is a better way to get this information without falling into that problem?
The API in https://github.com/dotnet/runtime/issues/24271#issuecomment-848512625 doesn't have this problem on Linux.
Looking at GetFileAttributesEx
docs, I think it works for Windows too.
The API in https://github.com/dotnet/runtime/issues/24271#issuecomment-848512625 doesn't have this problem on Linux.
@tmds how is that? How are you avoiding doing two calls using the file path? i.e. using your API, if you want to get the LastWriteTime
of the target, you still need to use two syscalls taking a path, or am I missing something?
var di = new DirectoryInfo("/path-to-link", followLinks: true); // calls readlink(path).
Console.WriteLine(di.LastWriteTime); // calls lstat(path).
@Jozkee: If new DirectoryInfo("/path-to-link", followLinks: true); called readlink the implementation is buggy. That should call stat(path)
(not lstat(path)
).
@jhudsoncedaron ah, great point.
Yes, that is what I mean with the comment: // bool switches between stat and lstat
.
How are you avoiding doing two calls using the file path?
class FileSystemInfo
{
public bool IsLink { get; }
public string GetTargetPath(bool followLinks = false);
}
IsLink
can be determined together with the other properties. The link information is retrieved using a method.
Between the calls, the underlying entry can change.
The only way to avoid this is to use a transactional file system. I guess no file API in Runtime eliminates this problem.
@iSazonov I'd like to expand on @Jozkee's answer.
If we would think about creating mount points would it be like public void CreateAsMountPoint(string pathToTarget)? If so maybe it is better to rename to public void CreateAsLink(string pathToTarget) so that in future we could add public void CreateAsLink(string pathToTarget, LinkType linkType = LinkType.SymbolicLink).
I would prefer to keep the APIs separate for every different kind of reparse point. We need to keep in mind that having a reparse point means there's additional metadata we need to add to the file, and there can be cases where the user needs to manually provide this extra information. We can keep adding APIs as needed.
The symbolic links feature is important enough to deserve its own separate creation APIs.
Thinking so we could rename public ReparseTag ReparseTag { get; } // Name can be discussed to public LinkType LinkType { get; }
We thought about this, but Reparse Points are a concept unique to Windows. On Unix, the user will be able to check if a file is a symbolic link, the same way they have been doing until now: by querying fileInfo.Attributes.HasFlag(FileAttributes.ReparsePoint)
. On Windows, the user will have to check for this flag and also check the ReparseTag for the additional information that divides reparse points.
First things first: Thanks for continuing the discussion.
On Unix, the user will be able to check if a file is a symbolic link, the same way they have been doing until now: by querying
fileInfo.Attributes.HasFlag(FileAttributes.ReparsePoint)
However, that is undoubtedly awkward, both in its verbosity and obscurity, given that the term reparse point is Windows-specific.
On Windows, the user will have to check for this flag and also check the ReparseTag for the additional information that divides reparse points.
That is awkward too, for the reason that determining whether a path is some kind of link is then quite cumbersome, and would require manual inspection of .ReparseTag
for the named-surrogate bit.
The APIs we're discussing are focused on links, and, given that links on Windows are just a subset of reparse points, is it worth exposing details about / allow creation of non-link reparse points at all, or is the abstract attribute flag enough? My impression is that non-link reparse points can be left to specialty APIs.
If .ReparseTag
were renamed to .LinkType
and only ever reflected named-surrogate reparse points, i.e. links, then at least one could use .LinkType != LinkType.None
as a platform-neutral, abstract is-this-some-kind-of-link test (the enum values could still reflect the Windows reparse-tag values, only limited to named-surrogate values).
The only way to avoid this is to use a transactional file system. I guess no file API in Runtime eliminates this problem.
The FileSystemInfo classes don't have this problem now. And if we don't add properties that require additional system calls, we're not introducing it.
As I said, you may not care about it.
This is what a full API might look:
enum LinkType
{
SymbolicLink,
...
}
class LinkInfo
{
public string Path { get; }
public LinkType Type { get; }
}
class FileSystemInfo
{
public bool IsLink { get; }
public LinkInfo? ReadLink();
}
public static class File
{
public static void CreateSymbolicLink(string path, string pathToTarget);
}
public static class Directory
{
public static void CreateSymbolicLink(string path, string pathToTarget);
}
public static class Path
{
public static string ResolveLinks(string path);
}
@mklement0 :
On Unix, the user will be able to check if a file is a symbolic link, the same way they have been doing until now: by querying
fileInfo.Attributes.HasFlag(FileAttributes.ReparsePoint)
However, that is undoubtedly awkward, both in its verbosity and obscurity, given that the term reparse point is Windows-specific.
This is how we already do it, so it's not really obscure, even for Unix users. In fact, we explicitly documented FileAttributes.ReparsePoint
as cross platform to keep parity with Windows.
While I can agree it may be a bit awkward (regardless of the fact we have FileAttributes.ReparsePoint
documented), we didn't want to propose the IsLink
or LinkType
properties, because we wanted to keep this issue exclusive to symbolic links. If we only distinguish symbolic links in 6.0, and we add the other types of links (reparse points) in the next release, it would be considered a breaking change.
On Windows, the user will have to check for this flag and also check the ReparseTag for the additional information that divides reparse points. That is awkward too, for the reason that determining whether a path is some kind of link is then quite cumbersome, and would require manual inspection of
.ReparseTag
for the named-surrogate bit.
I disagree because reparse tags are exclusive to Windows. Windows users will have to manually inspect them. In fact, there is a separate request to explicitly expose reparse tags: https://github.com/dotnet/runtime/issues/1908 . If we add LinkType
, which is a subset of reparse tags, it would be redundant to try to expose reparse tags later.
The APIs we're discussing are focused on links, and, given that links on Windows are just a subset of reparse points, is it worth exposing details about / allow creation of non-link reparse points at all, or is the abstract attribute flag enough? My impression is that non-link reparse points can be left to specialty APIs.
I agree we should separate the discussion for non-symlink reparse points.
@tmds :
In my opinion (and we also discussed this in the last proposal stream), we cannot have a new class LinkInfo
because:
DirectoryInfo
objects or FileInfo
objects already. A third kind of *Info
would be a breaking change since we would now have to return objects of this new type.FileInfos
or DirectoryInfos
. Would this mean we would have to also add enumeration APIs that return LinkInfos
? What about the enumeration APIs that return FileSystemInfos
? The user would have to cast links to the new type.@tmds :
The property that is proposed is based on the readlink system call or the realpath function:
public string LinkTargetPath { get; }
Because these are separate calls, they may have information that is not about the same entry. I want to mention it explicitly, this may something you are ok with. public string GetTargetPath(bool followLinks = false); // returns followLinks ? realpath(_path) : readlink(_path);
Your concern about the potential change of the underlying filesystem makes sense, so we would like to take your suggestion to make this a method instead of a property, but without the boolean parameter: it wouldn't make sense to retrieve the path of the final symlink, especially if it's relative.
Take this case, for example:
/dir
mylink2
mylink1
subdir/
mylink
subsubdir/
file.txt
Where the symlink chain looks like: mylink2 -> mylink1 -> mylink -> file.txt
If all the symlinks were relative:
mylink2 = mylink1 mylink1 = subdir/mylink mylink = subsubdir/file.txt
@tmds what should GetLinkTargetPath(true)
return?
This is how we already do it, so it's not really obscure, even for Unix users.
Understood, but on a philosophical note: A well-documented obscurity will still be an obscurity to new users.
Documentation is no substitute for descriptive, platform-neutral terms.
And, as stated, on Windows checking for ReparsePoint
isn't sufficient, because not all reparse points are links.
So there is currently no platform-neutral abstraction,.
we didn't want to propose the IsLink or LinkType properties, because we wanted to keep this issue exclusive to symbolic links. If we only distinguish symbolic links in 6.0, and we add the other types of links (reparse points) in the next release, it would be considered a breaking change.
My (previously expressed) hope was that while 6.0 may be limited to symbolic links in terms of creation, it will already support all link types in terms of querying (obtaining their target).
Understood re wanting to expose even non-link reparse-tag values (though you could argue that such a property only belongs in the FileSystemEntry
type).
In that case, an abstract IsLink
property sounds like the best solution, which on Windows can based on whether an entry is a reparse point that also has the name-surrogate flag set.
Also note that the abstract term Link is already in your latest proposal as part of method names such as GetLinkTarget
.
we cannot have a new class LinkInfo because:
I think you assume based on the name LinkInfo
is something that derives from FileSystemInfo
which is not the case.
but without the boolean parameter:
I had removed it here also: https://github.com/dotnet/runtime/issues/24271#issuecomment-849065336.
what should GetLinkTargetPath(true) return?
and replaced the true
version with another method (Path.ResolveLinks
) which would return /dir/subdir/subsubdir/file.txt
.
In that case, an abstract IsLink property sounds like the best solution
@mklement0 could info.GetLinkTarget()?.Exists ?? false
suffice in order to know whether a given file system entry is a link? If so I think we can drop ReparseTag
from this proposal and also avoid adding IsLink
.
This will be complemented by allowing that all link types, (symlinks, junctions and appExecLinks on Windows && symlinks on Unix) get the ability to query its target.
I think we can drop ReparseTag from this proposal and also avoid adding IsLink .
I think IsLink
adds a certain visibility which is desired.
If we only distinguish symbolic links in 6.0, and we add the other types of links (reparse points) in the next release, it would be considered a breaking change.
It may not be breaking: the same attribute is used for both:
FILE_ATTRIBUTE_REPARSE_POINT1024 (0x400) | A file or directory that has an associated reparse point, or a file that is a symbolic link.
I would prefer to keep the APIs separate for every different kind of reparse point. We need to keep in mind that having a reparse point means there's additional metadata we need to add to the file, and there can be cases where the user needs to manually provide this extra information. We can keep adding APIs as needed.
Yes, for common cases we need the extra information but we say here about generic behavior. Mainly I mean enumeration and removal. (For creation we have to use extra information and do p/invokes like PowerShell does for junctions/mount points - WinCreateJunction() method in PowerShell) I don't know whether .Net API does the enumeration/removal as "correctly" as PowerShell for OneDrive, junctions/mount points and AppX links. In the case there should be internal API to check a kind of reparse points (see IsReparsePointLikeSymlink() method in PowerShell). And the API could become public. But if we get this working in .Net the same as in PowerShell then we have no need in the new public one. Also we need a target for IO_REPARSE_TAG_APPEXECLINK and IO_REPARSE_TAG_MOUNT_POINT. You could look WinInternalGetTarget() method in PowerShell. It gets targets for:
Can we support this in proposed GetLinkTarget()? (Is there other reparse points with behavior like symlink so that we add its in the list too?).
Summary of PowerShell:
Create operation. No doubts separate API should be for every kind of reparse point. In the proposal only CreateAsSymbolicLink() should be. Makes no sense to add creation methods for other reparse points since there are no analogs on Unix. PowerShell will continue to use p/invokes as now.
Remove. Currently PowerShell uses IsReparsePointLikeSymlink() to explicitly disable recursion and treat a file system object as an end file if it is symlink (mount point/junction or appx). I am not sure we can avoid this explicit check since there a generic code works. (Although if we already do FileSystem provider specific checking in the generic code why don't we call the FileSystem provider specific delete method immediately ignoring the generic code?) I think it could be implemented in .Net - yes? - and this excludes needs in public exposing reparse tags.
Enumerate PowerShell implements generic "manual" enumeration. It writes an error for every entity it can not access and continue on error. For FileSystem provider PowerShell don't follow symlinks by default. If user sets FollowSymbolicLink switch PowerShell will follow to symlinks in enumeration and track cycles. I am trying to migrate to new .Net enumeration API and there is still a check with IsReparsePointLikeSymlink() for performance reasons. Again, I'm not sure we can still get rid of IsReparsePointLikeSymlink() there since we "manually" manage recursion and we need to know if FileSystemEntry is like symbolic link (jucntion, appx). Again, I think it could be implemented in .Net - yes? - and this excludes needs in public exposing reparse tags.
namespace System.IO
{
public abstract partial class FileSystemInfo
{
public void CreateAsSymbolicLink(string pathToTarget); // Already approved
- public FileSystemInfo? GetSymbolicLinkTarget(bool returnFinalTarget = false); // Already approved
+ public FileSystemInfo? ResolveLinkTarget(bool returnFinalTarget = false); // Rename
+ public string? LinkTarget { get; }
}
public static class File
{
public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget); // Already approved
- public static FileSystemInfo? GetSymbolicLinkTarget(string linkPath, bool returnFinalTarget = false); // Already approved
+ public static FileSystemInfo? ResolveLinkTarget(string linkPath, bool returnFinalTarget = false); // Rename
}
public static class Directory
{
public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget); // Already approved
- public static FileSystemInfo? GetSymbolicLinkTarget(string linkPath, bool returnFinalTarget = false); // Already approved
+ public static FileSystemInfo? ResolveLinkTarget(string linkPath, bool returnFinalTarget = false); // Rename
}
}
@Jozkee
could info.GetLinkTarget()?.Exists ?? false suffice
info.GetLinkTarget() is not null
(or, now: info.ResolveLinkTarget() is not null
) should suffice as an abstract test for whether a given item is a link, but I'm with @tmds that info.IsLink
is (much) preferable, on the grounds of semantic directness: it is a direct expression of a common use case; plus, it can be implemented more efficiently.
However, if we take a step back (again, unfortunately):
Revisiting @tmds's suggestion in #52908 - hybrid content of FileSystemInfo
instances combining name and path of the link with all remaining properties taken from the target (though I would let .Delete()
only ever act on the link) - I think it may be the way to go by default, not even with an opt-in (which @tmds had also mentioned as desirable):
The motivation is my discovery of an existing, ugly asymmetry: something like new FileInfo("link").LastWriteTime
reports the link's (usually irrelevant) last-write timestamp, whereas new FileInfo("link").LastWriteTime = DateTime.Now
sets the (ultimate) target's property.
Given that a link's timestamps, Unix mode, ... itself is usually irrelevant, defaulting to both reporting and manipulating the target's properties is much more sensible. (We'd be giving up a 1:1 correspondence between file-system entries and FileSystemInfo
instances, but as the example above shows, this correspondence is already being violated, in obscure fashion - switching to a helpful virtual representation seems much preferable - and #38824 even provides an indication that this may always have been the intent)
Therefore, I propose the following:
Make the DirectoryInfo
and FileInfo
constructors construct hybrid instances by (invariable) default, if possible, that is, if the path refers to link whose target exists.
.Exists
(see below) and use default values (as currently already happens with a non-existent path) and throw on attempts to assign to properties (as already happens).Let a new .IsLink
property reflect whether the path at hand refers to a link (which implies whether the .Delete()
method will operate on the link only and serves to disambiguate .Exists
behavior - see below).
For link paths, let the .Exists
property reflect only whether the target exists, which implies:
.Exists
reports the (type-appropriate) existence of the path itself..Exists
reports whether the target exists, and checking .IsLink
in addition can be used to disambiguate between a non-existent path and a broken link.Make .CreateAsSymbolicLink()
return a FileSystemEntry
instance, not FileSystemInfo
(see below).
The above should eliminate even having to care whether a given path is a link or not in the majority of cases.
LinkTarget
and .ResolveLinkTarget()
methods could then be moved to the FileSystemEntry
type (see below).
As for explicit link handling with respect to querying:
Use only FileSystemEntry
instances to represent links as themselves, as a direct representation of the underlying file-system entry, without exhibiting the hybrid behavior discussed above.
FileSystemEntry
is a single type that isn't forced into the file/directory dichotomy, that also solves the problem of type-agnostic broken symlinks on Unix.This requires enhancing FileSystemEntry
as follows (note: I haven't considered performance implications or implementation challenges - I'm speaking conceptually):
Add a public constructor, so that user can directly pass a link path of interest.
The instance returned can then be used to modify the properties of the link itself, if really necessary - which in turns requires making properties such as .LastWriteTimeUtc
read-write and perhaps also necessitates a .Refresh()
method.
Optionally add overloads that ask for the immediate / final target to be returned.
Add .IsLink
and .Exists
properties with the same behavior as proposed for FileSystemInfo
above - in order to implement #52666 - for not including the targets of directory links in the enumeration - this information is required anyway.
Add LinkTarget
and .ResolveLinkTarget()
- although I think it would be perfectly fine to have LinkTarget
be a method, as previously discussed.
Additionally, adding support for canonical target-path resolution in System.IO.Path
based on path strings still makes sense to me. Rather than introduce a new method, as @tmds has suggested, I think adding an overload to GetFullPath
should suffice:
public static string GetFullPath(string path, bool resolveToFinalTarget = false);
Make .CreateAsSymbolicLink() return a FileSystemEntry instance, not FileSystemInfo (see below).
@mklement0 It is ref struct
:-(
Why do CreateAsSymbolicLink
methods return something? I don't expect it to be common to perform more operations once the link is created.
Although the hybrid/composite approach may look attractive, it seem is a breaking change and it would be a completely unacceptable. From a practical point of view it is better to have an API that does exactly what is expected and I wouldn't want to confuse the behavior even more. If we want to process exactly links, it is better to have a separate API for that - IO.SymbolicLink(Info) class in addition to IO.File(Info/IO.Directory(Info) so that expose FileSystemEntry for the link.
The motivation is my discovery of an existing, ugly asymmetry: ...
So on .NET 5, reading the times from a link give you the times on the link itself, but when you set them, they send up setting the targets? Yikes!
It would be ideal to be (effectively) all hybrid behavior or none at all. (By "effectively" I mean: with the caveats like Delete affecting the link itself, which would be the expected behavior in a hardlink scenario, ignoring that most platforms disallow directory hardlinks). But I don't think we can go in the direction of all hybrid by default, given that that is a pretty significant breaking change. An opt-in option certainly would be possible.
Going in the direction of non-hybrid by default is also a breaking change (from the timestamp setters). I would not be very concerned about fixing the timestamp setters. It is hard justify the current behavior as anything but a bug. While there may be a few users that would be adversely affected by fixing it, I would imagine just as many if not more users would effectively getting a minor bug fixed. (There seems to be a PR in progress for this: #52639).
The whole situation here is complicated, since in many simple cases you really do only care about the the target of the symlink, and a fair few APIs and syscalls do a decent job of allowing you to ignore that something is a symlink. But in other cases it is critical that you know a path is a symlink. There have been security bugs in applications on multiple occasions resulting from confusing a symlink with a normal file or normal directory. This makes me want to lean in the direction of not trying to do additional clever things with symlinks by default beyond the normal "by default opening a symlink opens the target" (in both the file contents, and directory traversal sense of "opening").
@tmds:
If needing to modify the properties of a newly created symlink would be common, returning an object representing it would be justified. However, Since symlink properties - other than the already-specified target path - are usually irrelevant, I agree that there's no good reason to return such an object.
@KevinCathcart, @iSazonov:
It would be ideal to be (effectively) all hybrid behavior or none at all. But I don't think we can go in the direction of all hybrid by default
Going all-hybrid by default:
enables sensible default behavior for what I expect to be the vast majority of use cases: treat a link as a transparent redirection, and operate on the target - as already happens with respect to content operations (reading a file target's content, enumerating a directory target's entries) and half-happens with properties, namely on setting only (the bug discussed).
while definitely a breaking change, to me it falls into - to borrow PowerShell's classification - bucket 3, i.e. a change that is unlikely to impact existing code, because the current behavior is unhelpful.
The only code I can see being impacted is code that relies on querying timestamps and attributes, but given that a link's own properties are usually irrelevant, that doesn't strike me as likely.
(Going all-hybrid would imply nixing #52639 and implementing the opposite logic: both getting and setting properties should then operate on the target.)
The only unfortunate aspect of a hybrid representation is that in order to avoid a truly serious breaking change, the .Attributes
value will have to pretend that ReparsePoint
is an attribute of the target, so as not to break existing code that uses this attribute toinfer that a FileSystemInfo
instance is a link (unreliably, on Windows, given that not all reparse points are links). (A kingdom for a separate .IsLink
property! If a ReparseTag
property is introduced, it would be free to reflect only the target's value, and could serve to disambiguate.)
But in other cases it is critical that you know a path is a symlink.
And for that an .IsLink
property is sufficient - plus the ability to obtain a non-hybrid representation of a link itself, if really needed - see below.
@iSazonov:
It is
ref struct
:-(
Excellent point, I'm sorry I missed that; what drew me to FileSystemEntry
was its type-abstract nature, as described in the help topic (emphasis added):
Provides a lower level view of
FileSystemInfo
[...]
Since FileSystemEntry
being a ref struct
rules out its use for on-demand construction, let me propose the following:
Derive a new class, LinkInfo
(or FileSystemLinkInfo
) from FileSystemInfo
, which is to be a representation of the properties of a file-system link itself, i.e. the non-hybrid counterpart to FileInfo
/ DirectoryInfo
allowing modification of the link's properties itself:
This class should only be constructable with a path that (a) exists and (b) is a file-system link.
.Exists
would reflect the target's existence (which would then apply to all FileSystemInfo
-derived types, as proposed).
The class needs an .IsDirectoryLink
property or method that signals whether the target is a directory, if known (always known on Windows, on Unix only if the (ultimate) target exists).
Situationally, the new ResolveLinkTarget()
methods would then return LinkInfo
instances, namely if the immediate target (returnFinalTarget: false
) happens to be another link.
Conceivably, it is then sufficient to offer a .LinkTarget
method/property (to get the target path string as defined) on LinkInfo
only, not on all FileSystemInfo
-derived types.
As for FileSystemEntry
:
To avoid a breaking change, introduce a new .ToFileSystemInfo(bool asLink = false)
overload, which if requested returns entries that are links as LinkInfo
instances.
An .IsLink
property would still be helpful (but no other link support is necessary).
- Derive a new class,
LinkInfo
(orFileSystemLinkInfo
) fromFileSystemInfo
As MSFT team members said it is better to have separate types because there are a lot of reparse points on Windows (and new ones may appear on Unix) and I agreed with this. So we could introduce SymbolicLink(Info). In this case, this class should not have any hybrid behavior and work with the entity itself - change name, target, time, attributes. etc. Then some magical behavior of FileInfo/DirectoryInfo/FileSystemInfo would be more understandable (I still think we need IsSymbolicLink() there). For example, FileSystemInfo.GetTarget() could return SymbolicLinkInfo.
Now that the APIs got approved, the discussion is now about additional/incremental improvements for which we already have issues open. So if you don't mind, and for clarity, let's continue the conversation in those issues:
How to get/set attributes of a symlink file vs target file (transparent handling) and any potential new breaking changes: Let's move the discussion to https://github.com/dotnet/runtime/issues/52908
Reporting the type of link, whether with IsLink
vs ReparseTag
/LinkType
enum + property: Let's move the discussion to https://github.com/dotnet/runtime/issues/1908
public FileSystemInfo? ResolveLinkTarget(bool returnFinalTarget = false); // Rename
Since you've added string? LinkTarget
, I think the appropriate default for returnFinalTarget
is true
.
The argument may even be dropped imo.
public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget);
As mentioned earlier, the user won't care about this return value. Consider making this return void
.
string? LinkTarget { get; }
I still wish this were a method. Not only for consistency with the other properties, but also because now Refresh
will probably make an additional syscall for a property the user may not care about.
public bool IsLink { get; }
I don't understand why this is left out.
public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget);
As mentioned earlier, the user won't care about this return value. Consider making this return
void
.
It is the same as existing static API like Directory.CreateDirectory(). It makes no sense to use another pattern.
public bool IsLink { get; }
I don't understand why this is left out.
It is clear as this could be for symbolic links but we need to take into account Windows reparse point tags and in the case it is not yet clear how to present such API.
Regarding an IsLink
property (which I also think is called for) in relation to the proposed ReparseTag
property: Let's continue the discussion at https://github.com/dotnet/runtime/issues/1908#issuecomment-852117517
For the sake of completeness: another link-related proposal, in the context of file-system-item enumeration (opt-in/out of recursing into directory links), is #52666
So on .NET 5, reading the times from a link give you the times on the link itself, but when you set them, they send up setting the targets? Yikes!
I agree, yikes. The native APIs have the same gotchas.
Yikes for sure, but, as @KevinCathcart has also noted, pending PRs #52639 and #52639 (both by @hamarb123) are planning to fix that.
In other words: This makes FileSystemInfo
instances true representations of the underlying link file-system entries, as they mostly already are (aside from the asymmetry that the PRs will fix, there are two - defensible - exceptions: for file links, Create()
, as primarily a content operation, will truncate the target; also, on Unix, there is the white lie that a directory symlink itself has a Directory
attribute, which isn't technically true).
As desirable as a mostly-hybrid default representation may be, @KevinCathcart has persuasively argued in https://github.com/dotnet/runtime/issues/52908#issuecomment-850695302 that this would be a serious backward-compatibility concern, so an opt-in via a FileSystemInfo
constructor parameter to either (a) get a hybrid representation or (b) the link's target are needed, as discussed in #52908, which is where we should continue.
As for how the WinAPI functions handle symlinks - see Symbolic Link Effects on File Systems Functions.
public bool IsLink { get; }
I don't understand why this is left out.
It is clear as this could be for symbolic links but we need to take into account Windows reparse point tags and in the case it is not yet clear how to present such API.
As I pointed in https://github.com/dotnet/runtime/issues/52666#issuecomment-852328044
I think we should think about behavior. The behavior can be:
- regular file system
- like symbolic link
We even renamed PowerShell method to IsReparsePointLikeSymlink to explicitly indicate this fact.
This force we think that if we want any generalization of RPs like IsLink() method we have to add a mask property (LinkReparsePoints?) where collect all RPs whose behavior we want consider like symbolic link. Then IsLink() would use the property to work as expected and users could add new RPs to the property as new RPs are introduced.
Edit by @carlossanlop: Revisited API Proposal
Original proposal:
Rationale
The ability to interact with symbolic links (symlinks) in .NET is currently limited to determining that a file is
ReparsePoint
. This proposed API provides the ability to identify, read, and create symbolic links.Proposed API
Details
The path returned from
GetSymbolicLinkTargetPath(string)
/SymbolicLinkTargetPath
will be returned exactly as it is stored in the symbolic link. It may reference a non-existent file or directory.For the purposes of this API, NTFS Junction Points are considered to be like Linux bind mounts and are not considered to be symbolic links.
Updates
GetSymbolicLinkTargetPath
andIsSymbolicLink
fromPath
toDirectory
,DirectoryInfo
,File
andFileInfo
.CreateSymbolicLink
.path
tolinkPath
where a link file's path is desired.FileSystemInfo
base class.