dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

Proposed API for symbolic links #24271

Closed carlreinke closed 3 years ago

carlreinke commented 6 years ago

Edit by @carlossanlop: Revisited API Proposal



Original proposal:

Rationale

The ability to interact with symbolic links (symlinks) in .NET is currently limited to determining that a file is ReparsePoint. This proposed API provides the ability to identify, read, and create symbolic links.

Proposed API

public class Directory
{
    public static DirectoryInfo CreateSymbolicLink(string linkPath, string targetPath);
    public static string GetSymbolicLinkTargetPath(string linkPath);
    public static bool IsSymbolicLink(string path);
}

public class File
{
    public static FileInfo CreateSymbolicLink(string linkPath, string targetPath);
    public static string GetSymbolicLinkTargetPath(string linkPath);
    public static bool IsSymbolicLink(string path);
}

public class FileSystemInfo
{
    public bool IsSymbolicLink { get; }
    public string SymbolicLinkTargetPath { get; }

    public void CreateSymbolicLink(string linkPath);
}

Details

The path returned from GetSymbolicLinkTargetPath(string)/SymbolicLinkTargetPath will be returned exactly as it is stored in the symbolic link. It may reference a non-existent file or directory.

For the purposes of this API, NTFS Junction Points are considered to be like Linux bind mounts and are not considered to be symbolic links.

Updates

tmds commented 3 years ago

Requesting information about that link's target should always require opt-in.

We have a different opinion about this.

it seems to me that .GetTarget(bool returnFinalTarget = false) is a better way to address this use case, given that you then get the choice between the immediate and the ultimate target path.

If you don't care about the target path, that just adds noise. Consider how the following would look using GetTarget.

FileInfo file = new FileInfo("/tmp/my-file", followLinks: true);
DateTime lastWriteTime = file.LastWriteTimeUtc;
while (true)
{
    Thread.Sleep(1000);
    file.Refresh();
    DateTime writeTime = file.LastWriteTimeUtc;
    if (writeTime != lastWriteTime)
    {
        Console.WriteLine("The file has changed");
        lastWriteTime = writeTime;
    }
}
bartonjs commented 3 years ago

I think the common case is that they're not interested in anything other than the name, and that it's just a value returned from Directory.EnumerateFiles (or DirectoryInfo.EnumerateFiles, whatever).

When someone does new FileInfo(path) themselves, I don't think there's a good answer to what they might want... if it's the file size, then they probably want the contents. If it's Exists that's a tricky question for symlinks (the link itself exists, but the destination might not, which were you asking about?).

I'm totally cool with a constructor parameter allowing a link (or all links, or N links, whatever) to be followed, so that people who feel strongly about expressions-vs-statements can accomplish their goals, but it just wouldn't be the default. (Plus, I would consider it too weird of a breaking change, particularly while there's still the netstandard2.0 concept floating around)

iSazonov commented 3 years ago

If it's Exists that's a tricky question for symlinks (the link itself exists, but the destination might not, which were you asking about?).

It is already in the Exists. Ex.: https://github.com/dotnet/runtime/blob/b0c7f044dacd417b2fcc2699bd396f1d949baa33/src/libraries/System.IO.FileSystem/src/System/IO/FileSystem.Exists.Unix.cs#L42-L50

mklement0 commented 3 years ago

I'm totally cool with a constructor parameter allowing a link

My concern was duplication of functionality, but I can now see that providing such constructors would be a helpful - and better-performing - alternative to a subsequent .Get*Target() method call.

Judging by the example use case, only providing a bool of returnFinalTarget - i.e. optional resolution to the ultimate target - may be sufficient. Also: What should happen if the target doesn't exist? Exception?

Given the above and @iSazonov's findings, conclusively determining the existence of a link target would require one of the following:

new FileInfo("foo", returnFinalTarget: true).Exists

or (especially with a given FileSystemInfo instance):

new FileInfo("foo").GetTarget(returnFinalTarget: true)?.Exists ?? false

Also note that the current .Exists behavior on symlinks with missing targets may be surprising on Unix: Symlinks aren't typed on Unix, and in the absence of a target even a symlink that originally targeted a directory won't report as a directory. In other words: for a symlink with a missing target on Unix, new DirectoryInfo("broken-symlink").Exists is always false, despite the link's own existence. Conversely, new FileInfo("broken-symlink-meant-to-target-a-dir").Exists is true as well.

jhudsoncedaron commented 3 years ago

new DirectoryInfo("broken-symlink").Exists

That code is buggy even on Windows because it doesn't handle SAN drives correctly.

tmds commented 3 years ago

@bartonjs @mklement0

API proposal for the FileInfo(string fileName, bool followLink) here https://github.com/dotnet/runtime/issues/52908.

mklement0 commented 3 years ago

This is a long post, but please bear with me; I know that the proposal above has already been termed final, but I think the ongoing discussion shows that refinement may be necessary, especially so as to guarantee useful platform-neutral abstractions that leave the door open for future, incremental enhancements; tip of the hat to @iSazonov for his documentation links re NTFS reparse points, which lifted some conceptual fog for me. Apologies if I'm missing previously discussed things.

I will use the term (file-system) link in this discussion to refer to the abstraction of interest: one user-facing file-system entry pointing to another, in a manner directly supported by the file-system (albeit possibly via third-party filters), resulting in transparent redirection when an entry's data is requested (a file's content, a directory's entries).

While (at least initially) limiting the creation of links to symbolic links only, querying probably can and should be supported for all link types, which means:


Let's look at link-related use cases, how the existing proposal and related ones apply to them, and what may be missing:

Use cases:

iSazonov commented 3 years ago

I want to know whether an item with a given path ultimately exists, i.e. in the case of a link I want to know whether the target exists

It seems rarely scenario and we could use a code like IsSymbolicLink() && GetTarget() is not null without complicating the API.

jhudsoncedaron commented 3 years ago

@iSazonov : It's more than half the use cases for FileInfo.Exists.

tmds commented 3 years ago

I want to know whether an item with a given path ultimately exists, i.e. in the case of a link I want to know whether the target exists: The current Exists property doesn't tell you

https://github.com/dotnet/runtime/issues/52908 Exists with tells you this.

and has conceptual problems.

The problem mentioned is that new DirectoryInfo("broken-link").Exists returns false. I don't have a strong opinion about whether that should change or not.

Note that the APIs from this proposal will allow you to check IsSymbolicLink in addition to Exists, which will return true for a broken link.

mklement0 commented 3 years ago

52908 Exists with tells you this.

It would tell you for a FileSystemInfo instance you construct, but not for a preexisting one.

Note that IsSymbolicLink is, as of this writing, no longer part of this proposal - though I do agree that it should be, as IsLink, as argued above.

Yes, checking this member could be used to infer whether a given FileSystemInfo instance is a non-existent path or a broken symlink.

However, just like the new FileSystemInfo("<path>", followLink: true) constructor overload you propose in #52908 is a convenient (and faster) shortcut to new FileSystemInfo("<path>").Get[Link]Target(returnWithFinalTarget: true) (at least in the only way that proposal makes sense to me - see discussion there), new FileSystemInfo("<path>").TargetExists would be a shortcut to telling you whether [a link's] target exists, whereas .Exists would then be free to reflect only the input path's existence, whether it is a link or not - which is conceptually much clearer.

carlossanlop commented 3 years ago

@Jozkee and I discussed the additional comments we received after the main APIs were approved. The new requests/concerns make sense, so we would like to propose these additional APIs to address them:

namespace System.IO
{
    public abstract partial class FileSystemInfo
    {
        public void CreateAsSymbolicLink(string pathToTarget); // Already approved

-       public FileSystemInfo? GetSymbolicLinkTarget(bool returnFinalTarget = false); // Already approved
+       public FileSystemInfo? GetLinkTarget(bool returnFinalTarget = false); // Rename

+       public string GetLinkTargetPath();
    }
    public static class File
    {
        public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget); // Already approved

-       public static FileSystemInfo? GetSymbolicLinkTarget(string linkPath, bool returnFinalTarget = false); // Already approved
+       public static FileSystemInfo? GetLinkTarget(string linkPath, bool returnFinalTarget = false); // Rename
    }
    public static class Directory
    {
        public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget); // Already approved

-       public static FileSystemInfo? GetSymbolicLinkTarget(string linkPath, bool returnFinalTarget = false); // Already approved
+       public static FileSystemInfo? GetLinkTarget(string linkPath, bool returnFinalTarget = false); // Rename
    }
}

Rename GetSymbolicLinkTarget to GetLinkTarget

This is so that the method can be reused to return all the different types of Windows reparse points (junctions, AppExecLinks, etc.), not just symbolic links.

API to return link path

Returns the path to which the current link points, whether it's relative or absolute. This prevents having to rely on ToString() which is an obscure, unreliable workaround for when the link path is relative (it does not always return the relative path as initially expected).

Proposed as a method, instead of a property, because a property implies having a cached state, while a method implies collecting the latest info each time it's invoked. This is to resolve the concern where multiple p/invokes to a same path would result in retrieving information from a fast-changing filesystem.

Reparse Tag support was evaluated but we decided to not propose it here.

#### A) Enum to define Reparse Tags `FileSystemInfo.Attributes` already has a flag called `ReparsePoint`. - On Linux, when we want to know if a file is a symlink, we check if this flag is set. - On Windows, it _may_ also indicate a file is a symlink, but it's incomplete information, because reparse points on Windows are subdivided with different kinds of identifiers called "Reparse Tags". Windows defines [a list of predefined, reserved reparse tag values](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c8e77b37-3909-4fe6-a4ea-2b9d423b1ee4), and also lets the users define their [own reparse tags](https://docs.microsoft.com/en-us/windows/win32/fileio/reparse-point-tags) (as long as they are above the reserved range). #### B) API to return Reparse Tag Since `FileSystemInfo.Attributes` already has a flag called `ReparsePoint`, it made sense to keep the code compatible with this enum value, while also making sure it makes sense on Unix. Alternatively, we could always return -1 on Unix, since there is no concept of "Reparse Points" or "Reparse Tags" there (but we'd like confirmation from the API Review folks if this makes sense). The three most common types of reparse tags on Windows are symlinks, junctions and appexeclinks, so we would initially define those so Windows users can check this value. On Linux, the user does not need to check the Reparse Tag API, because the `FileSystemInfo.Attributes` `ReparsePoint` tag is enough to know if the file is a symlink or not. But in case the user decides to query the `ReparseTag` API, it would return only two possible values: `None` or `SymbolicLink`. We would have to define a default negative value (-1) to indicate the file is _not_ a reparse point, but a normal file (zero is a value that Windows has [already reserved](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c8e77b37-3909-4fe6-a4ea-2b9d423b1ee4)). Since Reparse Tags have specific predefined hex values (or the user can define their own), we don't necessarily need to define all of them at the beginning, we would just return a value that is not in the enum range. If users need this value to be defined, we can propose them separately.

Optional additional requests

@terrajobst @bartonjs can we please revisit this?

carlossanlop commented 3 years ago

@everyone-interested-in-the-thread, can you please review the API proposal above and let us know if you have additional feedback before we present it?

iSazonov commented 3 years ago

public enum ReparseTag

Will this block using of custom values like IO_REPARSE_TAG_NFS? Maybe replace the enum with public consts?

iSazonov commented 3 years ago

public void CreateAsSymbolicLink(string pathToTarget); // Already approved

If we would think about creating mount points would it be like public void CreateAsMountPoint(string pathToTarget)? If so maybe it is better to rename to public void CreateAsLink(string pathToTarget) so that in future we could add public void CreateAsLink(string pathToTarget, LinkType linkType = LinkType.SymbolicLink). Thinking so we could rename public ReparseTag ReparseTag { get; } // Name can be discussed to public LinkType LinkType { get; }

From my previous comment, we could want to check binary flags in LinkType for reparse points, In particular, there is 0x20000000 name surrogate flag. (It is checked in PowerShell IsReparsePointLikeSymlink() method.)

tmds commented 3 years ago

@carlossanlop I provide input with my Linux hat on. You should take from it anything you like. I don't know enough about Windows to say what the overall API should be.

FileSystemInfo performs operations on a path. It doesn't track the underlying file system entry that it represents. If we're making multiple calls against the path, we may not be retrieving information from the same file system entry. I consider this undesirable

The following properties are based on the stat/lstat system call:

    public FileAttributes Attributes { get { throw null; } set { } }
    public DateTime CreationTime { get { throw null; } set { } }
    public DateTime CreationTimeUtc { get { throw null; } set { } }
    public abstract bool Exists { get; }
    public DateTime LastAccessTime { get { throw null; } set { } }
    public DateTime LastAccessTimeUtc { get { throw null; } set { } }
    public DateTime LastWriteTime { get { throw null; } set { } }
    public DateTime LastWriteTimeUtc { get { throw null; } set { } }

The property that is proposed is based on the readlink system call or the realpath function:

    public string LinkTargetPath { get; }

Because these are separate calls, they may have information that is not about the same entry. I want to mention it explicitly, this may something you are ok with.

If I were proposing something (for Linux), it would be:

class FileSystemInfo
{
   public bool IsLink { get; } // always false when ctor `followLinks` is `true`, S_ISLNK when `followsLinks` is `false`

   public string GetTargetPath(bool followLinks = false); // returns followLinks ? realpath(_path) : readlink(_path);
}
class FileInfo
{
   // https://github.com/dotnet/runtime/issues/52908
   public FileInfo(string fileName, bool followLinks) { } // bool switches between stat and lstat
}
class DirectoryInfo
{
   // https://github.com/dotnet/runtime/issues/52908
    public DirectoryInfo(string path, bool followLinks) { } // bool switches between stat and lstat
}

The API that is proposed to create links is:

public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget);

I don't know why it is duplicated it over File and Directory. On Linux there is no distinction, so just having it on File is enough.

I'd make this return void. I think in most cases, a user will not care about having this instance. If he wants it, he can just create it.

jozkee commented 3 years ago

Will this block using of custom values like IO_REPARSE_TAG_NFS?

@iSazonov No, you can still expect values that are not listed in the enum.

Maybe replace the enum with public consts?

As per framework design guidelines, we should favor using an enum instead of static constants.

if we would think about creating mount points would it be like public void CreateAsMountPoint(string pathToTarget)? If so maybe it is better to rename to public void CreateAsLink(string pathToTarget) so that in future we could add public void CreateAsLink(string pathToTarget, LinkType linkType = LinkType.SymbolicLink).

Both options are feasible, IMO. I prefer adding methods rather than enum values for this particular case. Regarding having LinkType instead of ReparseTag, I think it scales better the latter given that links are a subset of reparse points.

FileSystemInfo performs operations on a path. It doesn't track the underlying file system entry that it represents. If we're making multiple calls against the path, we may not be retrieving information from the same file system entry. I consider this undesirable

@tmds the proposed API should cover for that: On Windows you can use ReparseTag in order to know that you are working with a link. On Unix yuo can use FileAttributes to query if it contains reparse point in order to know that you are working on a symlink. Once identified as link, you can use GetLinkTarget and work with the target.

The property that is proposed is based on the readlink system call or the realpath function

Not sure if this is a question, if so, on Unix, LinkTargetPath will be using readlink or something alike that lets you find out if the link target path is relative or absolute as some folks expressed interest in this.

I don't know why it is duplicated it over File and Directory. On Linux there is no distinction, so just having it on File is enough.

First, there is distinction on Windows; second, as of today, a symlink pointing to a directory is created as a DirectoryInfo by Directory.Enumerate* and you can use the DirectoryInfo ctor for a symlink pointing to a directory.

tmds commented 3 years ago

@tmds the proposed API should cover for that:

What I mean is: LinkTargetPath and ReparseTag can reflect information about a different file system entry than the other properties (like LastWriteTime). That's because retrieving all these values requires multiple calls against the path. Between the calls, the underlying entry can change.

Not sure if this is a question

This is what the implementation would do on Linux.

First, there is distinction on Windows; second, as of today, a symlink pointing to a directory is created as a DirectoryInfo by Directory.Enumerate* and you can use the DirectoryInfo ctor for a symlink pointing to a directory.

ok. Windows requires you to have separate methods on File and Directory.

jozkee commented 3 years ago

What I mean is: LinkTargetPath and ReparseTag can reflect information about a different file system entry than the other properties (like LastWriteTime). That's because retrieving all these values requires multiple calls against the path. Between the calls, the underlying entry can change.

@tmds I see what you mean. Do you know if there is a better way to get this information without falling into that problem?

tmds commented 3 years ago

Do you know if there is a better way to get this information without falling into that problem?

The API in https://github.com/dotnet/runtime/issues/24271#issuecomment-848512625 doesn't have this problem on Linux. Looking at GetFileAttributesEx docs, I think it works for Windows too.

jozkee commented 3 years ago

The API in https://github.com/dotnet/runtime/issues/24271#issuecomment-848512625 doesn't have this problem on Linux.

@tmds how is that? How are you avoiding doing two calls using the file path? i.e. using your API, if you want to get the LastWriteTime of the target, you still need to use two syscalls taking a path, or am I missing something?

var di = new DirectoryInfo("/path-to-link", followLinks: true); // calls readlink(path).
Console.WriteLine(di.LastWriteTime);  // calls lstat(path).
jhudsoncedaron commented 3 years ago

@Jozkee: If new DirectoryInfo("/path-to-link", followLinks: true); called readlink the implementation is buggy. That should call stat(path) (not lstat(path)).

jozkee commented 3 years ago

@jhudsoncedaron ah, great point.

tmds commented 3 years ago

Yes, that is what I mean with the comment: // bool switches between stat and lstat.

How are you avoiding doing two calls using the file path?

class FileSystemInfo
{
   public bool IsLink { get; }

   public string GetTargetPath(bool followLinks = false);
}

IsLink can be determined together with the other properties. The link information is retrieved using a method.

iSazonov commented 3 years ago

Between the calls, the underlying entry can change.

The only way to avoid this is to use a transactional file system. I guess no file API in Runtime eliminates this problem.

carlossanlop commented 3 years ago

@iSazonov I'd like to expand on @Jozkee's answer.

If we would think about creating mount points would it be like public void CreateAsMountPoint(string pathToTarget)? If so maybe it is better to rename to public void CreateAsLink(string pathToTarget) so that in future we could add public void CreateAsLink(string pathToTarget, LinkType linkType = LinkType.SymbolicLink).

I would prefer to keep the APIs separate for every different kind of reparse point. We need to keep in mind that having a reparse point means there's additional metadata we need to add to the file, and there can be cases where the user needs to manually provide this extra information. We can keep adding APIs as needed.

The symbolic links feature is important enough to deserve its own separate creation APIs.

Thinking so we could rename public ReparseTag ReparseTag { get; } // Name can be discussed to public LinkType LinkType { get; }

We thought about this, but Reparse Points are a concept unique to Windows. On Unix, the user will be able to check if a file is a symbolic link, the same way they have been doing until now: by querying fileInfo.Attributes.HasFlag(FileAttributes.ReparsePoint). On Windows, the user will have to check for this flag and also check the ReparseTag for the additional information that divides reparse points.

mklement0 commented 3 years ago

First things first: Thanks for continuing the discussion.

On Unix, the user will be able to check if a file is a symbolic link, the same way they have been doing until now: by querying fileInfo.Attributes.HasFlag(FileAttributes.ReparsePoint)

However, that is undoubtedly awkward, both in its verbosity and obscurity, given that the term reparse point is Windows-specific.

On Windows, the user will have to check for this flag and also check the ReparseTag for the additional information that divides reparse points.

That is awkward too, for the reason that determining whether a path is some kind of link is then quite cumbersome, and would require manual inspection of .ReparseTag for the named-surrogate bit.

The APIs we're discussing are focused on links, and, given that links on Windows are just a subset of reparse points, is it worth exposing details about / allow creation of non-link reparse points at all, or is the abstract attribute flag enough? My impression is that non-link reparse points can be left to specialty APIs.

If .ReparseTag were renamed to .LinkType and only ever reflected named-surrogate reparse points, i.e. links, then at least one could use .LinkType != LinkType.None as a platform-neutral, abstract is-this-some-kind-of-link test (the enum values could still reflect the Windows reparse-tag values, only limited to named-surrogate values).

tmds commented 3 years ago

The only way to avoid this is to use a transactional file system. I guess no file API in Runtime eliminates this problem.

The FileSystemInfo classes don't have this problem now. And if we don't add properties that require additional system calls, we're not introducing it.

As I said, you may not care about it.

This is what a full API might look:

enum LinkType
{
   SymbolicLink,
   ...
}
class LinkInfo
{
    public string Path { get; }
    public LinkType Type { get; }
}
class FileSystemInfo
{
   public bool IsLink { get; }
   public LinkInfo? ReadLink();
}
public static class File
{
    public static void CreateSymbolicLink(string path, string pathToTarget);
}
public static class Directory
{
    public static void CreateSymbolicLink(string path, string pathToTarget);
}
public static class Path
{
   public static string ResolveLinks(string path);
}
carlossanlop commented 3 years ago

@mklement0 :

On Unix, the user will be able to check if a file is a symbolic link, the same way they have been doing until now: by querying fileInfo.Attributes.HasFlag(FileAttributes.ReparsePoint) However, that is undoubtedly awkward, both in its verbosity and obscurity, given that the term reparse point is Windows-specific.

This is how we already do it, so it's not really obscure, even for Unix users. In fact, we explicitly documented FileAttributes.ReparsePoint as cross platform to keep parity with Windows.

While I can agree it may be a bit awkward (regardless of the fact we have FileAttributes.ReparsePoint documented), we didn't want to propose the IsLink or LinkType properties, because we wanted to keep this issue exclusive to symbolic links. If we only distinguish symbolic links in 6.0, and we add the other types of links (reparse points) in the next release, it would be considered a breaking change.

On Windows, the user will have to check for this flag and also check the ReparseTag for the additional information that divides reparse points. That is awkward too, for the reason that determining whether a path is some kind of link is then quite cumbersome, and would require manual inspection of .ReparseTag for the named-surrogate bit.

I disagree because reparse tags are exclusive to Windows. Windows users will have to manually inspect them. In fact, there is a separate request to explicitly expose reparse tags: https://github.com/dotnet/runtime/issues/1908 . If we add LinkType, which is a subset of reparse tags, it would be redundant to try to expose reparse tags later.

The APIs we're discussing are focused on links, and, given that links on Windows are just a subset of reparse points, is it worth exposing details about / allow creation of non-link reparse points at all, or is the abstract attribute flag enough? My impression is that non-link reparse points can be left to specialty APIs.

I agree we should separate the discussion for non-symlink reparse points.

@tmds :

In my opinion (and we also discussed this in the last proposal stream), we cannot have a new class LinkInfo because:

carlossanlop commented 3 years ago

@tmds :

The property that is proposed is based on the readlink system call or the realpath function:

public string LinkTargetPath { get; }

Because these are separate calls, they may have information that is not about the same entry. I want to mention it explicitly, this may something you are ok with. public string GetTargetPath(bool followLinks = false); // returns followLinks ? realpath(_path) : readlink(_path);

Your concern about the potential change of the underlying filesystem makes sense, so we would like to take your suggestion to make this a method instead of a property, but without the boolean parameter: it wouldn't make sense to retrieve the path of the final symlink, especially if it's relative.

Take this case, for example:

/dir
    mylink2
    mylink1
    subdir/
        mylink
        subsubdir/
                file.txt

Where the symlink chain looks like: mylink2 -> mylink1 -> mylink -> file.txt

If all the symlinks were relative:

mylink2 = mylink1 mylink1 = subdir/mylink mylink = subsubdir/file.txt

@tmds what should GetLinkTargetPath(true) return?

mklement0 commented 3 years ago

This is how we already do it, so it's not really obscure, even for Unix users.

Understood, but on a philosophical note: A well-documented obscurity will still be an obscurity to new users. Documentation is no substitute for descriptive, platform-neutral terms. And, as stated, on Windows checking for ReparsePoint isn't sufficient, because not all reparse points are links. So there is currently no platform-neutral abstraction,.

we didn't want to propose the IsLink or LinkType properties, because we wanted to keep this issue exclusive to symbolic links. If we only distinguish symbolic links in 6.0, and we add the other types of links (reparse points) in the next release, it would be considered a breaking change.

My (previously expressed) hope was that while 6.0 may be limited to symbolic links in terms of creation, it will already support all link types in terms of querying (obtaining their target).

Understood re wanting to expose even non-link reparse-tag values (though you could argue that such a property only belongs in the FileSystemEntry type).

In that case, an abstract IsLink property sounds like the best solution, which on Windows can based on whether an entry is a reparse point that also has the name-surrogate flag set.

Also note that the abstract term Link is already in your latest proposal as part of method names such as GetLinkTarget.

tmds commented 3 years ago

we cannot have a new class LinkInfo because:

I think you assume based on the name LinkInfo is something that derives from FileSystemInfo which is not the case.

but without the boolean parameter:

I had removed it here also: https://github.com/dotnet/runtime/issues/24271#issuecomment-849065336.

what should GetLinkTargetPath(true) return?

and replaced the true version with another method (Path.ResolveLinks) which would return /dir/subdir/subsubdir/file.txt.

jozkee commented 3 years ago

In that case, an abstract IsLink property sounds like the best solution

@mklement0 could info.GetLinkTarget()?.Exists ?? false suffice in order to know whether a given file system entry is a link? If so I think we can drop ReparseTag from this proposal and also avoid adding IsLink . This will be complemented by allowing that all link types, (symlinks, junctions and appExecLinks on Windows && symlinks on Unix) get the ability to query its target.

tmds commented 3 years ago

I think we can drop ReparseTag from this proposal and also avoid adding IsLink .

I think IsLink adds a certain visibility which is desired.

If we only distinguish symbolic links in 6.0, and we add the other types of links (reparse points) in the next release, it would be considered a breaking change.

It may not be breaking: the same attribute is used for both:

FILE_ATTRIBUTE_REPARSE_POINT1024 (0x400) | A file or directory that has an associated reparse point, or a file that is a symbolic link.
iSazonov commented 3 years ago

I would prefer to keep the APIs separate for every different kind of reparse point. We need to keep in mind that having a reparse point means there's additional metadata we need to add to the file, and there can be cases where the user needs to manually provide this extra information. We can keep adding APIs as needed.

Yes, for common cases we need the extra information but we say here about generic behavior. Mainly I mean enumeration and removal. (For creation we have to use extra information and do p/invokes like PowerShell does for junctions/mount points - WinCreateJunction() method in PowerShell) I don't know whether .Net API does the enumeration/removal as "correctly" as PowerShell for OneDrive, junctions/mount points and AppX links. In the case there should be internal API to check a kind of reparse points (see IsReparsePointLikeSymlink() method in PowerShell). And the API could become public. But if we get this working in .Net the same as in PowerShell then we have no need in the new public one. Also we need a target for IO_REPARSE_TAG_APPEXECLINK and IO_REPARSE_TAG_MOUNT_POINT. You could look WinInternalGetTarget() method in PowerShell. It gets targets for:

Can we support this in proposed GetLinkTarget()? (Is there other reparse points with behavior like symlink so that we add its in the list too?).


Summary of PowerShell:

  1. Create operation. No doubts separate API should be for every kind of reparse point. In the proposal only CreateAsSymbolicLink() should be. Makes no sense to add creation methods for other reparse points since there are no analogs on Unix. PowerShell will continue to use p/invokes as now.

  2. Remove. Currently PowerShell uses IsReparsePointLikeSymlink() to explicitly disable recursion and treat a file system object as an end file if it is symlink (mount point/junction or appx). I am not sure we can avoid this explicit check since there a generic code works. (Although if we already do FileSystem provider specific checking in the generic code why don't we call the FileSystem provider specific delete method immediately ignoring the generic code?) I think it could be implemented in .Net - yes? - and this excludes needs in public exposing reparse tags.

  3. Enumerate PowerShell implements generic "manual" enumeration. It writes an error for every entity it can not access and continue on error. For FileSystem provider PowerShell don't follow symlinks by default. If user sets FollowSymbolicLink switch PowerShell will follow to symlinks in enumeration and track cycles. I am trying to migrate to new .Net enumeration API and there is still a check with IsReparsePointLikeSymlink() for performance reasons. Again, I'm not sure we can still get rid of IsReparsePointLikeSymlink() there since we "manually" manage recursion and we need to know if FileSystemEntry is like symbolic link (jucntion, appx). Again, I think it could be implemented in .Net - yes? - and this excludes needs in public exposing reparse tags.

bartonjs commented 3 years ago

Video

namespace System.IO
{
    public abstract partial class FileSystemInfo
    {
        public void CreateAsSymbolicLink(string pathToTarget); // Already approved

-       public FileSystemInfo? GetSymbolicLinkTarget(bool returnFinalTarget = false); // Already approved
+       public FileSystemInfo? ResolveLinkTarget(bool returnFinalTarget = false); // Rename

+       public string? LinkTarget { get; }
    }
    public static class File
    {
        public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget); // Already approved

-       public static FileSystemInfo? GetSymbolicLinkTarget(string linkPath, bool returnFinalTarget = false); // Already approved
+       public static FileSystemInfo? ResolveLinkTarget(string linkPath, bool returnFinalTarget = false); // Rename
    }
    public static class Directory
    {
        public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget); // Already approved

-       public static FileSystemInfo? GetSymbolicLinkTarget(string linkPath, bool returnFinalTarget = false); // Already approved
+       public static FileSystemInfo? ResolveLinkTarget(string linkPath, bool returnFinalTarget = false); // Rename
    }
}
mklement0 commented 3 years ago

@Jozkee

could info.GetLinkTarget()?.Exists ?? false suffice

info.GetLinkTarget() is not null (or, now: info.ResolveLinkTarget() is not null) should suffice as an abstract test for whether a given item is a link, but I'm with @tmds that info.IsLink is (much) preferable, on the grounds of semantic directness: it is a direct expression of a common use case; plus, it can be implemented more efficiently.

However, if we take a step back (again, unfortunately):

Revisiting @tmds's suggestion in #52908 - hybrid content of FileSystemInfo instances combining name and path of the link with all remaining properties taken from the target (though I would let .Delete() only ever act on the link) - I think it may be the way to go by default, not even with an opt-in (which @tmds had also mentioned as desirable):

The motivation is my discovery of an existing, ugly asymmetry: something like new FileInfo("link").LastWriteTime reports the link's (usually irrelevant) last-write timestamp, whereas new FileInfo("link").LastWriteTime = DateTime.Now sets the (ultimate) target's property.

Given that a link's timestamps, Unix mode, ... itself is usually irrelevant, defaulting to both reporting and manipulating the target's properties is much more sensible. (We'd be giving up a 1:1 correspondence between file-system entries and FileSystemInfo instances, but as the example above shows, this correspondence is already being violated, in obscure fashion - switching to a helpful virtual representation seems much preferable - and #38824 even provides an indication that this may always have been the intent)

Therefore, I propose the following:

The above should eliminate even having to care whether a given path is a link or not in the majority of cases.

LinkTarget and .ResolveLinkTarget() methods could then be moved to the FileSystemEntry type (see below).

As for explicit link handling with respect to querying:

public static string GetFullPath(string path, bool resolveToFinalTarget = false);
iSazonov commented 3 years ago

Make .CreateAsSymbolicLink() return a FileSystemEntry instance, not FileSystemInfo (see below).

@mklement0 It is ref struct :-(

tmds commented 3 years ago

Why do CreateAsSymbolicLink methods return something? I don't expect it to be common to perform more operations once the link is created.

iSazonov commented 3 years ago

Although the hybrid/composite approach may look attractive, it seem is a breaking change and it would be a completely unacceptable. From a practical point of view it is better to have an API that does exactly what is expected and I wouldn't want to confuse the behavior even more. If we want to process exactly links, it is better to have a separate API for that - IO.SymbolicLink(Info) class in addition to IO.File(Info/IO.Directory(Info) so that expose FileSystemEntry for the link.

KevinCathcart commented 3 years ago

The motivation is my discovery of an existing, ugly asymmetry: ...

So on .NET 5, reading the times from a link give you the times on the link itself, but when you set them, they send up setting the targets? Yikes!

It would be ideal to be (effectively) all hybrid behavior or none at all. (By "effectively" I mean: with the caveats like Delete affecting the link itself, which would be the expected behavior in a hardlink scenario, ignoring that most platforms disallow directory hardlinks). But I don't think we can go in the direction of all hybrid by default, given that that is a pretty significant breaking change. An opt-in option certainly would be possible.

Going in the direction of non-hybrid by default is also a breaking change (from the timestamp setters). I would not be very concerned about fixing the timestamp setters. It is hard justify the current behavior as anything but a bug. While there may be a few users that would be adversely affected by fixing it, I would imagine just as many if not more users would effectively getting a minor bug fixed. (There seems to be a PR in progress for this: #52639).

The whole situation here is complicated, since in many simple cases you really do only care about the the target of the symlink, and a fair few APIs and syscalls do a decent job of allowing you to ignore that something is a symlink. But in other cases it is critical that you know a path is a symlink. There have been security bugs in applications on multiple occasions resulting from confusing a symlink with a normal file or normal directory. This makes me want to lean in the direction of not trying to do additional clever things with symlinks by default beyond the normal "by default opening a symlink opens the target" (in both the file contents, and directory traversal sense of "opening").

mklement0 commented 3 years ago

@tmds:

If needing to modify the properties of a newly created symlink would be common, returning an object representing it would be justified. However, Since symlink properties - other than the already-specified target path - are usually irrelevant, I agree that there's no good reason to return such an object.

@KevinCathcart, @iSazonov:

It would be ideal to be (effectively) all hybrid behavior or none at all. But I don't think we can go in the direction of all hybrid by default

Going all-hybrid by default:

The only unfortunate aspect of a hybrid representation is that in order to avoid a truly serious breaking change, the .Attributes value will have to pretend that ReparsePoint is an attribute of the target, so as not to break existing code that uses this attribute toinfer that a FileSystemInfo instance is a link (unreliably, on Windows, given that not all reparse points are links). (A kingdom for a separate .IsLink property! If a ReparseTag property is introduced, it would be free to reflect only the target's value, and could serve to disambiguate.)

But in other cases it is critical that you know a path is a symlink.

And for that an .IsLink property is sufficient - plus the ability to obtain a non-hybrid representation of a link itself, if really needed - see below.


@iSazonov:

It is ref struct :-(

Excellent point, I'm sorry I missed that; what drew me to FileSystemEntry was its type-abstract nature, as described in the help topic (emphasis added):

Provides a lower level view of FileSystemInfo [...]

Since FileSystemEntry being a ref struct rules out its use for on-demand construction, let me propose the following:

iSazonov commented 3 years ago
  • Derive a new class, LinkInfo (or FileSystemLinkInfo) from FileSystemInfo

As MSFT team members said it is better to have separate types because there are a lot of reparse points on Windows (and new ones may appear on Unix) and I agreed with this. So we could introduce SymbolicLink(Info). In this case, this class should not have any hybrid behavior and work with the entity itself - change name, target, time, attributes. etc. Then some magical behavior of FileInfo/DirectoryInfo/FileSystemInfo would be more understandable (I still think we need IsSymbolicLink() there). For example, FileSystemInfo.GetTarget() could return SymbolicLinkInfo.

carlossanlop commented 3 years ago

Now that the APIs got approved, the discussion is now about additional/incremental improvements for which we already have issues open. So if you don't mind, and for clarity, let's continue the conversation in those issues:

tmds commented 3 years ago

public FileSystemInfo? ResolveLinkTarget(bool returnFinalTarget = false); // Rename

Since you've added string? LinkTarget, I think the appropriate default for returnFinalTarget is true. The argument may even be dropped imo.

public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget);

As mentioned earlier, the user won't care about this return value. Consider making this return void.

string? LinkTarget { get; }

I still wish this were a method. Not only for consistency with the other properties, but also because now Refresh will probably make an additional syscall for a property the user may not care about.

public bool IsLink { get; }

I don't understand why this is left out.

iSazonov commented 3 years ago

public static FileSystemInfo CreateSymbolicLink(string path, string pathToTarget);

As mentioned earlier, the user won't care about this return value. Consider making this return void.

It is the same as existing static API like Directory.CreateDirectory(). It makes no sense to use another pattern.

public bool IsLink { get; }

I don't understand why this is left out.

It is clear as this could be for symbolic links but we need to take into account Windows reparse point tags and in the case it is not yet clear how to present such API.

mklement0 commented 3 years ago

Regarding an IsLink property (which I also think is called for) in relation to the proposed ReparseTag property: Let's continue the discussion at https://github.com/dotnet/runtime/issues/1908#issuecomment-852117517

For the sake of completeness: another link-related proposal, in the context of file-system-item enumeration (opt-in/out of recursing into directory links), is #52666

jhudsoncedaron commented 3 years ago

So on .NET 5, reading the times from a link give you the times on the link itself, but when you set them, they send up setting the targets? Yikes!

I agree, yikes. The native APIs have the same gotchas.

mklement0 commented 3 years ago

Yikes for sure, but, as @KevinCathcart has also noted, pending PRs #52639 and #52639 (both by @hamarb123) are planning to fix that.

In other words: This makes FileSystemInfo instances true representations of the underlying link file-system entries, as they mostly already are (aside from the asymmetry that the PRs will fix, there are two - defensible - exceptions: for file links, Create(), as primarily a content operation, will truncate the target; also, on Unix, there is the white lie that a directory symlink itself has a Directory attribute, which isn't technically true).

As desirable as a mostly-hybrid default representation may be, @KevinCathcart has persuasively argued in https://github.com/dotnet/runtime/issues/52908#issuecomment-850695302 that this would be a serious backward-compatibility concern, so an opt-in via a FileSystemInfo constructor parameter to either (a) get a hybrid representation or (b) the link's target are needed, as discussed in #52908, which is where we should continue.

As for how the WinAPI functions handle symlinks - see Symbolic Link Effects on File Systems Functions.

iSazonov commented 3 years ago

public bool IsLink { get; }

I don't understand why this is left out.

It is clear as this could be for symbolic links but we need to take into account Windows reparse point tags and in the case it is not yet clear how to present such API.

As I pointed in https://github.com/dotnet/runtime/issues/52666#issuecomment-852328044

I think we should think about behavior. The behavior can be:

  • regular file system
  • like symbolic link

We even renamed PowerShell method to IsReparsePointLikeSymlink to explicitly indicate this fact.

This force we think that if we want any generalization of RPs like IsLink() method we have to add a mask property (LinkReparsePoints?) where collect all RPs whose behavior we want consider like symbolic link. Then IsLink() would use the property to work as expected and users could add new RPs to the property as new RPs are introduced.