dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.36k stars 4.74k forks source link

[API Proposal]: Path.ReplaceInvalidFileNameChars and Path.ReplaceInvalidPathChars #82606

Open EgorBo opened 1 year ago

EgorBo commented 1 year ago

Background and motivation

It seems to be a popular operation to quickly erase all invalid symbols from a path/filename. For that, we expose the following APIs:

public static char[] Path.InvalidPathChars; // deprecated
public static char[] Path.GetInvalidFileNameChars()
public static char[] Path.GetInvalidPathChars()

and expect users to write loop-like code to handle them by hands. What is even more unfortunate that these APIs are allocating on each call: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/Path.Windows.cs#L15-L31

How OSS projects use these to clean up paths:


  1. dotnet/maui:
    
    static string CleanPath(string path) =>
    string.Join("_", path.Split(Path.GetInvalidFileNameChars()));

2. [win-acme/win-acme](https://github.com/win-acme/win-acme/blob/00a264daefe5f7f67f01b0ba1bd4cc9cb1ae9b3f/src/main.lib/Extensions/StringExtensions.cs#L44-L51)
```csharp
public static string? CleanPath(this string? fileName)
{
    if (fileName == null)
    {
        return null;
    }
    return Path.GetInvalidFileNameChars().Aggregate(fileName, 
        (current, c) => current.Replace(c.ToString(), string.Empty));
}
  1. dotnet/runtime

    foreach (char c in Path.GetInvalidFileNameChars()) validated = validated.Replace(c, '_');
  2. superunitybuild/buildtool

    public static string SanitizeFileName(string fileName)
    {
    string invalidChars = Regex.Escape(new string(Path.GetInvalidFileNameChars()));
    string invalidRegStr = string.Format(@"([{0}]*\.+$)|([{0}]+)", invalidChars);
    
    return Regex.Replace(fileName, invalidRegStr, "_");
    }

Tons of other places: https://grep.app/search?current=5&q=Path.GetInvalidFileNameChars&words=true&filter[lang][0]=C%23 https://grep.app/search?q=Path.GetInvalidPathChars&words=true&filter[lang][0]=C%23

API Proposal

namespace System.IO;

public static class Path
{
    static string ReplaceInvalidFileNameChars(string path, string replacement);

    static string ReplaceInvalidPathChars(string path, string replacement);

    // Optionally, zero-alloc versions:
    static bool ReplaceInvalidFileNameChars(ReadOnlySpan<char> path, ReadOnlySpan<char> replacement, 
        Span<char> output, out int charsWritten);

    static bool ReplaceInvalidPathChars(ReadOnlySpan<char> path, ReadOnlySpan<char> replacement, 
        Span<char> output, out int charsWritten);
}
ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation It seems to be a popular operation to quickly erase all invalid symbols from a path/filename. For that, we expose the following APIs: ```csharp public static char[] Path.InvalidPathChars; // deprecated public static char[] Path.GetInvalidFileNameChars() public static char[] Path.GetInvalidPathChars() ``` and expect users to write loop-like code to handle them by hands. What is even more unfortunate that these APIs are allocating on each call: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/Path.Windows.cs#L15-L31 #### How OSS projects use these to clean up paths:
1. [dotnet/maui](https://github.com/dotnet/maui/blob/33240ea8bccbd398ffb34a4f303dd7301e5f3eaa/src/Essentials/src/FileSystem/FileSystem.uwp.cs#L12): ```csharp static string CleanPath(string path) => string.Join("_", path.Split(Path.GetInvalidFileNameChars())); ``` 2. [win-acme/win-acme](https://github.com/win-acme/win-acme/blob/00a264daefe5f7f67f01b0ba1bd4cc9cb1ae9b3f/src/main.lib/Extensions/StringExtensions.cs#L44-L51) ```csharp public static string? CleanPath(this string? fileName) { if (fileName == null) { return null; } return Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty)); } ``` 3. [dotnet/runtime](https://github.com/dotnet/runtime/blob/5415cd1e4e1359dd5fb5c6727bcdf8dcad457a71/src/libraries/System.Configuration.ConfigurationManager/src/System/Configuration/ClientConfigPaths.cs#L379) ```csharp foreach (char c in Path.GetInvalidFileNameChars()) validated = validated.Replace(c, '_'); ``` 4. [superunitybuild/buildtool](https://github.com/superunitybuild/buildtool/blob/4b82f766a584950a46b494f4612556bdd282b48a/Editor/Build/BuildProject.cs#L241-L247) ```csharp public static string SanitizeFileName(string fileName) { string invalidChars = Regex.Escape(new string(Path.GetInvalidFileNameChars())); string invalidRegStr = string.Format(@"([{0}]*\.+$)|([{0}]+)", invalidChars); return Regex.Replace(fileName, invalidRegStr, "_"); } ``` Tons of other places: https://grep.app/search?current=5&q=Path.GetInvalidFileNameChars&words=true&filter[lang][0]=C%23 https://grep.app/search?q=Path.GetInvalidPathChars&words=true&filter[lang][0]=C%23 ### API Proposal ```csharp namespace System.IO; public static class Path { static string ReplaceInvalidFileNameChars(string path, string replacement); static string ReplaceInvalidPathChars(string path, string replacement); // Optionally, zero-alloc versions: static bool ReplaceInvalidFileNameChars(ReadOnlySpan path, ReadOnlySpan replacement, Span output, out int charsWritten); static bool ReplaceInvalidPathChars(ReadOnlySpan path, ReadOnlySpan replacement, Span output, out int charsWritten); } ```
Author: EgorBo
Assignees: -
Labels: `api-suggestion`, `area-System.IO`, `untriaged`
Milestone: -
stephentoub commented 1 year ago

I see the benefit. At the same time, I'd think we'd want to generalize it a bit more. What would it look like to just support replacing or removing an arbitrary set of characters, possibly specified via IndexOfAnyValues? i.e. why should this be specific to paths?

EgorBo commented 1 year ago

I see the benefit. At the same time, I'd think we'd want to generalize it a bit more. What would it look like to just support replacing or removing an arbitrary set of characters, possibly specified via IndexOfAnyValues? i.e. why should this be specific to paths?

I filed it just because I quite often hit this when I need to store some logs where I use some user's input for file name e.g. {username}.log and just want to make sure it won't crash if user's name contains invalid chars 🙂 Having this in Path increases chances to notice this API when user types Path.InvalidChar... and gets intellisense suggestions

MihaZupan commented 1 year ago

If we had an IndexOfAnyValues ReplaceAny API, would we still consider exposing this one on Path? It still seems like a useful API, even if it's just a place that stores a shared IndexOfAnyValues instance for you.

stephentoub commented 1 year ago

(Maybe we again need to consider a rename for IndexOfAnyValues)

Symbai commented 1 year ago

What would it look like to just support replacing or removing an arbitrary set of characters

41287

jozkee commented 1 year ago

As others mentioned, seems that this could be closed in favor of a more general approach on IndexOfAnyValue. Once https://github.com/dotnet/runtime/issues/82622 resolves, we can decide what to do with this.