dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.

ArgumentException "unrecognized escape sequence" in RewriteOptions.AddApacheModRewrite in case of usage of regex shorthand character classes (like \d) #18555

Open xzxzxc opened 4 years ago

xzxzxc commented 4 years ago

Description of the bug

A System.ArgumentException is thrown by AddApacheModRewrite when the rules file passed to it contains regex rules that use shorthand character classes (such as \d).

To Reproduce

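using System.IO;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Rewrite;

public static class ApplicationExtensions
{
    public static IApplicationBuilder UseUrlRewriter(this IApplicationBuilder builder)
    {
        // Open the Apache mod_rewrite rules file.
        using (var fileReader = File.OpenText("apache_rewrite_rules.txt"))
        {
            // The rules are parsed here; this is the call that throws.
            var rewriteOptions = new RewriteOptions()
                .AddApacheModRewrite(fileReader);

            return builder.UseRewriter(rewriteOptions);
        }
    }
}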

where the file apache_rewrite_rules.txt contains:

RewriteRule ^/(\d)$ /?num=$1

Exception message: System.ArgumentException: 'parsing '^/(\d)$' - Unrecognized escape sequence \\d.'

Stack trace:

   at System.Text.RegularExpressions.RegexParser.ScanCharEscape()
   at System.Text.RegularExpressions.RegexParser.Unescape(String input)
   at System.Text.RegularExpressions.Regex.Unescape(String str)
   at Microsoft.AspNetCore.Rewrite.Internal.ApacheModRewrite.Tokenizer.RemoveQuotesAndEscapeCharacters(IList`1 tokens)
   at Microsoft.AspNetCore.Rewrite.Internal.ApacheModRewrite.Tokenizer.Tokenize(String rule)
   at Microsoft.AspNetCore.Rewrite.Internal.ApacheModRewrite.FileParser.Parse(TextReader input)
   at Microsoft.AspNetCore.Rewrite.ApacheModRewriteOptionsExtensions.AddApacheModRewrite(RewriteOptions options, TextReader reader)
   at Middleware.ApplicationExtensions.UseUrlRewriter(IApplicationBuilder builder, Action`1 configure) in D:\workspace\url-rewriter\UrlRewriter.Middleware\ApplicationExtensions.cs:line 17
   at UrlRewriter.Tests.ApacheModRewriteTests.<>c.<.ctor>b__1_0(IApplicationBuilder app) in D:\workspace\url-rewriter\UrlRewriter.Tests\ApacheModRewriteTests.cs:line 22
   at Microsoft.AspNetCore.Hosting.DelegateStartup.Configure(IApplicationBuilder app)
   at Microsoft.AspNetCore.Hosting.Internal.AutoRequestServicesStartupFilter.<>c__DisplayClass0_0.<Configure>b__0(IApplicationBuilder builder)
   at Microsoft.AspNetCore.Hosting.Internal.WebHost.BuildApplication()

Comments

A RewriteRule pattern is a regular expression, so it is not clear why ApacheModRewrite.Tokenizer.RemoveQuotesAndEscapeCharacters calls Regex.Unescape on it. This looks like a bug: Regex.Unescape cannot convert sequences such as \w, \d, or \s, and throws an ArgumentException instead.
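The behavior described above can be checked in isolation, without the rewrite middleware. The following is a minimal sketch, assuming only System.Text.RegularExpressions; the class and method names (UnescapeDemo, UnescapeThrows) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Hypothetical helper: returns true when Regex.Unescape rejects the
    // input with an ArgumentException, false when it succeeds.
    public static bool UnescapeThrows(string pattern)
    {
        try
        {
            Regex.Unescape(pattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // The same pattern is a valid regular expression when compiled directly.
    public static bool MatchesAsRegex(string pattern, string input)
    {
        return new Regex(pattern).IsMatch(input);
    }
}
```

With the pattern from the report, UnescapeDemo.UnescapeThrows(@"^/(\d)$") returns true, while UnescapeDemo.MatchesAsRegex(@"^/(\d)$", "/7") also returns true. The pattern itself is fine; only the Regex.Unescape step rejects it.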

jaamison commented 3 years ago

Happy to take a crack at this if at all helpful. It looks like there are several other small quirks throughout the Apache compatibility subsystem that break parity with Apache itself, and I'd be happy to submit PRs for those too.

jonagh commented 2 years ago

I just ran into this on a new project. Simple things can be worked around: \d can be changed to [0-9] and \. to [.]. But \w, \s, and other classes that represent a lot of characters are basically show-stoppers for using this feature with regex rules.
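The two simple substitutions mentioned above could be applied to the rules text before it is handed to the parser. The following is only a rough sketch of that workaround: ApacheRuleWorkaround and ReplaceSimpleEscapes are hypothetical names, not part of ASP.NET Core, and the replacement is purely textual. It assumes the rules contain no escaped backslash followed by d (\\d) and no \d or \. inside an existing bracket expression, both of which it would mangle. Note also that [0-9] matches ASCII digits only.

```csharp
using System;

public static class ApacheRuleWorkaround
{
    // Hypothetical helper (an assumption for illustration, not a library API).
    // Rewrites the two escapes that have a simple bracket equivalent:
    //   \d -> [0-9]
    //   \. -> [.]
    // This is a naive string replacement, not a regex-aware rewrite.
    public static string ReplaceSimpleEscapes(string ruleText)
    {
        if (ruleText == null)
        {
            throw new ArgumentNullException(nameof(ruleText));
        }

        return ruleText
            .Replace(@"\d", "[0-9]")
            .Replace(@"\.", "[.]");
    }
}
```

The rewritten text could then be wrapped in a System.IO.StringReader and passed to the AddApacheModRewrite(TextReader) overload seen in the stack trace. Classes such as \w and \s are deliberately left alone here, since they have no equally simple replacement.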

Also just realized RewriteMap is not supported :/ Guess I'll convert all the rules to IISUrlRewrite format (hopefully that works fully).

This call should probably be renamed to AddApacheModRewriteWithOnlyLimitedSupport, so people like me know not to use it before we write a bunch of rules expecting everything to work correctly.

danmoseley commented 1 year ago

> Happy to take a crack at this if at all helpful.

@jaamison it looks like we overlooked your offer, sorry about that. In the unlikely event you are still interested in offering PRs, we'd be happy to have them.