dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.31k stars 9.97k forks source link

Resolve routing RegEx constraint dependency size issue #46142

Closed eerhardt closed 1 year ago

eerhardt commented 1 year ago

Routing has a feature named "Route Constraints". One of the options to make a constraint is to add a regular expression to the route, for example: app.MapGet("/posts/{id:regex(^[a-z0-9]+$)}", …). Because these route constraints are inline in the route string, the Regex code needs to always be in the application, in case any of the routes happen to use a regex constraint.

In .NET 7, we added a new feature to Regex: NonBacktracking. This added a considerable amount of code. Depending on the Regex constructor overload used (the one that takes RegexOptions, which ASP.NET Routing uses), this new feature's code will be left in the app, even if the NonBacktracking engine isn't being used.

ASP.NET Routing uses the CultureInvariant and IgnoreCase options when constructing Regex route constraints.

Testing locally, being able to remove the NonBacktracking engine can cut about .8 MB of the 1.0 MB of Regex code out of the app size.

UPDATE 11/30/2022

With the latest NativeAOT compiler changes, here are updated numbers for linux-x64 NativeAOT:

Hello World 3.22 MB (3,381,112 bytes)
new Regex("").IsMatch 3.50 MB (3,680,200 bytes)
new Regex("", RegexOptions).IsMatch 4.33 MB (4,545,960 bytes)

UPDATE 1/18/2023

With the latest NativeAOT compiler changes, here are updated numbers for linux-x64 NativeAOT:

Hello World 2.79 MB (2,925,616 bytes)
new Regex("").IsMatch 3.04 MB (3,193,344 bytes)
new Regex("", RegexOptions).IsMatch 3.82 MB (4,008,288 bytes)

Options

  1. Add new Regex construction APIs that allow for some RegexOptions to be used, but also allows for the NonBacktracking engine to be trimmed. For example: Regex.CreateCompiled(pattern, RegexOptions). This API would throw an exception if RegexOptions.NonBacktracking was passed.
  2. Remove the use of RegexOptions. The IgnoreCase option can be specified as part of the pattern as a Pattern Modifier: (?i). However, CultureInvariant cannot be specified this way.
    • One option is to drop support for CultureInvariant from regex route constraints. This affects the Turkish 'i' handling.
    • Another option could be to add a CultureInvariant Pattern Modifier to .NET Regex, so this could be specified without using RegexOptions.
  3. When using the new "slim" hosting API (See https://github.com/dotnet/aspnetcore/issues/32485), we could remove the Regex route constraints feature by default and have an API that adds it back. Apps using the inline regex route constraints and the slim hosting API, would call this new API to opt-in to the feature.
  4. Add a feature to the linker/NativeAOT compiler that can see which RegexOptions values are used in the app. And for the enum values that aren't used, it can trim the code branches behind those values (i.e. the NonBacktracking code). This would accrue more size savings elsewhere as well.
eerhardt commented 1 year ago

FYI - @agocke @MichalStrehovsky @vitek-karas -

This is another case that https://github.com/dotnet/linker/issues/1868 would "just handle". We never use the NonBacktracking option in our Regex's, but the NonBacktracking code can't be trimmed.

Our plan for .NET 8 is to do something in ASP.NET to get the Regex code trimmed/smaller.

mitchdenny commented 1 year ago
  1. Add new Regex construction APIs that allow for some RegexOptions to be used, but also allows for the NonBacktracking engine to be trimmed. For example: Regex.CreateCompiled(pattern, RegexOptions). This API would throw an exception if RegexOptions.NonBacktracking was passed.

@eerhardt Is the trimmer smart enough to know that if a particular option is specified that it throws an exception and so it doesn't have to evaluate code that might be activated by that code? Seems very smart if it can!

MichalStrehovsky commented 1 year ago

This would not be just about figuring out constants at each callsite, but also being able to constant propagate the bit checks. Things get very complicated very quickly. I don't think this can fit in 8.0.

Also if we do it, there will also be an endless stream of follow up requests for similar constant pattern issues. E.g. just yesterday I had to work around a similar API that accepts a nullable - if the nullable is null, an expensive thing needs to happen to compute it. If it's not null, the call is cheap: https://github.com/dotnet/runtime/pull/80677

We need to really start paying attention to these patterns in API reviews. Optional parameters, flags that do wildly different things... those should ideally be separate methods if we want good trimming.

mitchdenny commented 1 year ago

@MichalStrehovsky - so that is primarily in relation to option #4 above right? if we can make changes to the regex library such that the existing trimming behavior can trim that excess that we don't need in ASP.NET it would be a viable path forward?

MichalStrehovsky commented 1 year ago

@MichalStrehovsky - so that is primarily in relation to option #4 above right? if we can make changes to the regex library such that the existing trimming behavior can trim that excess that we don't need in ASP.NET it would be a viable path forward?

Yep. I think option 4 would be difficult to fit into .NET 8. Trimming is a single pass operation, architecturally. This feature requires knowledge of all callsites before we can do trimming (i.e. it adds an entire new pass).

mitchdenny commented 1 year ago

@eerhardt what do you think about getting this constructor marked public so we can call it:

https://source.dot.net/#System.Text.RegularExpressions/System/Text/RegularExpressions/Regex.cs,b38ad9d9698a3bc5

In fact, the comment in the constructor is telling:

        internal Regex(string pattern, CultureInfo? culture)
        {
            // Validate arguments.
            ValidatePattern(pattern);

            // Parse and store the argument information.
            RegexTree tree = Init(pattern, RegexOptions.None, s_defaultMatchTimeout, ref culture);

            // Create the interpreter factory.
            factory = new RegexInterpreterFactory(tree);

            // NOTE: This overload _does not_ delegate to the one that takes options, in order
            // to avoid unnecessarily rooting the support for RegexOptions.NonBacktracking/Compiler
            // if no options are ever used.
        }

We could change our callsite to flip on ignore case via the Regex itself and pass in CultureInfo.Invariant.

mitchdenny commented 1 year ago

Looping in @stephentoub since he added this comment so presumably he was trying to avoid it for AOT scenarios as well (or at least to enable it to be trimmmed).

stephentoub commented 1 year ago

Instead of:

var r = new Regex("...", RegexOptions.IgnoreCase | RegexOptions.InvariantCulture);

could ASP.NET just do:

Regex r;
CultureInfo old = CultureInfo.CurrentCulture;
CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
try
{
    r = new Regex("(?i)(?:...)");
}
finally
{
    CultureInfo.CurrentCulture = old;
}

?

mitchdenny commented 1 year ago

OK, so you are relying on this mechanism here to make sure we pick up the invariant culture:

        internal static CultureInfo GetTargetCulture(RegexOptions options) =>
            (options & RegexOptions.CultureInvariant) != 0 ? CultureInfo.InvariantCulture : CultureInfo.CurrentCulture;
stephentoub commented 1 year ago

Yes

danmoseley commented 1 year ago

would we recommend that to customers too? (who wanted smallest possible size) or do we imagine a different approach long term to drop nonbacktracking?

mitchdenny commented 1 year ago

It has a bit of a hacky feel to it as a work around. It is probably fine to get us past this for ASP.NET but I'd love a first class option on the Regex class.

davidfowl commented 1 year ago

I rather not set async locals to set the current culture, that's a bit hacky.

mitchdenny commented 1 year ago

So what are the other options here? Modifying the Regex constructure to include something like this?

public Regex(string pattern, bool ignoreCase, CultureInfo? culture ) { }

... and a bunch of other variants?

javiercn commented 1 year ago

3. We could remove the Regex route constraints feature either with an API that adds it back, or a feature switch set in the .csproj

I do not think removing the feature is an acceptable solution here. It's been there since the times of ASP.NET and is a well-known and used feature, removing it by default for AoT does not seem like the right trade-off.

  1. Add new Regex construction APIs that allow for some RegexOptions to be used, but also allows for the NonBacktracking engine to be trimmed. For example: Regex.CreateCompiled(pattern, RegexOptions). This API would throw an exception if RegexOptions.NonBacktracking was passed.

This seems the more reasonable solution to me, or alternatively a compile-time switch to disable it at the runtime level and trim the associated code.

It might also be interesting to explore the possibility of using the regex source generator to codegen these regexes, although I understand that it might have some drawbacks, like not being able to express regex constraints at runtime if there was no way to discover them at compile time.

ghost commented 1 year ago

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s). If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

stephentoub commented 1 year ago

would we recommend that to customers too? (who wanted smallest possible size) or do we imagine a different approach long term to drop nonbacktracking?

They should be using the source generator. So should ASP.NET for the cited scenario, except that currently source generators can't depend on other source generators.

I rather not set async locals to set the current culture, that's a bit hacky.

The fact the CultureInfo uses an async local under the covers is an implementation detail, and this won't affect async locals flow at all, since the culture change is immediately undone, with no async operations performed in the interim. As for it being hacky, it's using the exact mechanism Regex has had for this for over a decade; Regex looks for the current culture, and if you want to change the culture it uses, you need to set the current culture. Setting invariant is so common it also makes it easy via a flag, but the other mechanism still exists, and we can use it here, today, to achieve the stated goals.

So what are the other options here? Modifying the Regex constructure to include something like this?

The other options are the ones Eric enumerates. I don't see us adding the cited ctor "and a bunch of other variants"... as of today there are 10 RegexOptions, and I have no intention of adding 2^10 ctors. If we eventually want to add new APIs related to this, it would likely be CreateInterpreted, CreateCompiled, and CreateNonBacktracking. That, however, is not without its disadvantages/warts, e.g. today NonBacktracking doesn't make use of Compiled, but it could in the future.

NonBacktracking

As an aside, this issue is about being able to trim out NonBacktracking but the cited use is for a regex that is very exposed to untrusted input, and I expect there are a fair number of uses that are susceptible to ReDoS attacks without the dev realizing it, i.e. without them fully understanding the implications of the pattern that will be evaluated against every query string. This could actually be a situation where NonBacktracking is exactly what's desired, for security.

I think option 4 would be difficult to fit into .NET 8

I realize it's challenging. But I opened that issue almost two years ago due to seeing real opportunity for savings, both with the patterns I share as examples, and the variants you cite. We're going to continue to find scenarios it helps, like this one, and until it's addressed, we're going to continue struggling to either find workarounds for those trimming issues or leave size on the table.

I believe the solution today for routing is the setting of CultureInfo, as I outlined. If there's a technical reason why those few lines don't solve the problem today, I'd like to understand better why.

davidfowl commented 1 year ago

They should be using the source generator. So should ASP.NET for the cited scenario, except that currently source generators can't depend on other source generators.

This also only works for constants, so it's a solution but not one for all cases.

The fact the CultureInfo uses an async local under the covers is an implementation detail, and this won't affect async locals flow at all, since the culture change is immediately undone, with no async operations performed in the interim. As for it being hacky, it's using the exact mechanism Regex has had for this for over a decade; Regex looks for the current culture, and if you want to change the culture it uses, you need to set the current culture. Setting invariant is so common it also makes it easy via a flag, but the other mechanism still exists, and we can use it here, today, to achieve the stated goals.

It's hacky and the only reason it works is because it's an async local otherwise it wouldn't just be hacky, it would be buggy (being static mutable state).

I believe the solution today for routing is the setting of CultureInfo, as I outlined. If there's a technical reason why those few lines don't solve the problem today, I'd like to understand better why.

There's a reason why it "feels hacky" to mutate unrelated state to create an instance of a Regex. That said, if this is a short-term workaround for a longer-term solution, then it's fine. We've done similar thing with the ExecutionContext flow for APIs that don't support disabling it.

@mitchdenny create a similar helper for Regex with a giant comment linking to why it works and linking to the linked issues in this issue.

stephentoub commented 1 year ago

This also only works for constants, so it's a solution but not one for all cases.

In my experience and investigations in this area, it's the 95% case. The dynamic cases are typically:

Obviously there are legit cases beyond that, but they're the long tail. They typically use Regex.Escape, and searching around you can see that the number of uses of Regex.Escape pales in comparison to the use of Regex. Not using Escape with dynamic input is a good indicator of a ReDoS waiting to happen if NonBacktracking isn't used.

the only reason it works is because it's an async local otherwise it wouldn't just be hacky, it would be buggy (being static mutable state).

The use in this case has little to do with it being an "async" local. It's to do with it using thread-local state. That is the design of CultureInfo... whether we like it or not, it's based on ambient data. Someone who wants to pass in a culture to lots of scoped code sets the culture, just as ASP.NET currently does for request localization. APIs like Regex that want to participate in this read the current culture. It's not a "hack" to use this mechanism with regex; whether we like it or not, that's how the mechanism was designed and regex uses that mechanism. It's certainly not ideal that using RegexOptions.InvariantCulture roots everything related to options, but using current culture to work around that is no more a hack than adding a new ctor accepting culture to work around that.

We can debate whether 20 years ago the right calls were made as to how CultureInfo works, how culture is passed into tons of APIs like Regex, and even whether Regex should use culture at all. At this point, though, such a mechanism exists to solve the core problem cited in this issue. Arguing for a different design for this ASP.NET issue is no longer in defense of trimming/size.

davidfowl commented 1 year ago

Setting the culture in the middleware whose job is to set the culture is different to setting to cause a side effect in the regex constructor because we don't have a better way of doing it.

Ambient data shouldn't be abused this way and it is absolutely a hack but we'll hack it for now.

stephentoub commented 1 year ago

I'm not going to continue the debate. You have a solution.

davidfowl commented 1 year ago

We’ll open a new issue with an API to accomplish this after we apply the workaround

eerhardt commented 1 year ago

Instead of:

var r = new Regex("...", RegexOptions.IgnoreCase | RegexOptions.InvariantCulture);

could ASP.NET just do:

Regex r;
CultureInfo old = CultureInfo.CurrentCulture;
CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
try
{
    r = new Regex("(?i)(?:...)");
}
finally
{
    CultureInfo.CurrentCulture = old;
}

?

We can't because of 2 reasons:

  1. We also need to specify a match timeout. There is no API that takes a Timeout, but doesn't take RegexOptions.

https://github.com/dotnet/aspnetcore/blob/e5238763bddd7100823751c4a4ae0220d78160aa/src/Http/Routing/src/Constraints/RegexRouteConstraint.cs#L40-L43

  1. We also should be using Compiled, but aren't today. I opened https://github.com/dotnet/aspnetcore/issues/46154 for this. (Although this could easily be worked around by checking RuntimeFeature.IsDynamicCodeSupported and not passing Compiled when it isn't supported.)
eerhardt commented 1 year ago

I do not think removing the feature is an acceptable solution here. It's been there since the times of ASP.NET and is a well-known and used feature, removing it by default for AoT does not seem like the right trade-off.

@javiercn - I've updated Option (3) above to better explain the current proposal. We would only remove the feature by default when the app uses the new "slim" hosting API. And they can explicitly opt back into it, if they need it.

stephentoub commented 1 year ago

We also need to specify a match timeout. There is no API that takes a Timeout, but doesn't take RegexOptions.

That's legitimate, though also suggests again that NonBacktracking is desirable... the only reason timeout exists is to protect against runaway backtracking.

mitchdenny commented 1 year ago

OK so just caught up on this thread that got plenty of traction overnight ;) It seems like there are two points of view and we need to settle on a path forward:

  1. NonBacktracking is desirable and the ~1MB cost is an acceptable trade off
  2. NonBacktracking isn't needed in our scenario and we would rather save the ~1MB

I think if we can settle where we stand on that then we can talk about the practical solutions. If we decide on position 2 above we will need at least a modest API change so we can set a timeout. There are a few different approaches we could take there - but lets decide on a high level between 1 and 2 first.

stephentoub commented 1 year ago

NonBacktracking is desirable and the ~1MB cost is an acceptable trade off

To be clear, I'm not suggesting ASP.NET must do this; seems like it's been operating with the current design for 9 years successfully. I'm just pointing out that we're eschewing functionality in the name of size that seems on the surface like it's appropriate to this particular use.

It also looks like it'll actually be less than 1MB... currently closer to 800KB. And I'm currently looking to see if there's any low hanging fruit to lower that further (beyond, you know, option 4 :smile:).

eerhardt commented 1 year ago

My thinking is that we take Option (3) from the top post:

When using the new "slim" hosting API (See https://github.com/dotnet/aspnetcore/issues/32485), we could remove the Regex route constraints feature by default and have an API that adds it back. Apps using the inline regex route constraints and the slim hosting API, would call this new API to opt-in to the feature.

That would take all of the Regex code out by default. And apps can opt back in when they need it.

mitchdenny commented 1 year ago

Is that the only place we use Regex (surprised!). Maybe only place after we've taken out all the other stuff for the slim hosting API.

davidfowl commented 1 year ago

That would take all of the Regex code out by default. And apps can opt back in when they need it.

What does opt-in in look like and how does it fail before you opt-in?

mitchdenny commented 1 year ago

I have put up an API design proposal which adds some constructors to Regex here: https://github.com/dotnet/runtime/issues/80835

Whilst that conversation gets started properly over there I wanted to come back to this:

That would take all of the Regex code out by default. And apps can opt back in when they need it.

What does opt-in in look like and how does it fail before you opt-in?

@davidfowl proposed something like the following:

builder.Sevices.AddRouting().AddRegexConstraints()

@halter73 has this reaction to that proposal: 🤮

Personally, I'm not quite sure how that would work. Currently the code we use to translate the route regex pattern to a constraint looks like this:

public RegexRouteConstraint(
    [StringSyntax(StringSyntaxAttribute.Regex, RegexOptions.CultureInvariant | RegexOptions.IgnoreCase)]
    string regexPattern)
{
    ArgumentNullException.ThrowIfNull(regexPattern);

    Constraint = new Regex(
        regexPattern,
        RegexOptions.CultureInvariant | RegexOptions.IgnoreCase,
        RegexMatchTimeout);
}

With my proposal to dotnet/runtime it would become this:

public RegexRouteConstraint(
    [StringSyntax(StringSyntaxAttribute.Regex, RegexOptions.CultureInvariant | RegexOptions.IgnoreCase)]
    string regexPattern)
{
    ArgumentNullException.ThrowIfNull(regexPattern);

    Constraint = new Regex(
        regexPattern,
        RegexMatchTimeout,
        CultureInfo.InvariantCulture
        );
}

If I am reading the way constraints are implemented in general, it all looks like it is Regex based anyway. We would effectively be looking to switch over to writing a parser for route parameters by hand?

davidfowl commented 1 year ago

We have 2 routing systems entangled in the code base but we only use one by default. When you register routing, it pulls in the old and new system together. It might also be time to split up the methods that register the old IRouter based routing and the endpoint-based routing systems.

Once we do that, then the only place route constraints are rooted is here https://github.com/dotnet/aspnetcore/blob/e5238763bddd7100823751c4a4ae0220d78160aa/src/Http/Routing/src/RouteOptions.cs#L92-L127.

If we remove the regex inline constraint, then it won't be on by default and the user will get a runtime error if they use a regex constraint:

using Microsoft.AspNetCore.Routing.Constraints;

var builder = WebApplication.CreateBuilder();
builder.Services.AddRouting(o =>
{
    o.ConstraintMap.Remove("regex");
});

var app = builder.Build();

app.MapGet("/{name:regex(\\w+)}", (string name) => $"Hello {name}");

app.Run();

Here's the error:

System.InvalidOperationException: The constraint reference 'regex' could not be resolved to a type. Register the constraint type with 'Microsoft.AspNetCore.Routing.RouteOptions.ConstraintMap'.
   at Microsoft.AspNetCore.Routing.DefaultParameterPolicyFactory.Create(RoutePatternParameterPart parameter, String inlineText)
   at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.CreateCandidate(Endpoint endpoint, Int32 score)
   at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.CreateCandidates(IReadOnlyList`1 endpoints)
   at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.AddNode(DfaNode node, DfaState[] states, Int32 exitDestination)
   at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.AddNode(DfaNode node, DfaState[] states, Int32 exitDestination)
   at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.AddNode(DfaNode node, DfaState[] states, Int32 exitDestination)
   at Microsoft.AspNetCore.Routing.Matching.DfaMatcherBuilder.Build()
   at Microsoft.AspNetCore.Routing.Matching.DataSourceDependentMatcher.CreateMatcher(IReadOnlyList`1 endpoints)
   at Microsoft.AspNetCore.Routing.DataSourceDependentCache`1.Initialize()
   at System.Threading.LazyInitializer.EnsureInitializedCore[T](T& target, Boolean& initialized, Object& syncLock, Func`1 valueFactory)
   at Microsoft.AspNetCore.Routing.Matching.DataSourceDependentMatcher..ctor(EndpointDataSource dataSource, Lifetime lifetime, Func`1 matcherBuilderFactory)
   at Microsoft.AspNetCore.Routing.Matching.DfaMatcherFactory.CreateMatcher(EndpointDataSource dataSource)
   at Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware.InitializeCoreAsync()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware.<Invoke>g__AwaitMatcher|8_0(EndpointRoutingMiddleware middleware, HttpContext httpContext, Task`1 matcherTask)
   at Microsoft.AspNetCore.Diagnostics.DeveloperExceptionPageMiddlewareImpl.Invoke(HttpContext context)

So this could look like turning off regex by default with a new AddRouting API and then exposing the generic AddConstraint\<T> method to allow users to add constraints back in a trim safe way. Then we could add AddRegexConstraint() to RouteOptions as a shortcut for AddConstraint<RegexInlineRouteConstraint>(defaults, "regex");.

Opting in would look like this:

var builder = WebApplication.CreateBuilder();
builder.Services.AddRouting(o => o.AddRegexConstaint());

var app = builder.Build();

app.MapGet("/{name:regex(\\w+)}", (string name) => $"Hello {name}");

app.Run();

AddRegexConstaint impl:


using System.Diagnostics.CodeAnalysis;
using Microsoft.AspNetCore.Routing.Constraints;

public static class RegexRouteOptionsExtensions
{
    [DynamicDependency(DynamicallyAccessedMemberTypes.PublicConstructors, typeof(RegexInlineRouteConstraint))]
    public static RouteOptions AddRegexConstaint(this RouteOptions routeOptions)
    {
        routeOptions.ConstraintMap["regex"] = typeof(RegexInlineRouteConstraint);
        return routeOptions;
    }
}
mitchdenny commented 1 year ago

OK. I think I'm following.

The thing I still don't quite understand how we are going to achieve is changing GetDefaultContraintMap() so that it excludes the Regex constraint type in the first place (this is probably my lack of familiarity with the trimming friendly coding patterns at play).

Although I think @eerhardt mentioned in a recent sync that he merged an API that we can use to check whether AOT at play to opt code out?

This does mean that the F5 experience is going to be different to the executive on the AOT published binary though.

davidfowl commented 1 year ago

The thing I still don't quite understand how we are going to achieve is changing GetDefaultContraintMap() so that it excludes the Regex constraint type in the first place (this is probably my lack of familiarity with the trimming friendly coding patterns at play).

That's a good question.

Although I think @eerhardt mentioned in a recent sync that he merged an API that we can use to check whether AOT at play to opt code out?

Yes, it's the one we use everywhere RuntimeFeature.IsDynamicCodeSupported (in this case). Though tbh, the plan to exclude regex is above size not capability so it feels a little bad to use that flag to decide that this feature should be excluded.

eerhardt commented 1 year ago

Yes, it's the one we use everywhere RuntimeFeature.IsDynamicCodeSupported (in this case)

That's not my proposal. My proposal is that when the app uses WebApplication.CreateSlimBuilder(), the Regex route constraints aren't enabled by default. Users would need to explicitly add them in afterwards.

When the app uses the existing WebApplication.CreateBuilder(), the Regex route constraints would still be enabled, as today.

davidfowl commented 1 year ago

@eerhardt yea but how? The default RouteOptions includes Regex. We're going to have to remove it or add a new code path for CreateSlimBuilder to call.

eerhardt commented 1 year ago

I haven't figured out exactly how yet. But I think there are possibilities to explore.

One of the biggest wrinkles is that IOptions only uses the default, parameterless constructor of a class.

One possibility I thought was to split the "default constraints" out of the parameterless constructor of RouteOptions and create methods that can be called during configure like AddDefaultConstraints() and AddSlimConstraints() (or better names, but you get the idea).

davidfowl commented 1 year ago

@eerhardt Right, that's exactly what I said above 😄

mitchdenny commented 1 year ago

Seems like we've been coalescing around the same solution. I was looking at something like this today. I should be able to prototype this today and see how it works.

mitchdenny commented 1 year ago

OK I've thrown up a draft PR to show what this could look like. Tests are failing right now so there will need to be more changes, but it shows the basic idea where the selection of constraints is deferred to the WepApplicationBuilder.

JamesNK commented 1 year ago

Idea: AlphaRouteConstraint inherits from RegexRouteConstraint. We can't change that. But I don't see a reason why we couldn't make an internal SlimAlphaRouteConstraint that doesn't use regular expressions, behaves the same, and is registered with the alpha route constraint prefix.

No one cares what constraint type is used to evaluate a parameter with the alpha constraint, just that it behaves the same.

Then we can improve the name of AddSlimConstraints() to just AddRegexConstraint().

davidfowl commented 1 year ago

@JamesNK I discussed this with @mitchdenny, I'll update this with what we spoke about:

New API to add regex constraints: RouteOptions.AddRegexConstraints() 

CreateSlimBuilder () => AddRoutingCore() (new public API) CreateBuilder() => AddRouting() -> AddRoutingCore()                                  -> RouteOptions.AddRegexConstraints()

The above is the resulting method call chain after these changes.

JamesNK commented 1 year ago

What do you think of the idea of creating a non-regex-based alpha constraint? IMO it's not intuitive to people that an alpha constraint is a regular expression constraint. Also, it means anyone using an alpha constraint never needs to worry about configuring additional routes. Their app just works with the slim builder.

We can go from RouteOptions.AddRegexConstraints() to RouteOptions.AddRegexConstraint().

eerhardt commented 1 year ago

What do you think of the idea of creating a non-regex-based alpha constraint?

I'd need to see numbers proving that it adds value. After https://github.com/dotnet/aspnetcore/pull/44770, the alpha constraint is using the Regex source generator, meaning:

  1. The regex is compiled IL code. There is no interpreting happening in either CoreCLR or NativeAOT.
  2. The regex is pre-compiled. There are no "startup" issues of generating IL at runtime.
  3. The full regex engine can be trimmed away if no one else is using it. Which means the size savings of not using the source generated Regex is very minimal.

To show the size savings, I published 3 apps:

Console.WriteLine("Hello");
Console.WriteLine(Regexes.GetAlphaRouteRegex().IsMatch("Hello"));
partial class Regexes
{
    [GeneratedRegex(@"^[A-Za-z]*$")]
    public static partial Regex GetAlphaRouteRegex();
}
Console.WriteLine(new Regex(@"^[A-Za-z]*$").IsMatch("Hello"));

On Windows x64, the published app sizes were:

Looking at the mstat differences between the first two:

image

And the differences between the last two:

image

In summary, I don't think we will gain much by changing the alpha constraint. We can leave it in by default the way it is today. And just have RouteOptions.AddRegexConstraint() (not plural), or even RouteOptions.AddInlineRegexConstraint().

mitchdenny commented 1 year ago

I'm seeing a bit of a pattern forming around the kinds of refactoring we we'll need to do for scenarios where we aren't doing wholesale replacement with source generators.

In the case of the regex constraints I've seen two scenarios:

WebHost.ConfigureWebDefaults(...) -> WebHost.ConfigureWebHostDefaultsCore(...)

and:

AddRouting(...) -> AddRoutingCore(...)

I wonder if this will be common enough to formalize some kind of method chaining for folks who are using native AOT, but what to opt into features which we've purposely excluded because of size considerations.

For example:

AddRoutingCore(...)
  .WithRegexConstraints()
  .WithOtherThingWeCutOutButCanBeAddedBackIn()

If there aren't going to be many then it would not be worth it, but if ended up being common then having a consistent approach could help folks needing to figure out how to add in the missing X feature.

Actually, it probably makes more sense to do it like this:

var builder = WebApplication.CreateSlimBuilder(...).WithRegexConstraints(...)
davidfowl commented 1 year ago

I don't think we should hoist anything top level just yet. That would lead down a path where every low-level API to add features back ends up being a top-level API on the builder. We do not want that to be the case.

eerhardt commented 1 year ago

@mitchdenny - unfortunately https://github.com/dotnet/aspnetcore/pull/46227 didn't fully address this size issue. RegexRouteConstraint still gets pulled into this app with this path:

image

See this code:

https://github.com/dotnet/aspnetcore/blob/bb3f0d626627d399695a9fa0741e4803c0c23eb8/src/Http/Routing/src/Patterns/RoutePatternFactory.cs#L938-L955

JamesNK commented 1 year ago

Damn. Is this definitely the last reference?

RoutePatternFactory.Parse is a static API with no integration with DI or configuration. I think achieving the trimming goal requires a breaking change.

The RegexRouteConstraint created here could be replaced by a new RegexPlaceholderRouteConstraint. The regex placeholder constraint just captures the pattern as a string. Then, when routing builds the route table, the placeholder route constraint is replaced by looking up regex from the route constraint table and passing the placeholder's string as an argument. If a regex constraint is registered, then it is used. If the erroring constraint is resolved, then the standard error is thrown.

The breaking change is that calling RegexPlaceholderRouteConstraint.Match throws an error. It can't reference Regex so it can't properly evaluate the route. We can provide a good error message that explains how to work around the change.

The vast majority of people don't use a RoutePattern directly. They create it and register it with the app, and allow the app to figure out the right way to use it. Maybe do an API search on GitHub for people getting parameter policies from a RoutePattern?

JamesNK commented 1 year ago

Idea to make the change less breaking!

IRouteConstraint.Match has an optional HttpContext argument. If RegexPlaceholderRouteConstraint.Match is called, we could resolve route options from the context's request services and then look up the route constraint table with regex. The placeholder constraint can then cache the real regex constraint and use it in Match.

This isn't a 100% fix. HttpContext is optional. Matching an incoming HTTP request will always have a HttpContext, but it's possible to do link generation outside of an HTTP request, and there are overloads on the link generator where the context is optional.

davidfowl commented 1 year ago

Or maybe we make the breaking change in AOT mode only?