Performance Ideas for Attribute Routing

rynowak commented 7 years ago

We should improve the performance of attribute routing. We did some work in 1.0.0 to implement a scalable algorithm, but we didn't really do any optimization work on the code.

For perspective, the routing part of MVC (routing + action selection) can be up to 30-40% of time spent in various techempower benchmarks. This is an area where we can probably improve much more as we haven't done much beside implement a nifty algorithm.

The algorithm we use today is three stage.

1 First, we use a trie to match the literal text portions of the template with the URL path's structure. Each node matches a single segment of the URL path. We only process literal text, so api/products/{id}, we'd match api and products, and for the {id} parameter node, we'd verify that there's some non-empty text there. We walk the trie and collect all valid matches in order.

Note that we allocate in this stage to produce a list of matches. We do this up front to avoid mixing the compute heavy part of the code with async calls (see step 3). Note that this requires us to walk the entire tree, or at least down all the valid paths.

2 Next, we process the matches in order, this does a 'full' match of the path and will do things like populate route values. We can fail at this point due to a constraint, or a complex segment (which we treat like a parameter in step 1).

3 Lastly we do an async call to get a route handler. Since we need to call the next IRouter this is async, though it's generally not. We do this step in a loop with step 2, and short circuit on the first success.

Here's one take: https://gist.github.com/rynowak/a41dc558230e610d71584e8673a8849c

Benchmark.net says this is about 2x faster for the cases I tested. I haven't done any real profiling work on this code yet, I just changed the data structures. I also didn't implement support for catch-all, so caveat emptor and such.

There are two high-level difference between the current implementation and my prototype.

First, mine is dense. Rather than a literal tree of nodes to represent the trie, I'm using an array-based representation. This is intentionally compact, cache and GC friendly. It could potentially be made more compact as well. My main goal here is to avoid pointer chasing. The current node and its siblings are always laid out consecutively in the entry array because we build it using a BFS.

Secondly, my implementation doesn't handle IRouter, it only handles IRouteHandler. IRouteHandler is a more minimal contract that fits how attribute routing is used in practice, but it's not async. MVC's handler only requires IRouteHandler. The reason for this is that if you're not trying to avoid making a lot of async calls, a lot of the complexity just melts away.

Thirdly (yeah it's a bonus) my implementation is doing a linear search on the strings. The current shipping implementation is doing a dictionary lookup and I think that it's non-optimal given the size of N and the relative slowness of OrdinalIgnoreCase hashing. Plus we have to substring 👎. My prototype doesn't include it, but the design anticipates using a jump table based on the first letter or length of the string. Given that a jump table is possible, the jump table could be non-existent (linear search), a simple index based on the first letter into a sorted list (skiplist-like) or a dictionary that tells you exactly what index to jump to, and we could use all of these in one implementation depending on the branchiness of the node.

rynowak commented 7 years ago

/cc @davidfowl @benaadams for weird science

rynowak commented 7 years ago

This test is now checked in here: https://github.com/aspnet/Routing/tree/dev/test/Microsoft.AspNetCore.Routing.Performance (without the new experimental implementation)

SteveSandersonMS commented 7 years ago

Just discussed this with @rynowak. The big question is how we want to deal with async support:

As Ryan notes above, the existing TreeRouter works with IRouter routes, where the process of matching them against an incoming context is (in general) asynchronous. Having to make an async call to each candidate route in series isn't good for perf.
Fortunately, MVC's attribute routing always creates routes that implement IRouteHandler, which means they can be matched against contexts synchronously. So we could make the router do all its matching purely synchronously - it would work.

But then there's back-compatibility. TreeRouter is a public API - we can't really drop its async support. And if it retains any async support, it has to thread async-ness throughout its operation, which significantly limits the perf gains - even if in practice everything gets resolved synchronously.

Therefore if we want a new, more lightweight alternative, it probably has to be a new separate API, with the old TreeRouter left as a vestige for back-compat. Unfortunately that means not only duplicating the code, but also ~150 unit tests (BTW the existing ones rely on async support).

Plan

We think it's probably too late to make such a dramatic change to routing before 2.0.0 goes out. We have had regressions in routing in the past caused by optimisations, and don't want that again. Also, it's likely that in 2.1.0, routing will somehow be moved into the middleware layer, where its API surface would be changing anyway. So that might be a better time for this.

Partial implementation

I just took @rynowak's quick experimental implementation and built a more readable version that's closer to what we might ship: https://github.com/aspnet/Routing/commit/203c65d005fa535c75689fd0f9510fe98d821b9c This still lacks a lot of functionality (like the experimental implementation does, such as handling complex route segments or catch-alls).

Stats

Using Ryan's benchmark,

The existing shipping TreeRouter takes approx 1000ns per iteration
The changes in the partial implementation (https://github.com/aspnet/Routing/commit/203c65d005fa535c75689fd0f9510fe98d821b9c) reduce this to approx 450ns per iteration (i.e., performs as well as Ryan's experimental implementation, because logically it's just about the same)
If I add back async support by threading Tasks through everywhere, it goes up to about 670ns per iteration. It also adds quite a bit to GC pressure because of all the Task instances, which might be optimisable by inlining some of the matching methods into each other so fewer Task instances get created.

rynowak commented 7 years ago

Thanks, moving this to 2.1.0

aspnet-hello commented 6 years ago

This issue was moved to aspnet/Home#2668

aspnet / Routing