[Routing] Common subtree node reuse

javiercn commented 3 years ago

In our DFA matcher we produce a DFA in the shape of a tree.

The general premise is that we walk the tree in a breadth first approach, in precedence order (literals, then variables) for matches without backtracking.

For that reason, whenever we encounter a route with a variable, we add all the literals on the same level as parents for any potential subnode from the variable. That way, if we have a set of routes a/b and {var}/c when we receive a request to a/c we match against a and then we match the additional node c we added to make sure we still cover {var}/c without backtracking.

The strategy that we currently follow involves creating a set of additional branches for every literal sibling to the parameter node we are dealing with on the tree. However, in many cases those branches will be "unique" enough that we can reuse them if we can discover them efficiently.

And we can potentially do so while we are constructing the DFA. The reason is that the process by which we "expand" the variable nodes into the tree, involves adding all the sibling literals as parent nodes for the given route. We can record that event and build a table of (endpoint, depth+1) -> (shared node, [parents]).

Then when it comes time to evaluate the next level and we are looking at a literal node in the tree, we can check our table to see if there is an entry that matches that endpoint and depth). There are two cases here:

If we detect that we are going to add a new node to the tree:
- If the entry on the table does not contain a candidate node, we create a node and make it the shared node for that endpoint branch.
- If the entry on the table already contains a candidate node, we add ourselves to the list of "shared parents" and continue.
If we detect that we are going to add to an existing node, we check for a rule in our table that matches the current depth.
- If the node is the same candidate for the given endpoint, then we reuse it.
- If the node is not the same candidate for the given endpoint, then we do as follows:
- "Clone" the node and replace the existing node with the new node.
- Take the parent of the list of shared nodes
- Add a rule for endpoint, depth + 1 since it might still be possible to reuse parts of the subtree at a lower level.

It's also important to note, that after we've processed the next level, when we identify nodes for which only one endpoint is responsible for the branch (we might need to do extra tracking for that) we can avoid processing further levels on the shared parent nodes, since at that point we know the rest of the subtree will be common to them.

The pseudo algorithm above is not complete and needs some tweaks to account for catch-all segments as well as route sets with parameters in multiple routes, but its a general starting point for deduplicating unnecessary nodes.

We can follow this pattern for trees with a large set of routes and parameter prefixes for which the parameters don't have a constraint we can use for trimming the set of nodes.

We can also do this for branches for which the literal actually matches the constraint, since at that point they behave like routes with a parameter.

Finally, we don't need to keep the book of all route parameters for more than a couple of levels. Once we have processed a level of the tree, we know already the list of nodes that will be unique at that level and we can discard the previous level registry.

Even in the cases where there are many routes with parameter prefixes the natural tendency would be for those routes to "disambiguate" on one of the following levels due to the tendency of routes to be unambiguous.

After we've built this, we would need to update the algorithm we use to build the transitions into the graph to account for the reused nodes.

As for a comment on performance, while the additional tracking involves a bit more work and memory, both parameters are "within" the same order of magnitude of the list of endpoints and CPU time and memory will both benefit from processing a much smaller set of nodes.

ghost commented 3 years ago

Thanks for contacting us.

We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s). If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

ghost commented 3 years ago

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process.

dotnet / aspnetcore

[Routing] Common subtree node reuse #33749