Allow for more granular route matching in Spin

karthik2804 commented 1 year ago

Currently, the route matching in spin can be of two forms

# match a single route
route = "/{path}"

# match any route with the prefix
route = "/{path-prefix}/..."

It would be helpful in certain cases if the route matching allowed more granularity. For an example consider the case of a Bartholomew app with a proxy component.

[[component]]
id = "bartholomew"
[component.trigger]
route = "/..."

[[component]]
id = "proxy"
[component.trigger]
route = "/spin/*"

With such a configuration, the proxy component would match the route /spin/index but not /spin/v1/index. Thus allowing the proxy to handle one specific subset of the routes while bartholomew handles the rest.

There is potential for another level of granularity where we could define something like

route = "/spin/*/test"

Which would only match /spin/index/test but not /spin/index/something-else

Thoughts?

lann commented 1 year ago

I'm not necessarily opposed to this but just want to note that V2 manifests help with a workaround in some (not all!) situations:

[[trigger.http]]
route = "/..."
component = "bartholomew"

[[trigger.http]]
route = "/spin/..."
component = "proxy"

[[trigger.http]]
route = "/spin/v1/..."
component = "bartholomew"

karthik2804 commented 1 year ago

@lann that works in that specific case but if someone adds a /spin/v2 as well, then we would need to add another definition in the manifest. So I would imagine it would not scale very well.

karthik2804 commented 1 year ago

@lann one other practical issue is that it increases the memory usage since the extra component mounts end up creating their own copy of the static files

lann commented 1 year ago

With v2 manifests that shouldn't happen if you point two triggers at one component. If mounts are duplicated in that case it's a bug.

karthik2804 commented 1 year ago

Oh ok - I was testing with just v1 and spin v1.5.1. I was creating multiple components, so that makes sense.

vdice commented 1 year ago

@karthik2804 did Spin v2 and the v2 manifest help here? I'm assuming we'd still like this issue to track nested wildcard route support (eg route = "/spin/*/test" mentioned above)?

itowlson commented 11 months ago

Do we have a good sense of what we want from this feature? It seems like the ask is specifically for a kind of "single level" wildcard, i.e. like /... except only up to the next / (if there is one). Is that the full extent of it?

Some challenges I see here are:

Do we need to surface the "matched" segment(s) in the same way we surface /... tails via spin-path-info?
How does the "longest match" rule translate e.g. if I have /a/b/*/d and /a/*/c/d then which one should handle /a/b/c/d?

Any other design considerations?

karthik2804 commented 11 months ago

The use case I have had requires only a single level of wildcard but based on @lann's workaround above, it could be done by adding multiple triggers.

I would expect the spin-path-info to be exposed as it is today. I am not sure I follow the question about "matched" segments.
I do not have a solid answer for the longest match rule translation.

itowlson commented 11 months ago

@karthik2804 Sorry I maybe confused the question by giving the spin-path-info analogy - it was intended only as analogy. Let me rephrase:

Suppose I have a trigger /a/*/b. Suppose a request matches that trigger, and my component runs. My component wants to know what the * was. (For example, if the user visited /a/gorilla/b, I want the gorilla part.) How does it get it?

(Potentially nastier is the multiple case of /user/*/note/*. Now from /user/itowlson/note/4067 my component wants to know that the wildcards were itowlson and 4067.)

itowlson commented 11 months ago

@karthik2804 Is your comment about the multiple trigger workaround suggesting that this is no longer important (at least to you) and could be dropped from the 2.1 list? I am not sure if we have any other users asking after this yet.

karthik2804 commented 11 months ago

@itowlson, it was in reference to the following

It seems like the ask is specifically for a kind of "single level" wildcard, i.e. like /... except only up to the next / (if there is one). Is that the full extent of it?

The workaround above solves the issue I ran into (I have yet to test).

On the sharing multiple matched segments, ~~could add a new header spin-matched-segments: "itowlson,4067" where the segments are separated by commas.~~ URL paths can contains commas.

itowlson commented 11 months ago

I chatted with Karthik offline and some alternative ideas for capturing * values could be:

A CSV / multi-instance header, with commas in values encoded if present (which should be rare).
Allow/require names for wildcards e.g. /user/*username/note/*noteid giving rise to separate spin-something-username and spin-something-noteid headers.

diversable commented 9 months ago

Allow/require names for wildcards e.g. /user/*username/note/*noteid giving rise to separate spin-something-username and spin-something-noteid headers.

FYI: From a customer perspective, this would be a really useful feature to have - personally, I would vote for named wildcards, but I'd be happy with any workable solution for enabling more powerful wildcard paths 😁

seungjin commented 7 months ago

Regular Expressions are The Swiss Army Knife of Text Processing.

Flexible routing structure can give me more flexible / micro-sized modularization which I can split my app into more smaller wasms. Isn't that the recommended direction for wasm architecturing?

lann commented 7 months ago

Given that https://github.com/fermyon/spin/pull/2305 is nearing completion, one thing I'd like to explore is using that service chaining feature to enable advanced routing scenarios via a routing component, especially for something like regular expressions which may not be appropriate for all users.

itowlson commented 6 months ago

I started exploring this and I am getting very hung up on potential ambiguities. For example, if the app has routes /a/*/c and /a/b/..., which one does /a/b/c match? I started going with a "longest initial exact segment takes precedence" rule, which runs into tiebreaking problems, but those are fixable. But then I started thinking "exact trailing segment should trump wildcard trailing segment".

And then I started thinking "what if we just reject applications with this kind of ambiguous overlap", requiring any pair of routes to be either disjoint or in a subset relationship. Because if you have /a/*/c and /a/b/... then you probably have problems that you need to address before the ambiguous request comes in. And this seems like a reaonable rule if it's easy to implement, but I bet it will turn out to be equivalent to the halting problem because everything always does. And it leaves people who really want to express a / anything-but-b / c stuffed unless we extend the pattern language further (hello regular expressions, hello unbounded evaluation times).

Another possibility, which lines up with some programmatic routers, is to make it order-dependent: first (or last) match wins. Pre-1.0 Spin had that rule, and we replaced it with longest match because it provides more reliability as manifests change (e.g. triggers getting tacked on the end via spin add). We could bring that back for the specific case of "best match is not well defined" but I'm really reluctant to do that because who knows what well-defined means in this case.

Finally, the original issue asked for single-segment match only in a trailing position, e.g. a * suffix to go alongside the ... one. That avoids any ambiguity, but falls short of what others in the thread have been talking about.

I feel like I'm overthinking this because lots of routers just do this stuff, so, save me from myself folks: how should this behave?

fibonacci1729 commented 6 months ago

Could we lean into routefinder here? If we introduce named parameters (e.g. :planet) it seems like we could establish a mapping to the syntax supported by routefinder (e.g. /... => /*). This would appear to cover the usecases above.

Also, I'm not a big fan of supporting ... and *; historically I've seen them both used to mean "wildcard" and dread the confusion to inevitably plague our route dx.

The question then is how do we surface named captures to a guest? Immediately I'm not opposed to propagating captures as spin-route-param headers (or something of the sort) as mentioned earlier.

The major benefit of creating an isomorphism to routerfinder (or similar syntax) is that we can take advantage of the ranking and ordering of routes to handle all the ambiguities mentioned.

lann commented 6 months ago

I like routefinder's precedence rules and am similarly wary of using * since its meaning varies so much (any single segment vs any number of segments vs any number of trailing segments).

I'd be fine with e.g. /:param/... (disallowing * as an input) as the next step here. Given that we already have a bunch of magic spin-* headers I suppose something like spin-param-<param> is probably fine.

itowlson commented 6 months ago

Yeah sorry * was not a syntax proposal it was a way of expressing the ambiguity. The ambiguity, not the syntax, is the problem.

I will look at routefinder precedence: thanks.

fermyon / spin

Allow for more granular route matching in Spin #1923