elixir-plug / plug

Compose web applications with functions
https://hex.pm/packages/plug
Other
2.88k stars 586 forks source link

handle identifiers with suffix #1007

Closed boonious closed 3 years ago

boonious commented 3 years ago

This PR provides a POC for path identifiers that include suffixes such as format or file extension (.xml, .json).

josevalim commented 3 years ago

I don't think this is precisely what you want. This makes it so the name of the variable we are injecting is called "id.format". It is not matching on format. :)

boonious commented 3 years ago

Yes, realised that for this work, the variable will need to be called id.format which is not possible in Elixir.

However apart from that, everything else works, as shown in the tests in this PR. Whilst the id is not feasible via baz.format (variable), it's available through conn's path_params, i.e. resp(conn, 200, inspect(conn.path_params["baz.format"])) (in tests).

We are serving routes in multi-tenant environment that needs identifiers to be matched according to different format extensions as these can be owned by different tenants. Any pointers and advise how we can implement this would be greatly received. At the moment, we are having to rewrite paths in order to fulfil the requirement.

josevalim commented 3 years ago

Given the name of the variable ends up being "foo.format", I don't think this belong in Plug per se. Are you sure that a /:id.format is also not matching on requests such as /123_json? Based on this current PR, I don't see how /:id.format would match only on /123.json and not /123_json.

james-bowers commented 3 years ago

I think part of the solution we need, involves updating the matcher used in do_match/4, by additionally splitting based on the ., but to also keep the . within the matcher.

Currently when splitting the matcher, we get:

iex(3)> Plug.Router.Utils.split("/foo/:id.json")
["foo", ":id.json"]

But I think for matching based off the extension, potentially we'd need something more like this:

iex(3)> Plug.Router.Utils.split("/foo/:id.json")
["foo", ":id", ".json"]

Although when we looked into this in the past, it didn't end up quite so simple, and trying to remember why! 🤔

This could potentially be a breaking change for some users of this library

boonious commented 3 years ago

I have updated the PR to demonstrate the idea for path splitting based on . (dot). This should provide a simple solution that can facilitate both /foo/:id.json and /foo/:id.:format type identifiers (shown in the tests) that would be great for us.

However this goes against Plug.Static that serves paths containing dots as well, e.g. /public/fixtures/static.txt : (

josevalim commented 3 years ago

Exactly. This is definitely going to be a breaking a change for anything that is marching on the dot and we can’t accept it. I think the solution is still to handle this in your apps, if you really need to.

boonious commented 3 years ago

Thanks for getting back so promptly, shall close this PR soon. It's ok to handle format extension in app, however it's tricky to scale the approach for numerous multi-tenant routing in the simplest form of DSL that's user-friendly to tenants (e.g. below). Any pointer and advice on how to go about doing this would be greatly and curiously received.

match "/sport/:discipline.app", to: # -> service A
match "/sport/:discipline", to:     # -> service B
boonious commented 3 years ago

@james-bowers has a wonderful idea to conditionally split paths on . only for identifiers, i.e. :id.json not id.json. I have implemented this quickly: all tests now passed. This PR no longer breaks Plug.Static. This is just so that we can have proper closure to this POC discussion with a PR that passes all tests ✅.

Thank you.

boonious commented 3 years ago

👋 @josevalim et al. should I close this PR? Any further thoughts on this?

The PR currently splits path on / and . only for identifiers. The split is applicable to all paths but there should be a scope to use match option to selectively "split on dot" at DSL-level so that the performance is maintained for types of non-dot routes. This also provides :id.:format type identifier matching.

josevalim commented 3 years ago

I have dropped a comment. Have you tested this on actual project serving from Cowboy? I have a feeling this won't work and there is no way we can instruct Cowboy to split on dots that is actually backwards compatible.

I do have one idea though, which is to compile something like ":id.json" to a guard like this:

id when binary_part(id, byte_size(id) - 5, byte_size(id)) == ".json"

But I am not even sure if the current routing structures support guards.

boonious commented 3 years ago

I do have one idea though, which is to compile something like ":id.json" to a guard like this:

id when binary_part(id, byte_size(id) - 5, byte_size(id)) == ".json"

But I am not even sure if the current routing structures support guards.

Great! Could potentially explore this idea in various ways:

  1. DSL for path containing :id.json compiled into get/3 or match/3 plug router function with the above guard.
  2. In plug, internally transform path containing :id.json (to :id) and generate the guard for ".json" during router functions compilation. For example, via extract_path_and_guards in the following line? Not sure what’d be the side effects this time though but happy to uncover them.

https://github.com/elixir-plug/plug/blob/64d94367baa29a0477e498ce6301408108890158/lib/plug/router.ex#L564

Shall have a bash at these later and hope to have a workable PR.

boonious commented 3 years ago

The idea works. We tested it in our DSL by compiling paths ending with format extension into the proposed guard, e.g.:

  # route "/sport/:discipline.app"
  defmacro route(path) do
    {last_segment, preceding_segments} = path |> String.split("/") |> List.pop_at(-1)
    [_last_segment, id, format] = Regex.run(~r/^:(.*)\.(.*)$/, last_segment)

    # "/sport/:discipline"
    new_path = (preceding_segments ++ [":" <> id]) |> Enum.join("/")
    id_var = Macro.var(String.to_atom(id), nil)

    quote do
      get
        unquote(new_path)
        when binary_part(
               unquote(id_var),
               byte_size(unquote(id_var)) - byte_size(<<?., unquote(format)>>),
               byte_size(<<?., unquote(format)>>)
             ) == <<?., unquote(format)>>
    end
  end

Going to check next if it's feasible and desirable for the solution to be part of plug.

boonious commented 3 years ago

@josevalim, I have implemented your idea and resubmitted the PR. Hope this makes sense.

josevalim commented 3 years ago

Hi @boonious, this looks like the way to go! Exciting! We need some small modifications only:

  1. In the case of /:foo.json or /:foo-bar, the value of :foo will be the rest of the string without .json or -bar
  2. Please avoid using regexes int he implementation if possible, except for the parsing of the identifier itself. In other words, for each path, you should break it in one, two, or three parts:

    [prefix, identifier, suffix]

Also please make sure we raise if we have two identifiers, such as /:foo-:bar.

Thanks!

josevalim commented 3 years ago

Hi @boonious! The new parse function is a great improvements but I still think we can refactor things a bit.

The idea is that we want to move most of the logic to Router.Utils. One suggestion I have is to create a build_path_head function, which is like build_path_match but it returns a triplet with {vars, match, guards}. This function can use the new parse_suffix_identifier internally. build_path_match will continue to exist but it should raise if the new patterns ":foo-bar" are used.

The benefit of moving this to the utils module is that later on adding this feature to Phoenix will be much easier (or any other framework). Thank you and have a great weekend!

boonious commented 3 years ago

Hi @boonious! The new parse function is a great improvements but I still think we can refactor things a bit.

The idea is that we want to move most of the logic to Router.Utils. One suggestion I have is to create a build_path_head function, which is like build_path_match but it returns a triplet with {vars, match, guards}. This function can use the new parse_suffix_identifier internally. build_path_match will continue to exist but it should raise if the new patterns ":foo-bar" are used.

The benefit of moving this to the utils module is that later on adding this feature to Phoenix will be much easier (or any other framework). Thank you and have a great weekend!

Hello @josevalim many thanks for the review. The utils functions were part of an iteration to address your previous comment re. 1) use of regex, 2) a generalisation concept of suffix and 3) removal of suffixes from conn params (id variables still capturing the whole string foo.json, otherwise, we'd be back to the issue with . splitting breaking things again?). I updated the PR now with the outcome of the iteration - see if this makes sense.

Shall consider the latest comments next. After this iteration and inflating the code in Router some what, I definitely agree we need to encapsulate the logic in __route__ and Router.Utils. Also at the moment, suffix removal is done by rewriting params in do_match, wondering if there's a smarter way to do so before the function.

boonious commented 3 years ago

Addressed the comments surrounding the utils, refactored and moved the suffix guards injection logic to Router.Utils in the latest commits. Next, going to figure out how to build the right matcher and provide identifier values (minus suffix) properly via AST.

boonious commented 3 years ago

👋 hello @josevalim, good evening from rainy Britain, I refactored all the logic into Router.Utils and distilled the changes in Plug.Router to just a single line via build_path_head as suggested.

For an identifier with suffix, e.g.:id.json, conn.params and conn.path_params now returning value from a value.json match. But, despite conn.params returning value, there is an issue (last remaining I think) relating to the identifier (id) var in the underpinning idea:

id when binary_part(id, byte_size(id) -  byte_size(".json"), byte_size(".json")) == ".json"

which requires id to capture the entire value.json while it should actually also return value. For example, a test in this PR currently works like this:

get "/9/:bar.json" when bar != "value.json" do
  resp(conn, 200, inspect(bar))
end

however, it should function in this way (guard without .json) according to Plug DSL:

get "/9/:bar.json" when bar != "value" do
  resp(conn, 200, inspect(bar))
end

get "/9/:bar.json" when bar in ["this_value", "that_value"] do
  resp(conn, 200, inspect(bar))
end

This brings a question of how a matcher can be defined given a priori suffix info. For example, /foo/:id.json is parsed into ["foo", {"", ":id", ".json"}]. Can this be used to build matcher in AST, say for /foo/value.json?

# currently
["foo", id] matches ["foo", "value.json"] # id = "value.json"

# needs, where ?? AST includes var `id` and ".json"
["foo", ??] matches ["foo", "value.json"] # id = "value"

This seems impossible (can't id <> ".json" = "value.json") unless I'm missing some matching trick : (

As I writes this, rebinding id in do_match springs to mind 🤔, going to try that.

josevalim commented 3 years ago

Yup, rebinding first thing in do_match is the way to go!

boonious commented 3 years ago

Hello @josevalim here it is! A working version at last with latest commits addressing what I hope to be the remaining thorny issues: 1) rebinding variables in do_match 2) injecting suffix into var guard values at compile-time, e.g.:

get "/9/:bar.json" when bar != "value" # compiled to bar != "value.json"
get "/9/:foo/:bar.json" when bar in ["value1", "value2"] and foo != "bar" # -> bar in ["value1.json", ..]

The suffix value injection is slightly more complicated than expected, given various existing guard functions. For example, the use of byte_size on a suffix var will require adding suffix length to integer instead of appending string. This PR currently limits the scope to comparison operators (:==. :!= etc.), :in. It will raise if other guard functions are used in conjunction with suffix vars.

Have a great week ahead!

kevinlang commented 3 years ago

This is also useful for sites with SEO considerations in URL naming. For example, a job site may want to have the route for jobs for a given programming language be something like /:language-jobs, e.g., /elixir-jobs.

josevalim commented 3 years ago

Apologies, this fell through the cracks. I will be looking into this today.

josevalim commented 3 years ago

:green_heart: :blue_heart: :purple_heart: :yellow_heart: :heart: