CrowdHailer / raxx

Interface for HTTP webservers, frameworks and clients
https://hexdocs.pm/raxx
Apache License 2.0

Add middleware proof of concept #136

Closed nietaki closed 5 years ago

nietaki commented 5 years ago

There have already been requests for middleware-style functionality: https://github.com/CrowdHailer/raxx/issues/83, https://github.com/CrowdHailer/raxx/issues/85, plus numerous ones in the Slack channel. There's also a good number of features that would be best implemented as middleware instead of being bundled into individual controllers (CORS, HEAD -> GET translation, CSRF, logging, ...).

This issue is about designing a proof of concept middleware interface that could be used in a Raxx-based application.

What is a middleware (in my mind)

It's a component that sits between the server and the controller, which can modify the request before handing it to the controller and modify the controller's response before it's given back to the server.

Oftentimes a middleware (or a pipeline of multiple middlewares) would be attached to a group of routes in a router to modify their behaviour "in batch".
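For a familiar point of reference (this is Phoenix/Plug, not Raxx, and the plug module below is hypothetical), attaching a pipeline of middleware-like plugs to a group of routes looks roughly like this:

# Phoenix router analogy, only for illustration - not part of Raxx
pipeline :api do
  plug :accepts, ["json"]
  plug MyAppWeb.AuthPlug   # hypothetical module plug
end

scope "/api", MyAppWeb do
  pipe_through :api
  get "/users", UserController, :index
end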

Design criteria

The middleware should:

Non goals:

Remaining questions

nietaki commented 5 years ago

I'll work on an implementation :sunglasses:

nietaki commented 5 years ago

I played around with some prototypes today and, as always, there's a number of different approaches with different trade-offs. I'll try to show them by example and discuss how I see their properties.

All examples show a snippet of the implementation of a custom middleware that modifies the request headers and passes the request further down to subsequent middlewares in the "pipeline".

1. "Hidden State"

This is the approach I was originally aiming for, until I realised it's a bit inconvenient to achieve...

defmodule MyMiddleware1 do
  alias Raxx.Middleware

  defstruct []

  @behaviour Middleware

  @impl Middleware
  def handle_head(request, _state, head_continuation) do
    my_state = %__MODULE__{}
    request = Raxx.set_header(request, "content-type", "application/json")

    next = head_continuation.(request)
    # next :: [Raxx.part()] | Raxx.Response.t()

    {next, my_state}
  end

  # (other callbacks omitted)
end

Advantages:

Disadvantages:

2. "Pipeline list"

Here Raxx.Middleware would define the pipeline type:

# (inside Raxx.Middleware)
# the last `module` is a controller module implementing Raxx.Server, all
# the other ones implement Raxx.Middleware
@type pipeline :: [{module, state :: any()}]
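For illustration, a concrete pipeline value under that type might look like this (the module names are hypothetical, with the controller as the last element):

[
  {MyApp.AuthMiddleware, %MyApp.AuthMiddleware{}},
  {MyApp.RequestLogger, nil},
  {MyApp.HomeController, :controller_state}
]

A middleware in this approach would then look like: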
defmodule MyMiddleware2 do
  alias Raxx.Middleware

  defstruct []

  @behaviour Middleware

  @impl Middleware

  def handle_head(request, _state, remaining_pipeline) do
    my_state = %__MODULE__{}
    request = Raxx.set_header(request, "content-type", "application/json")

    {next, remaining_pipeline} = Middleware.handle_head(request, remaining_pipeline)
    # next :: [Raxx.part()] | Raxx.Response.t()

    {next, my_state, remaining_pipeline}
    # OR, alternatively, return the whole updated pipeline:
    # {next, [{__MODULE__, my_state} | remaining_pipeline]}
  end

  # (other callbacks omitted)
end

Advantages:

Disadvantages:

3. "Split callbacks"

That's the simplest approach to come up with, but I'd argue it's not particularly elegant.

defmodule MyMiddleware3 do
  alias Raxx.Middleware

  defstruct []

  @behaviour Middleware

  @impl Middleware

  def before_handle_head(request, _state) do
    my_state = %__MODULE__{}
    request = Raxx.set_header(request, "content-type", "application/json")

    {:continue, request, my_state}
    # or {:halt, request, my_state} to not call through to remaining middlewares/controller
  end

  # `next` is what came back from subsequent middlewares/controller
  # will be nil if before_* returned a {:halt, _, _} tuple
  def after_handle_head(request, state, next) do
    {next, state}
  end

  # (other callbacks omitted)
end

Advantages

Disadvantages

My take

After looking at it from different sides I like approach 2 (pipeline list) the most. I think it's going to yield a nice implementation and a simple mental model while still being relatively difficult to use wrong.

I'll focus on implementing that one right now.

nietaki commented 5 years ago

Also, I remember some discussions about how performant it is to call functions on modules stored in variables, which might happen quite a lot when implementing this. I did a quick and dirty benchmark to make sure it isn't terrible:

  test "calling modules via functions" do
    list = Enum.to_list(1..10)

    enum = Enum
    lambda = fn l -> Enum.reverse(l) end

    Benchee.run(%{
      "directly" => fn -> Enum.reverse(list) end,
      "module in a variable" => fn -> enum.reverse(list) end,
      "apply on a variable" => fn -> apply(enum, :reverse, [list]) end,
      "apply on a Module" => fn -> apply(Enum, :reverse, [list]) end,
      "lambda" => fn -> lambda.(list) end
    })
  end

Results:

Operating System: Linux"
CPU Information: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 30.96 GB
Elixir 1.7.3
Erlang 21.0.9

(some warnings and information cut)

Name                           ips        average  deviation         median         99th %
apply on a Module          22.64 M      0.0442 μs    ±50.94%      0.0410 μs      0.0870 μs
module in a variable       22.40 M      0.0446 μs    ±25.09%      0.0420 μs      0.0880 μs
directly                   22.04 M      0.0454 μs    ±31.05%      0.0420 μs      0.0920 μs
apply on a variable        20.98 M      0.0477 μs   ±734.85%      0.0400 μs       0.120 μs
lambda                     17.95 M      0.0557 μs    ±23.71%      0.0530 μs       0.106 μs

Comparison: 
apply on a Module          22.64 M
module in a variable       22.40 M - 1.01x slower
directly                   22.04 M - 1.03x slower
apply on a variable        20.98 M - 1.08x slower
lambda                     17.95 M - 1.26x slower

(All of those come with the "Warning: The function you are trying to benchmark is super fast, making measurements more unreliable! This holds especially true for memory measurements. See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning" warning, but I think it's a good starting point for building an intuition anyway.)

CrowdHailer commented 5 years ago

After reading your discussion about the problems with version 1, my immediate thought was something along the lines of version 2.

Nice metrics. It looks to me like everything except lambda is basically a draw.

nietaki commented 5 years ago

Cool! I have a fully working and partially tested implementation of 2 going, but I want to iron out some API kinks and integrate it with the Router before I share it.

CrowdHailer commented 5 years ago

I had a thought for benchmarks. Code in this PR https://github.com/CrowdHailer/raxx/pull/140/

Operating System: Linux"
CPU Information: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
Number of Available Cores: 4
Available memory: 7.70 GB
Elixir 1.7.1
Erlang 21.0

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking macro...
Benchmarking stack...

Name            ips        average  deviation         median         99th %
macro      123.47 K        8.10 μs    ±68.84%           8 μs          11 μs
stack       20.52 K       48.72 μs    ±10.47%          48 μs          71 μs

Comparison: 
macro      123.47 K
stack       20.52 K - 6.02x slower

Possibly still not very informative; to get rid of the fast-code warning I stacked 1000(!!) middlewares.

nietaki commented 5 years ago

https://github.com/CrowdHailer/raxx/pull/139 and https://github.com/CrowdHailer/raxx/pull/141 are both experiments of how it could look, let's unify the discussion on the approach here.

nietaki commented 5 years ago

I needed a bit more time to think about #141, to digest it and formulate what I do and don't like about it. Let me try to split it into topics.

Should Ace know if it's serving a stack of middlewares or just a server?

Good point, it probably shouldn't - a stack of middlewares with a vanilla server attached to it should have the same Raxx.Server interface as a plain server. Whether that interface should be exposed by the "topmost" middleware or by some middleware stack mechanism is a different question.

Is MyMiddleware.wrap({MyServer, :my_server_state}) a sufficient interface?

I'd argue it isn't. It only allows you to construct a "pipeline" of middlewares (plus a baked-in server) once, and you can't re-use it afterwards. I can imagine situations where we'd want to "concatenate" pipelines, in a similar way to what you'd do in phx routers - I have a "pipeline" of middlewares that, based on the request, I want to put in front of a server. The wrap interface doesn't allow for that, or anything else, really.

Is it important or even viable that a single Middleware is a valid Server or the other way around?

I don't think so. A middleware should always have the option to hand the request further down the "stack", but a server never would.

Side note: application-specific responses

After some things I've been trying to do with Phoenix/Plug recently, I think it would be cool/useful if the Server could return something other than a Raxx.part to its middlewares, for an application-specific action. In order for that to work, the middleware infrastructure would either need to not modify the response parts at all or to pass through some specifically tagged ones (like {:custom, %MyAppError{}}).

A proposed approach coming in the next comment ;)

nietaki commented 5 years ago

Ok, here we go:

We're going to have 3 main concepts here. It might seem like it complicates things, but I'd argue it allows us to have a clean division, a more straightforward API and a simpler programming experience.

Middleware is just a single middleware that can modify requests/responses. It provides a behaviour for the individual middlewares to implement. This is slightly modified code from my PR:

defmodule Raxx.Middleware do
  alias Raxx.Server

  @type t :: {module, state}

  @type state :: any()

  @type next :: {[Raxx.part()], state, Server.t()}

  @callback process_head(Raxx.Request.t(), state(), inner_server :: Server.t()) ::
              next()

  @callback process_data(binary(), state(), inner_server :: Server.t()) :: next()

  @callback process_tail([{binary(), binary()}], state(), inner_server :: Server.t()) ::
              next()

  @callback process_info(any(), state(), inner_server :: Server.t()) :: next()
end

As you can see, it has the capability to wrap any Server and modify requests going to it and its responses.

Then we have the Pipeline. Pipeline is a collection of Middlewares that you can re-use and compose however you want, because it's just a list of them.

defmodule Raxx.Middleware.Pipeline do
  @type t :: [Raxx.Middleware.t()]
end
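Because a pipeline is just a list, re-use and composition need no dedicated API. A quick sketch (the middleware modules and their states below are made up for illustration):

common = [{Raxx.Middleware.Head, nil}, {MyApp.RequestLogger, :info}]
api_only = [{MyApp.CORS, %{origins: ["https://example.com"]}}]

# "concatenating" pipelines is plain list concatenation
api_pipeline = common ++ api_only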

In the end we have a Stack, which is a Pipeline connected to a Server. The Stack implements the Raxx.Server interface, so it can be used anywhere a Server can be used. You can also easily swap out the pipeline or the server it's attached to:

defmodule Raxx.Stack do
  defstruct [:pipeline, :server]

  @behaviour Raxx.Server

  @type t :: %__MODULE__{
    pipeline: Raxx.Middleware.Pipeline.t(),
    server: Raxx.Server.t()
  }

  # all Raxx.Server callbacks implemented here to route through the pipeline and the server and maintain their states correctly
end
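A minimal usage sketch, assuming Raxx.Server.t() is a {module, state} pair (the controller module and its state below are hypothetical):

pipeline = [{Raxx.Middleware.Head, nil}]
stack = %Raxx.Stack{pipeline: pipeline, server: {MyApp.WebController, %{}}}

# because Raxx.Stack implements the Raxx.Server behaviour, the stack can be
# passed anywhere a plain server is accepted
server = {Raxx.Stack, stack}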

The way you implement a Middleware is very simple and doesn't require you to learn any special interfaces. It uses just the Server helper functions from your PR: https://github.com/CrowdHailer/raxx/pull/141/files#diff-2095b50d041367f92320505f03fe049cR217

defmodule Raxx.Middleware.Head do
  alias Raxx.Server

  @behaviour Raxx.Middleware

  def process_head(request = %{method: :HEAD}, state, inner_server) do
    request = %{request | method: :GET}
    {parts, inner_server} = Server.handle_head(inner_server, request)
    parts = strip_body(parts)
    {parts, state, inner_server}
  end

  # (strip_body/1 and the other callbacks omitted)
end

In practice most of the time the inner_server would be the Stack with all the Middlewares up to the current one stripped away, but it wouldn't make a difference to the current Middleware.
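To make that concrete, here is a rough sketch of how the Stack could delegate one callback. This is only an illustration of the mechanics, assuming Raxx.Server.t() is a {module, state} pair - it is not the actual implementation:

# inside Raxx.Stack - sketch only
def handle_head(request, stack = %__MODULE__{pipeline: [{module, state} | rest]}) do
  # the inner server seen by the first middleware is this stack minus that middleware
  inner_server = {__MODULE__, %__MODULE__{stack | pipeline: rest}}
  {parts, new_state, {__MODULE__, inner_stack}} = module.process_head(request, state, inner_server)
  # put the middleware (with its updated state) back on top before returning
  {parts, %__MODULE__{inner_stack | pipeline: [{module, new_state} | inner_stack.pipeline]}}
end

def handle_head(request, stack = %__MODULE__{pipeline: [], server: server}) do
  # no middlewares left: hand the request straight to the wrapped server
  {parts, server} = Raxx.Server.handle_head(server, request)
  {parts, %__MODULE__{stack | server: server}}
end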

The exact order of the arguments is up for debate.

Biased summary

So in the end we seem to be arriving at a solution that has it all:

I skipped all of the implementation, but I think it would be very simple - let me know if you have any questions.

nietaki commented 5 years ago

An implementation of the above now in https://github.com/CrowdHailer/raxx/pull/139