Kong / kong

[request] support for routing based on path? #192

Closed DavidTPate closed 9 years ago

DavidTPate commented 9 years ago

Hi there! Been messing with this a little bit as I'm vetting API gateways to use. I really like the idea of the project but I'm running into an issue where my main routing to services is done based upon path instead of Host.

Was wondering if I'm just overlooking the feature or if there has been any talk of the addition of such functionality. Details below.

If for example I had 3 services that I was running (let's call them auth, pizza, and tacos).

As best as I understand it, Kong is set up to handle this where each of these services might be available internally at: auth.internaldomain.com, pizza.internaldomain.com, tacos.internaldomain.com and the routing is done based upon someone's intention to go to auth.externaldomain.com, pizza.externaldomain.com, or tacos.externaldomain.com.

Instead my setup deals with routing to services based upon a part of the path. So, the equivalent calls would be made to api.externaldomain.com/auth, api.externaldomain.com/pizza, or api.externaldomain.com/tacos.

From looking through the docs this didn't seem like a feature that currently exists, love to hear your thoughts on it.

PierreKircher commented 9 years ago

+1 simply for the cost of ssl certs

montanaflynn commented 9 years ago

Great idea! I've added a request label, maybe @thefosk or @thibaultCha could chime in on the implementation details as I don't think this is something that could be added as a plugin since the routing is handled by the Kong core.

subnetmarco commented 9 years ago

This is a feature I have been thinking about for a while, and I really like the idea. Will draft out a simple implementation.

PierreKircher commented 9 years ago

A proxy is a good approach, I guess. We can have them standalone next to each other

like

api1.example.com api2.example.com api3.example.com

and after that we set up a new subdomain and add the "api id" + a location folder.

Not sure if it's that trivial, but that would not change the way the actual routing works; instead it acts as a complementary layer.

Just my 2 cents here... please ignore if I'm being misleading with a simplistic approach.

rafeequl commented 9 years ago

:+1:

I've been thinking of this feature for making an abstraction layer between consumer and internal API.

DavidTPate commented 9 years ago

Glad to hear that you guys think it is a useful feature. Another thing to keep in mind is that this would also likely be used for micro-services and splitting up functionality.

So keeping with my previous example and just focusing on auth service available at: api.externaldomain.com/auth. I would have a route setup to send the path /auth to the server at auth.internaldomain.com.

If I noticed that my login functionality was taking lots of hits and causing me to scale even though the rest of my end-points weren't experiencing much load, I would split this up to do something like the following. The path api.externaldomain.com/auth/login would be sent to login.internaldomain.com and api.externaldomain.com/auth would keep going to auth.internaldomain.com.

I think this comes down to just the fact that order or specificity matters. I think order (where first match is the one that matters) is the more common approach due to it being simpler to implement. So in my example the order of my APIs would be something like this:

thibaultcha commented 9 years ago

I like the idea :+1:

Your latest example (with an order of priority to define the routing) is something that we are used to seeing in code, but I am curious as to what it would look like when configuring it with simple API calls... A priority value maybe?

DavidTPate commented 9 years ago

As I've been going through trying out other API gateways I thought Tyk had some easy to understand options that were useful even in my simple testing. For the configuration of their Proxy piece they have a few simple options which I think would serve as a good starting point.

DavidTPate commented 9 years ago

@thibaultCha Yeah, it's a weird one when dealing with an API. You obviously don't want to go by creation date, or update the entire set of APIs. The way that AWS does it for things like Network ACLs which I think is simple is that they have you put in a kind of sort order value. So keeping with my previous example:

Rule Number  Path         Target
100          /auth/login  login.internaldomain.com
200          /auth        auth.internaldomain.com
300          /pizza       pizza.internaldomain.com
400          /tacos       tacos.internaldomain.com

I think it clearly shows the order (you would just need to make sure Rule Number is unique when sent), and provides an easy way for me to insert something between /pizza and /tacos by simply giving it a rule number such as 350.
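A minimal sketch of that first-match-by-rule-number lookup, written in Python purely for illustration (it is not Kong code). The `Rule` type and the `matches` and `resolve` functions are hypothetical names, and matching on whole path segments is an assumption:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    number: int  # sort order; lower numbers are checked first
    path: str    # path prefix to match
    target: str  # upstream host


def matches(prefix: str, request_path: str) -> bool:
    """True if request_path equals prefix or continues it at a '/' boundary."""
    prefix = prefix.rstrip("/")
    return request_path == prefix or request_path.startswith(prefix + "/")


def resolve(rules: List[Rule], request_path: str) -> Optional[str]:
    """Return the target of the first matching rule, in rule-number order."""
    for rule in sorted(rules, key=lambda r: r.number):
        if matches(rule.path, request_path):
            return rule.target
    return None


# The table above, deliberately listed out of order.
RULES = [
    Rule(200, "/auth", "auth.internaldomain.com"),
    Rule(100, "/auth/login", "login.internaldomain.com"),
    Rule(300, "/pizza", "pizza.internaldomain.com"),
    Rule(400, "/tacos", "tacos.internaldomain.com"),
]
```

Because /auth/login carries a lower rule number than /auth, it wins for any path under /auth/login, while everything else under /auth falls through to rule 200.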

thibaultcha commented 9 years ago

On another note, maybe the resolvers too could be plugins. We call the current resolver (by Host header) a core-plugin, since it is identifying an API in the system from a user's request. One resolver could work with the Host header, and another with paths.

Btw, it is worth noting that this is also close to how NGINX natively supports proxying, even if less flexible than what you are describing @DavidTPate:

location /match/this {
    proxy_pass http://example.com;
}
# No URI in proxy_pass: a request sent to `/match/this/auth` will be sent to upstream as `/match/this/auth`

location /match/this {
    proxy_pass http://example.com/new/path;
}
# URI in proxy_pass: a request sent to `/match/this/auth` will be sent to upstream as `/new/path/auth`

thibaultcha commented 9 years ago

I was thinking about such a rule @DavidTPate. Here the interval between two routes is huge but to be 100% future proof, if one decides to insert a route at a value that already exists we could also "push" all values after the one being inserted.

I don't see another solution right now than this, but it seems decent. A UI can easily and nicely deal with this on top of the admin API.
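A rough sketch of the "push" idea, in Python for illustration only. The `insert_rule` function is a hypothetical name, and this variant assumes it is enough to shift the contiguous run of occupied values starting at the insertion point, rather than every later value:

```python
from typing import Dict


def insert_rule(rules: Dict[int, str], number: int, path: str) -> Dict[int, str]:
    """Insert path at `number`, pushing colliding rules up by one.

    Only the contiguous run of occupied numbers starting at `number` is
    shifted; rules separated by a gap keep their values.
    """
    updated = dict(rules)  # leave the caller's mapping untouched
    carried = path
    slot = number
    while slot in updated:
        # The slot takes the carried path; its old occupant moves on.
        updated[slot], carried = carried, updated[slot]
        slot += 1
    updated[slot] = carried
    return updated
```

With wide intervals such as 100, 200, 300, the loop body rarely runs, so most inserts touch a single value.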

steinnes commented 9 years ago

I really like this idea. We at QuizUp developed our own nginx based routing solution where we route based on the request path (location).

Additionally, something I would be very interested in seeing (and potentially developing in this project) is support for registering backend nodes directly with Kong. This is to avoid having to rely on (potentially stale) DNS records for discovering backends. Having a complete list of backends (internal IPs usually) per microservice would also allow for more sophisticated load balancing algorithms to be employed (I am thinking least connection, and since there is the possibility of sharing state via Cassandra, this makes a lot of sense to me!).

Awesome project btw :-)

thibaultcha commented 9 years ago

@steinnes Could what you described somewhat be related to #157? Load balancing your API with Kong?

steinnes commented 9 years ago

Absolutely similar. The #157 issue seems quite focused on how nginx does this, but I assume we could do this in two ways.

1. Deltas (i.e. add/remove a particular backend/upstream from an API):

curl -XPOST --url http://localhost:8001/apis/backends/add \
 --data 'name=mockbin' \
 --data 'upstream=10.0.0.99:1234' 

or

curl -XPOST --url http://localhost:8001/apis/backends/rem \
 --data 'name=mockbin' \
 --data 'upstream=10.0.0.99:1234' 

2. Complete overwriting of upstreams (basically a "set" operation):

curl -XPOST --url http://localhost:8001/apis/backends/set \
 --data 'name=mockbin' \
 --data 'upstreams=10.0.0.99:1234,10.0.0.88:4321,10.0.0.77:5678' 

Or whatever makes most sense. I just came across your project and immediately decided to start suggesting stuff -- but in my defence, if you guys like the ideas I wouldn't mind contributing :-)
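To make the add/rem/set semantics and the least-connection idea concrete, here is a small in-memory sketch in Python. Everything in it (`BackendRegistry`, `acquire`, `release`) is hypothetical and not an existing Kong API; a real version would presumably keep this state in the shared datastore:

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


class BackendRegistry:
    """In-memory stand-in for per-API upstream lists (hypothetical)."""

    def __init__(self) -> None:
        # API name -> {upstream address -> open connection count}
        self._backends: Dict[str, Dict[str, int]] = defaultdict(dict)

    def add(self, name: str, upstream: str) -> None:
        """Delta: register one upstream if it is not already known."""
        self._backends[name].setdefault(upstream, 0)

    def rem(self, name: str, upstream: str) -> None:
        """Delta: drop one upstream; unknown upstreams are ignored."""
        self._backends[name].pop(upstream, None)

    def set(self, name: str, upstreams: Iterable[str]) -> None:
        """Overwrite the whole list, keeping counts of surviving upstreams."""
        current = self._backends[name]
        self._backends[name] = {u: current.get(u, 0) for u in upstreams}

    def acquire(self, name: str) -> Optional[str]:
        """Least connection: pick the upstream with the fewest open connections."""
        pool = self._backends.get(name)
        if not pool:
            return None
        upstream = min(pool, key=pool.get)
        pool[upstream] += 1
        return upstream

    def release(self, name: str, upstream: str) -> None:
        """Mark one connection to upstream as finished."""
        pool = self._backends.get(name, {})
        if pool.get(upstream, 0) > 0:
            pool[upstream] -= 1
```

A caller would pair each `acquire("mockbin")` with a `release` once the proxied request completes, so the counts track connections that are still open.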

tamizhgeek commented 9 years ago

:+1: We are using a home-built nginx routing to different upstreams based on the request path in the API. We replace the proxy_pass using a variable after matching the path pattern in a regex.

This will also help in having endpoint level rate limiting/throttling.

Would love to have this in kong. Will make our migration to kong much easier!

sonicaghi commented 9 years ago

+1

drabiter commented 9 years ago

+1

montanaflynn commented 9 years ago

+1

Here's a real example of where I do routing based on endpoints in nginx:

server {
  server_name img.apistatus.org;
  location /online {
    proxy_pass http://127.0.0.1:4445/;
  }
  location /status {
    proxy_pass http://127.0.0.1:4446/;
  }
}

thibaultcha commented 9 years ago

Before implementing this, a quick follow-up to see what we think about it.

We currently have a resolver that we can call the "host resolver". I shall refer to the resolver described by @DavidTPate as the "path resolver".

On the usability side:

It would be nice to be able to configure, per API, whether its routing should happen by host or by path. Say the API now has 2 properties: public_dns (for the host resolver) and path (for the path resolver).

My 2 cents: separate those resolvers but keep them bundled into the core. Have an API use one or the other depending on which property is set. Refuse an API that uses both.

On the implementation side:

A nice solution would be to now separate the properties used by the resolvers and the APIs:

@thefosk @montanaflynn thoughts?

DavidTPate commented 9 years ago

That sounds like a good solution to me. I could see someone attempting to use both a "host resolver" and a "path resolver" but to me that screams of poor API design and I'm honestly not sure if even Nginx has the ability to do both (without duplication of configuration).

montanaflynn commented 9 years ago

@thibaultCha I would say to allow for both with only one or the other being required.

This way you can set up multiple APIs in one Kong install and still handle all the above use cases that @DavidTPate described. The ordering for how Kong would pick which one could be like this:

  1. Matches both host and path
  2. Matches host
  3. Matches path
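A sketch of that precedence in Python, for illustration only. `Api`, `rank`, and `pick` are hypothetical names, and plain string-prefix matching on the path is a simplifying assumption:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Api:
    name: str
    host: Optional[str] = None  # None means "any host"
    path: Optional[str] = None  # None means "any path"


def rank(api: Api, request_host: str, request_path: str) -> int:
    """0 = no match, 1 = path only, 2 = host only, 3 = host and path."""
    host_ok = api.host is None or api.host == request_host
    path_ok = api.path is None or request_path.startswith(api.path)
    if not (host_ok and path_ok):
        return 0
    return (2 if api.host is not None else 0) + (1 if api.path is not None else 0)


def pick(apis: List[Api], request_host: str, request_path: str) -> Optional[Api]:
    """Return the best-ranked API for the request, or None if nothing matches."""
    best = max(apis, key=lambda a: rank(a, request_host, request_path), default=None)
    if best is None or rank(best, request_host, request_path) == 0:
        return None
    return best
```

An API that sets neither property never matches, which lines up with requiring at least one of the two.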

There's a good answer on Stack Overflow about how nginx handles prioritizing paths.

Here's a bigger snippet of the nginx config I put above showing how I'm matching by host & path. Two paths are the same but lead to different outcomes dependent on the host. You'll also notice that I'm using regex in the path which is something that we should consider as well.

server {

  # Matches this host
  server_name img.apistatus.org;

  # And this path
  location /online {
    proxy_pass http://127.0.0.1:4445/apistatus/online;
  }

  # Or this path
  location /status {
    proxy_pass http://127.0.0.1:4446/apistatus/status;
  }

  # Or this path
  location /robots.txt {
    return 200 "User-agent: *\nAllow: /";
  }

}

server {

  # Matches this host
  server_name apistatus.org;

  # this path matches if none of the others do
  location / {
    root /usr/share/nginx/www/apistatus;
    index index.html;
  }

  # Or this path which is also defined above
  location /robots.txt {
    return 200 "User-agent: *\nDisallow: /";
  }

  # Or this path which uses regex
  location ~* \.(gif|jpg|jpeg)$ {
    rewrite ^/images/(.*)(png|jpg|gif)$ http://127.0.0.1:4447/images/$1$2 redirect;
    return 302;
  }

}

subnetmarco commented 9 years ago

  1. Matches both host and path
  2. Matches host
  3. Matches path

@montanaflynn I agree with this.

melihmucuk commented 9 years ago

+1

alexkrauss commented 9 years ago

+1

rosskukulinski commented 9 years ago

this feature would be a requirement for us to adopt kong. (+1)

subnetmarco commented 9 years ago

I would just like to say that this feature is coming in the 0.3.0 release, about 3 to 4 weeks from today.

DavidTPate commented 9 years ago

@thefosk Thanks for the quick response :+1:

sonicaghi commented 9 years ago

:tada:

thibaultcha commented 9 years ago

There are a lot of things we need to figure out before implementing this.

Requirements

  1. API can be matched by Host.
  2. API can be matched by Path.
  3. API can be matched by Host + Path (higher priority over 1 and 2).
  4. Paths should allow regexes.
  5. An API can have 1 Host (as of now, assuming we're not changing that here).
  6. An API can have multiple Paths, with a prioritisation system.
  7. A Path can have a strip property (ignored here).

Schema 1

Considering this, and the way we want to query Cassandra, all while keeping our RESTful configuration capabilities, this is a potential model (and the most valid I could think of):

CREATE TABLE apis(
  id uuid,
  name text,
  PRIMARY KEY(id)
);

CREATE TABLE hosts(
  id uuid,
  api_id uuid, -- foreign to apis.id
  public_dns text,
  target_url text,
  PRIMARY KEY(id)
);

CREATE TABLE paths(
  id uuid,
  api_id uuid, -- foreign to apis.id
  host_id uuid, -- useful to require this Path to first match a Host (rule 3)
  listen_path text,
  priority int,
  target_url text,
  PRIMARY KEY(id, priority) -- priority allows us to ORDER BY a query, but I would probably rather do that in the application level
);

CREATE INDEX ON hosts(public_dns);
CREATE INDEX ON paths(listen_path);

This schema allows us to follow the requirements:

  1. Query by Host and find (or not) an API
  2. Query by Path and find (or not) an API
  3. If a Path was found, check if it has a host_id
    • 3a If it has a host_id and it matches the one of the previously found Host(s), Path is valid -> redirect
    • 3b If it doesn't have a host_id or it doesn't match the previously found Host -> next
  4. If a Host was found -> redirect
  5. If nothing happened at this point -> drop

The problems here are:

  1. We are making 2 queries to the DB per "non-cached-call" (one for Host and one for Path). This will be slower but we do have a database cache, so not significantly slower either.
  2. If we want an API to have multiple Paths (/path1, /path1/overriden), this schema will force us to query all Paths from the DB to be able to compare them with the current URI. That is actually the case for all models except the presented schema 2.
  3. If we want to support regexes in Paths, same: we need to query them all.
  4. Standard foreign relations issues (not major).
    • 4a If a Host is deleted, we need to update all the Paths having it as a host_id.
    • 4b If an API is deleted, delete all related Hosts and Paths

Proposed workarounds

Schema 2

I also considered such a schema:

CREATE TYPE path(
  listen_path text,
  priority int,
  host text,
  target_url text
);

CREATE TYPE host(
  public_dns text,
  target_url text
);

CREATE TABLE apis(
  id uuid,
  name text,
  host host, -- one Host
  paths frozen<set<path>>, -- multiple Paths
  PRIMARY KEY(id)
);

But it raises more concerning problems:


From here, I see 2 solutions:

Migrations

Finally, another problem to consider is that almost any schema change will require a heavy migration. By heavy I mean moving data around, possibly by providing a script or something to migrate from the current apis table to any of the newly created tables. That means our migrations will not be able to do the job. We need something that:

  1. Create the new tables
  2. Move the data around
  3. Delete the old tables

or

  1. Create a new schema in a new instance
  2. Move the data from the old instance
  3. Reload Kong

All that should be done with users doing a backup of their data first. Kong is not 1.0 yet so I don't see handling that as a priority. Users should expect having to reconfigure their APIs if they want to upgrade.

subnetmarco commented 9 years ago

I have some feedback and questions.

SELECT id FROM apis WHERE public_dns = ? OR listen_path = ?;

or (if possible in Cassandra):

SELECT id FROM apis WHERE public_dns CONTAINS ? OR listen_path CONTAINS ?;

Not sure if there is any limitation with Cassandra if we do this.

thibaultcha commented 9 years ago

we could also allow multiple Host

Yeah, I thought about it, but it brings a lot of configuration headaches: one could have 2 Hosts and 2 Paths that only validate as Path A + Host A and Path B + Host B, and that can get extremely confusing very fast. With 1 Host and X Paths, we respect the nginx behaviour as shown in the examples in this thread. I think multiple Hosts would do more harm than good.

I think it's important to support regular expression

See my point about supporting it: it means everything will have to be in memory and the routing will be O(n), because we need to compare a path against every configured Path. Also your example is a Host? Even if we support 1 Host per API, same, we would need to have every Host in memory too. (See the conclusion about that)
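To illustrate the O(n) concern, here is a sketch in Python of regex routing over an in-memory list. `compile_routes` and `route` are hypothetical names, and the patterns are made up for the example:

```python
import re
from typing import List, Optional, Tuple

Route = Tuple[re.Pattern, str]


def compile_routes(raw: List[Tuple[str, str]]) -> List[Route]:
    """Compile (regex, target) pairs once, at load or reload time."""
    return [(re.compile(pattern), target) for pattern, target in raw]


def route(routes: List[Route], request_path: str) -> Optional[str]:
    """Linear scan: a request may be tested against every configured pattern."""
    for pattern, target in routes:
        if pattern.match(request_path):
            return target
    return None


ROUTES = compile_routes([
    (r"^/auth/login(/|$)", "login.internaldomain.com"),
    (r"^/auth(/|$)", "auth.internaldomain.com"),
])
```

Nothing here can be answered by an indexed lookup: the datastore cannot tell which stored regex matches a given path, so the whole list has to be held and walked in memory.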

Regarding the schemas, can't we just add one more field to apis and query it like:

Cassandra does not have support for such an OR.

Regarding the priority, I would say to remove it in the first implementation if that allows us to use a simpler schema.

If we drop this we absolutely cannot have multiple Paths per API like @DavidTPate described it (/auth/login and /auth would overlap). Ex: if one sets a listen_path to: /pizza/, another to /pizza/hello and queries /pizza/hello/world, which listen_path gets applied? We can't know without a priority value, or having them ordered as an array.

Our problems are:

  1. Supporting both Host and Path by default for all APIs: double cassandra querying
  2. Supporting regex in Path or Host: everything will need to be in memory, O(n).
  3. Supporting multiple Paths per API: see the example above. We do need a priority property.
  4. Supporting a Path with multiple parts: if one sets a listen_path to /pizza/api and queries /pizza/api/hello/world, then from the code's POV, am I supposed to query Cassandra with /pizza/, /pizza/api/, /pizza/api/hello/ or /pizza/api/hello/world/? That is why we need to support something like "starts with or strict" modes, or just regex in a first version.
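Problem 4 can be made concrete with a short Python sketch. It assumes stored listen_path values are normalised with a trailing slash and that each dictionary lookup stands in for one exact-match query; `candidate_prefixes` and `lookup` are hypothetical names:

```python
from typing import Dict, List, Optional


def candidate_prefixes(request_path: str) -> List[str]:
    """List every prefix of request_path, most specific first."""
    segments = [s for s in request_path.split("/") if s]
    prefixes = []
    for count in range(len(segments), 0, -1):
        prefixes.append("/" + "/".join(segments[:count]) + "/")
    prefixes.append("/")  # the root matches last
    return prefixes


def lookup(listen_paths: Dict[str, str], request_path: str) -> Optional[str]:
    """Try one exact-match lookup per prefix until a listen_path is found."""
    for prefix in candidate_prefixes(request_path):
        if prefix in listen_paths:
            return listen_paths[prefix]
    return None
```

For /pizza/api/hello/world this means up to five exact-match lookups, which is the cost a "starts with" mode would have to pay without an in-memory table.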

To conclude, if we want to stick with those requirements and fix 1, 2, 3, and 4, I think we have no choice but to load the Host(s) (plural if we decide to support many Hosts, but that brings configuration concerns as mentioned) and Paths in memory, and somehow reload them when they get modified. After all, it is what nginx is doing too, except you don't expect a configuration file to have tens of APIs, whereas you can expect Kong to have such a number. Schema 2 or equivalent would be valid in that case.

sonicaghi commented 9 years ago

Keep one host for this version. Simpler.

subnetmarco commented 9 years ago

Just brainstorming here, but another option would be having only one property called matchers or patterns (or a better name) that contains both the DNS names and the paths. The table would look like:

CREATE TABLE IF NOT EXISTS apis(
  id uuid,
  name text,
  matchers set<text>,
  target_url text,
  created_at timestamp,
  PRIMARY KEY (id)
);

We could support multiple DNS and multiple paths in one field:

SELECT * FROM apis WHERE matchers CONTAINS 'something.com' ALLOW FILTERING;

or

SELECT * FROM apis WHERE matchers CONTAINS '/hello/world' ALLOW FILTERING;

This won't fix the two-queries problem because I think Cassandra doesn't support SELECT statements to search for multiple values in a field (I might be wrong):

SELECT * FROM apis WHERE matchers IN ('something.com', '/hello/world') ALLOW FILTERING;

thibaultcha commented 9 years ago

First implementation drafted in #282. It only supports 1 path per API. Since supporting all the requested features means a lot of rewritten code, I opted for breaking down the implementation in 2 parts:

thibaultcha commented 9 years ago

Closing this and adding support for multiple path/multiple hosts in one of the upcoming releases. Thank you all!