ThreeMammals / Ocelot

.NET API Gateway
https://www.nuget.org/packages/Ocelot
MIT License
8.28k stars 1.63k forks source link

#849 #1496 Extend the route key format used for load balancing making it unique #1944

Closed sliekens closed 5 months ago

sliekens commented 6 months ago

Fixes #849 #1496

Proposed Changes

Multiple routes with identical UpstreamMethod and UpstreamPath, but different UpstreamHost and ServiceName or DownstreamHostAndPort will now be load-balanced correctly.

raman-m commented 6 months ago

@ggnaegi Welcome to code review!

ggnaegi commented 6 months ago

@ggnaegi Welcome to code review!

@raman-m yes but a I said i will have some time on Wednesday night.

sliekens commented 6 months ago

Steven, Your proposal for the key format is

UpstreamHttpMethod,+|UpstreamHost|UpstreamPathTemplate|ServiceNamespace|ServiceName
OR
UpstreamHttpMethod,+|UpstreamHost|UpstreamPathTemplate|Host:Port,+

Maybe to include Type of Load Balancer for the second variant instead of Host:Port? Because UpstreamPathTemplate cannot vary, so it is unique, seems Host:Port is redundant... and type of load balancer cannot vary too...


I thought about it, what I want to do is make the route key format the same for all types of configs, because the differences in the format are causing me confusion.

Proposed format, which now includes the load balancer options instead of downstream host/port:

# sticky sessions
CookieStickySessions:LoadBalancerOptions.Key

# everything else
UpstreamHttpMethod|UpstreamPathTemplate|UpstreamHost|ServiceNamespace|ServiceName|LoadBalancerOptions.Type|LoadBalancerOptions.Key

However, for this to work correctly, empty parts must be kept, because removing empty parts will again lead to ambiguous route keys.

For example:

GET|/categories|my-api|LeastConnection

It's not clear if this means my-api is the UpstreamHost or ServiceName. The empty parts must be kept to solve this ambiguity.

sliekens commented 6 months ago

I had to make a few more changes based on new information and test results. The route key is now calculated uniformly for all kinds of load balancers (except sticky session load balancing which uses static keys.)

The format is now:

UpstreamHttpMethod|UpstreamPathTemplate|UpstreamHost|DownstreamHostAndPort|ServiceNamespace|ServiceName|LoadBalancerType|LoadBalancerKey

I had to keep the DownstreamHostAndPort because a test started failing without it. Route configuration can be changed during runtime when Ocelot.Administration is used, so downstream host and ports must be included in the route key. A better solution would be to clear the LoadBalancerHouse when the route configuration is changed, but I think that is out of scope for this PR.

I added LoadBalancerType and LoadBalancerKey as suggested.

For undefined route properties, the undefined parts are replaced by default values:

-GET,POST,PUT|/api/product||10.0.0.1:8080||||
+GET,POST,PUT|/api/product|no-host|10.0.0.1:8080|no-svc-ns|no-svc-name|no-lb-type|no-lb-key

The default values don't serve any technical purpose, but it came up in the review that empty parts look like a mistake when it's actually not.

raman-m commented 6 months ago

@sliekens added 4 commits on Jan 28

Thanks for issues resolving and working on code review more! I'm busy these days... and sorry, I'll review after the release we have this week...

sliekens commented 6 months ago

Let me know if you have any more suggestions. Otherwise, I'm happy with the current state of this branch.

sliekens commented 6 months ago

@raman-m do you have time this week to review?

sliekens commented 6 months ago

@raman-m I added a test with two services, each with a different UpstreamHost and ServiceName and response body.

I have verified that it fails as expected on the develop branch, before my changes:

(Ocelot reuses the route of the first service, load balancing is not working correctly)

image

When I run the test on this branch, it passes.

Additional info:

As before, let me know if you have any more questions or suggestions, otherwise, I'm happy with these changes as is.

raman-m commented 5 months ago

@sliekens commented on Feb 16

Thank you! Good job! 💪 Having acceptance tests is a green light, and tomorrow I'll perform final code review with possible a few small commits checking styling & conventions, minor code improvements... Also, I hope it will be merged tomorrow too 😉

raman-m commented 5 months ago

Questionz

sliekens commented 5 months ago

I did all of my testing with the LeastConnection load balancer. However I want to emphasize that the bug was in the LoadBalancerHouse, so the issue should present itself with any load balancer, probably including the NoLoadBalancer implementation of ILoadBalancer.

raman-m commented 5 months ago

so the issue should present itself with any load balancer

I've converted the acceptance fact to theory: https://github.com/sliekens/Ocelot/blob/bugfix/route-keys/test/Ocelot.AcceptanceTests/ServiceDiscoveryTests.cs#L461-L468 And now it also passes. So, the theory is 🟢 🤔 I expected that at least one "CookieStickySessions" test fails, but it is green too. Could you explain me why the theory works for all types of load balancers?

Also, in debug session I noticed that key is build always without reading from the cache... Seems the acceptance test scenario should be longer! It must contain more steps! Otherwise it may be useless.


I have verified that it fails as expected on the develop branch, before my changes:

(Ocelot reuses the route of the first service, load balancing is not working correctly) image

I hope that is correct.

raman-m commented 5 months ago

Based on docs in case of Upstream Host in a route The quote:

This works by looking at the Host header the client has used and then using this as part of the information we use to identify a Route.

and

The Route above will only be matched when the Host header value is somedomain.com.

and

If you do not set UpstreamHost on a Route then any Host header will match it.

Question: How do you setup the header in acceptance tests? Is it added implicitly?

sliekens commented 5 months ago

Could you explain me why the theory works for all types of load balancers?

It's important to understand that the LoadBalancerHouse was causing the problem, the actual load balancers don't use the route key. So, the issue is the same for all types of load balancers, including sticky session.

You can copy the acceptance test to the develop branch and run it to see it fail. If you put a breakpoint in the LoadBalancerHouse, you should see the problem when calling the second service.

Also, in debug session I noticed that key is build always without reading from the cache... Seems the acceptance test scenario should be longer! It must contain more steps!

That is intended. The cache should not be used when calling two different services exactly once. I could add a step to call the same service more than once, which then uses the cache instead of making another call to Consul. However, I felt that it is already sufficiently covered by other existing tests.

sliekens commented 5 months ago

Question: How do you setup the header in acceptance tests? Is it added implicitly?

Yep, it is added implicitly, when you do this:

_ocelotClient.GetAsync("http://us-shop/products");

It is translated to a request like:

GET /products HTTP/1.1
Host: us-shop
raman-m commented 5 months ago

Thanks for the explanation! It's much clearer now.


_ocelotClient.GetAsync("http://us-shop/products");

It is translated to a request like:

GET /products HTTP/1.1
Host: us-shop

But you don't call http://us-shop/products in the test, but http://us-shop only. Also in the lines https://github.com/sliekens/Ocelot/blob/bugfix/route-keys/test/Ocelot.AcceptanceTests/ServiceDiscoveryTests.cs#L521-L531 ServiceName has value product-us, but not products. My expectation that ServiceName should be the same as products or product, but they differ: product-us and product-eu... Is my assumption wrong? I see that you use two Consul service entries... Or in your scenario must Consul have 2 entries? So, I'm confused by different ServiceNames in both routes...

raman-m commented 5 months ago

That is intended.

Intended not to load balancer? )) But this is the test about load balancing 🤣


The cache should not be used when calling two different services exactly once.

Clear!


I could add a step to call the same service more than once, which then uses the cache instead of making another call to Consul.

Yes, please make the test complete.


However, I felt that it is already sufficiently covered by other existing tests.

Other acceptance tests even use counter approach to check times of calling! Moreover, even your test regarding UpstreamHost should check the number of calls via counter. The Counter approach is provided by 2 helpers:

private void GivenProductServiceOneIsRunning(string url, int statusCode)
private void GivenProductServiceTwoIsRunning(string url, int statusCode)
sliekens commented 5 months ago

My expectation that ServiceName should be the same as products or product, but they differ: product-us and product-eu... Is my assumption wrong? I see that you use two Consul service entries... Or in your scenario must Consul have 2 entries? So, I'm confused by different ServiceNames in both routes...

The setup is: 2 different product services, with different names, and 2 corresponding Consul entries. Requests to Ocelot with Host: us-shop should re-route to product-us while Host: eu-shop should re-route to product-eu.

image

image

There must be no load balancing between product-us and product-eu because they return different results (e.g. phone chargers with different plugs for US and EU markets).

The bug in develop branch is that Host: us-shop and Host: eu-shop are both routed to product-us (or product-eu, depending which route was cached first) because the routes, although they are different, have the same route key.

PS: I originally said that calls to Consul are cached, but that is false. Only responses from ILoadBalancerFactory are cached. The created ILoadBalancer still calls Consul each time. I think this is the intended design and correct.

sliekens commented 5 months ago

Intended not to load balancer? )) But this is the test about load balancing 🤣

@raman-m with the explanation above, do you still feel the test is incomplete? While it is true that the bug is in the LoadBalancerHouse, it actually has very little to do with load balancing, which is known and proven to work by other existing tests.

/edit: I added a Consul request counter and repetitive Ocelot calls, to verify load balancing and service discovery are working correctly. I hope these changes are in line with your expectations, otherwise don't hesitate to comment.

raman-m commented 5 months ago

@sliekens commented on Feb 28

I love this explanation! ❤️ 😉 Thanks!


PS: I originally said that calls to Consul are cached, but that is false. Only responses from ILoadBalancerFactory are cached.

Correct!


The created ILoadBalancer still calls Consul each time. I think this is the intended design and correct.

Correct! Default implicit type of poller is Ocelot.Provider.Consul.Consul which behaves as transient service calling Consul each request. There is another PollConsul type which has caching mechanism.

raman-m commented 5 months ago

@sliekens wrote

@raman-m with the explanation above, do you still feel the test is incomplete? While it is true that the bug is in the LoadBalancerHouse, it actually has very little to do with load balancing, which is known and proven to work by other existing tests.

After your adding last 4 commits the test is complete now! It looks great! 😍 Regarding LoadBalancerHouse class... OK, we 've fixed it. But yesterday I noticed one Clean Code issue aka code smell related to SOLID vs DRY principles. Going to fix it as the last commit of the PR...


/edit: I added a Consul request counter and repetitive Ocelot calls, to verify load balancing and service discovery are working correctly. I hope these changes are in line with your expectations, otherwise don't hesitate to comment.

Now the acceptance test is awesome thing! It shows most complete user scenario which can be described for UpstreamHost case. Thanks for refactoring the test! 🤩

Finally, I'd say: we have Done development! Thank you for your great contribution! 😋