Closed: shashanksapre closed this issue 1 year ago
Hello @shashanksapre,
thank you for reaching out. From the numbers you've posted, I cannot conclude that Kong 2.8.3 is generally faster than 3.1.1. The variances in your measurement results are too large to warrant such a conclusion. To make comparisons, you would first need to produce results that don't vary by an order of magnitude. Also, you are reaching out to an external service (httpbin) in your test. While there is buffering between the proxy path and the sending of data to the http-log upstream, using an external service when assessing Kong's performance is bound to create results that are difficult to compare and analyze.
We're continuously monitoring the performance of our releases, and we have not noticed any dramatic differences between the 2.8 and the 3.x lines. If anything, 3.x has become faster, but it really depends on the plugins that you use whether you can observe such speed improvements.
I would recommend that you perform your testing in an isolated environment, on dedicated hardware, and with no external service dependencies. Once you're able to generate stable performance numbers for Kong without plugins, try measuring with individual plugins added.
I am not saying it is impossible that Kong or one of its plugins is creating a performance issue in your environment, but to investigate any further, we'll need to see better measurements.
Kind regards, Hans
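To illustrate the variance point above (with hypothetical latency samples, not numbers from this issue), one simple sanity check before comparing two versions is the coefficient of variation of each run:

```python
import statistics

def stable_enough(latencies_ms, max_cv=0.10):
    """Return True if a run's spread is small enough to compare.

    Uses the coefficient of variation (stddev / mean); runs whose CV
    exceeds max_cv are too noisy for version-to-version comparison.
    """
    mean = statistics.mean(latencies_ms)
    cv = statistics.stdev(latencies_ms) / mean
    return cv <= max_cv

# Hypothetical samples: a steady run vs. one varying by an order of magnitude.
steady = [10.2, 9.8, 10.5, 10.1, 9.9]
noisy = [10.0, 95.0, 12.0, 140.0, 11.0]

print(stable_enough(steady))  # True  -> comparable
print(stable_enough(noisy))   # False -> measure again first
```

The threshold of 10% is an arbitrary illustration; the point is that a comparison is only meaningful once repeated runs of the same setup agree.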
Hello @hanshuebner, thank you for your response.
We tried running Kong in an isolated environment, creating two services that route requests to Kong's own Admin API: one using localhost, and another using the Kubernetes service name.
- name: admin-v1
  url: http://kong-kong-admin.default.svc.cluster.local:8001/
  routes:
  - name: admin-v1
    paths:
    - /admin/v1
- name: admin-v2
  url: http://localhost:8001/
  routes:
  - name: admin-v2
    paths:
    - /admin/v2
Comparing just the Kong part of the latency between the two versions for both services, we come to the same result: Kong's share of the latency is about 10 times higher in 3.x than in 2.8.x when using the Kubernetes service name, and almost the same when using localhost, with the same configuration across versions.
We also have production data from the last 4 months, where we simply upgraded the Kong gateway from 2.8.x to 3.x and noticed the issue first-hand.
Hi @shashanksapre,
thank you for providing more details. With the graphs, the problem seems clearer. Did you attempt to isolate the problem to either of the two plugins (rate-limiting and http-log) that you use?
Thank you, Hans
From our observations it has something to do with DNS resolution, since using localhost or an IP address in 3.x improves the results.
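One way to check whether name resolution itself is the expensive step is to time it separately from the request. A minimal sketch (here `localhost` stands in for the Kubernetes service name so the snippet runs anywhere):

```python
import socket
import time

def resolve_time_ms(hostname, attempts=5):
    """Time socket.getaddrinfo for a hostname; returns best-of-N in ms.

    If this number is large, or grows once DNS caching is disabled,
    resolution rather than proxying is the likely source of latency.
    """
    best = float("inf")
    for _ in range(attempts):
        start = time.perf_counter()
        socket.getaddrinfo(hostname, 80)
        best = min(best, (time.perf_counter() - start) * 1000)
    return best

# 'localhost' resolves without leaving the machine, which is consistent
# with the localhost route above sidestepping the DNS path entirely.
print(f"localhost: {resolve_time_ms('localhost'):.3f} ms")
```

In-cluster, the same call against a name like `kong-kong-admin.default.svc.cluster.local` would exercise the cluster DNS service.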
Hi @shashanksapre,
Thanks for all these details. Did you compare the impact of DNS resolution between 2.8.x and 3.x? Does using IP addresses make 3.x performance similar to 2.8.x?
Yes, that's correct. When we use hostnames, regardless of whether it's an internal or external service, the Kong latency is higher in 3.x. When it's an IP, performance is similar in both 2.8.x and 3.x.
Hello, can we please get an update?
@shashanksapre We have noticed your issue report and are working on it. We cannot give you a firm date when we will have a solution. If you require well-defined response times for your issues, please check out the enterprise version of Kong Gateway https://konghq.com/products/api-gateway-platform
@bungle dug out commit https://github.com/Kong/kong/commit/3b721ac034378614f65ec2106211e6459c148896, which changed the caching defaults of Kong's DNS client. This first became part of the 3.0 release, so it might be related to the issue that you're seeing. @shashanksapre Would you be able to change the default as seen in that commit from true to false and measure whether that solves the issue for you?
Thanks, Hans
@hanshuebner I am unable to see any way to set this using the Kong charts. Can you please tell us where we can set this?
@shashanksapre It would require a source-level modification to Kong. If you're strictly running off release images, that won't be an option. We're still investigating and may eventually be able to reproduce the issue ourselves, though.
Facing a similar issue with 3.1 after upgrading from 2.8.3. Ran a couple of benchmarks to compare numbers across both:
1. Kong 3.1, existing config, no restart: 4.39% errors, 45.6 rps, memory utilisation 1.6 GB
2. Kong 3.1, existing config, with restart: 8.69% errors, 51.4 rps, memory utilisation 1.1 GB
3. Kong 2.8.3, 1 GB memory limit / CPU HPA: 2.38% errors, 46.2 rps, memory utilisation ~1 GB
4. Kong 2.8.3, 2 GB memory limit / CPU & memory HPA: 2.38% errors, 46.7 rps, memory utilisation ~1 GB
5. Kong 2.8.3, restarted with the config from case 4: 1.79% errors, 45.1 rps, memory utilisation ~1 GB

All runs: 1k VUs for 180 seconds.
General stability of 2.8.3 is better than 3.1.
Kong 3.1 continues to grow in its memory utilisation and latency over time, and requires a restart for stable behaviour, unlike 2.8.x.
Are you using any plugins that rely on the batch queue library? If so, see #10103 and the PRs related to it.
Our custom plugins don't. But we use the following community plugins: https://docs.konghq.com/hub/kong-inc/ip-restriction/ https://docs.konghq.com/hub/kong-inc/response-transformer/ https://docs.konghq.com/hub/kong-inc/request-termination/ https://docs.konghq.com/hub/kong-inc/rate-limiting/ https://docs.konghq.com/hub/kong-inc/prometheus/ https://docs.konghq.com/hub/kong-inc/http-log/
Are you aware of any of these using that library?
http-log uses it.
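For context, the general failure mode with a queueing log shipper (a pure illustration, not Kong's actual code) is that entries accumulate whenever delivery keeps failing; bounding the queue trades lost log entries for bounded memory:

```python
from collections import deque

class BoundedLogQueue:
    """Illustrative log-shipping buffer: capping the number of retained
    entries prevents unbounded memory growth when the log endpoint is
    slow or down and nothing is ever flushed."""

    def __init__(self, max_entries):
        self.entries = deque(maxlen=max_entries)  # oldest entries dropped

    def add(self, entry):
        self.entries.append(entry)

q = BoundedLogQueue(max_entries=3)
for i in range(10):          # endpoint down: nothing is ever delivered
    q.add(f"log-{i}")
print(list(q.entries))       # only the newest 3 survive
```

Whether this matches the specific behaviour fixed via #10103 is not established here; it only sketches why a plugin buffering requests can drive the steadily growing memory described above.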
Thanks! Will profile without this plugin to confirm!
Was able to reproduce this, and saw better results without the http-log plugin on 3.1.
And overall fixed with 3.1.1
Thanks!
Did you build version 3.1.1 yourself? I don't see a published release with that version number.
I built my own image from the version published 7 days back.
@shashanksapre I ran the benchmark with the same config. First I used the IP as the upstream; then I used the hostname as the upstream, but I don't see significant changes in RPS or latency.
So I recommend making sure the DNS server is stable, and using a tool like wrk to run the benchmark and collect results.
Hi, the DNS server has been working fine. The whole setup is within a Kubernetes (Amazon EKS) cluster with the latest patches. The only thing we changed is the Kong image (upgraded from 2.8.x to 3.x).
A quick way to verify this is to replace the FQDN with an IP address and check whether you still see high latency. This will help us narrow down the scope of the problem.
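Mechanically, that substitution just means resolving the service name once and hard-coding the result in the service `url`, which takes the DNS client out of the request path. A sketch (the in-cluster name comes from the config earlier in the thread; `localhost` is used so the snippet runs anywhere):

```python
import socket

def fqdn_to_ip(fqdn):
    """Resolve a service FQDN once so its IPv4 address can be
    hard-coded in the service url, bypassing per-request DNS."""
    return socket.gethostbyname(fqdn)

# In-cluster this would be e.g.:
#   fqdn_to_ip("kong-kong-admin.default.svc.cluster.local")
# Outside a cluster, localhost demonstrates the call:
print(fqdn_to_ip("localhost"))  # -> 127.0.0.1
```

Note that hard-coding a Service's ClusterIP is only a diagnostic step, not a fix: the IP can change if the Service is recreated.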
I'm not sure if that's possible in an EKS cluster.
Just try.
The reason we can't try is it's our production system.
I will try my best to reproduce your scenario and validate it again.
I deployed Kong using https://github.com/wjziv/kong-k8s-example/tree/main/basic-implementation, removed the rate-limiting plugin, and added the http-log plugin with the following configuration:
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name: global-http-log
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: "true"
config:
  http_endpoint: http://echo.default.svc.cluster.local:80
  method: POST
  timeout: 1000
  keepalive: 1000
  flush_timeout: 2
  retry_count: 15
plugin: http-log
This allows http-log to access the echo service through FQDN.
I tested both Kong 2.8 and Kong 3.1, and found that their performance is similar. In terms of latency analysis, Kong 3.1 has slightly better latency than Kong 2.8.
The issue still hasn't been reproduced on my side.
Hello. We upgraded to version 3.2.x and that has fixed our problem.
Is there an existing issue for this?
Kong version ($ kong version): 3.1.1
Current Behavior
After upgrading to kong version 3.1.1 from 2.8.3, we are noticing a very high increase in kong latency.
Expected Behavior
Kong latency to be closer to the previous versions if not lower.
Steps To Reproduce
Anything else?
The following observations were made using the http-log plugin.
3.1.1: [embedded latency screenshot]
2.8.3: [embedded latency screenshot]