Open michaelaguiar opened 3 years ago
I haven't seen this anywhere else. The code is also fairly simple around checking for SSL certs so I would imagine it's working as expected as well. When you say "every request after" do you mean when there is a redirect?
Also, what's the value of "tls_handshake"
in the response?
I believe I found the issue. When the web server starts and makes the first request, it takes a bit extra time for the tls_handshake because it needs to parse the certpool.
For example. I pass it a url "google.com", and the tls_handshake will be 350ms for example on the very first request of the web server. I request again and its 50ms. Also, it doesn't matter if I request google.com on the first request, and a completely different domain every request after, only the very first has that extra time.
Do you see this when you run it? I have a solution implemented on my end that I can create a PR for if you do.
Thank you. I can't see the same on the TLS but DNS gets faster which is expected (see below).
{"status":200,"error":"","dns_lookup":1835826,"tcp_connection":23338192,"tls_handshake":0,"server_processing":56071368,"content_transfer":618086,"name_lookup":1835826,"connect":25142277,"pre_transfer":0,"start_transfer":81245386,"total":81863472}
❯ curl -H "Content-Type: application/json" http://localhost:8080 --data '{"url":"http://www.google.com","timeout":"1s"}'
{"status":200,"error":"","dns_lookup":533888,"tcp_connection":22760430,"tls_handshake":0,"server_processing":58920397,"content_transfer":737248,"name_lookup":533888,"connect":23257384,"pre_transfer":0,"start_transfer":82214715,"total":82951963}
❯ curl -H "Content-Type: application/json" http://localhost:8080 --data '{"url":"http://www.google.com","timeout":"1s"}'
{"status":200,"error":"","dns_lookup":535071,"tcp_connection":21908510,"tls_handshake":0,"server_processing":66965847,"content_transfer":1234778,"name_lookup":535071,"connect":22422359,"pre_transfer":0,"start_transfer":89409428,"total":90644206}
Run the same test using https on the domains. This issue is only for SSL for the initial parsing of the certs. If you start the web server, and hit https://google.com, then hit it again, you will see a difference.
Let me know if you see what I mean.
Here are the responses for https://www.google.com
and https://google.com
:
❯ curl -H "Content-Type: application/json" http://localhost:8080 --data '{"url":"https://www.google.com","timeout":"1s"}'
{"status":200,"error":"","dns_lookup":17917452,"tcp_connection":17443404,"tls_handshake":163105938,"server_processing":66707585,"content_transfer":1722249,"name_lookup":17917452,"connect":35360856,"pre_transfer":198791702,"start_transfer":265499287,"total":267221536}
❯ curl -H "Content-Type: application/json" http://localhost:8080 --data '{"url":"https://www.google.com","timeout":"1s"}'
{"status":200,"error":"","dns_lookup":620721,"tcp_connection":22726986,"tls_handshake":25666911,"server_processing":64927525,"content_transfer":931744,"name_lookup":620721,"connect":23347707,"pre_transfer":49082216,"start_transfer":114009741,"total":114941485}
❯ curl -H "Content-Type: application/json" http://localhost:8080 --data '{"url":"https://www.google.com","timeout":"1s"}'
{"status":200,"error":"","dns_lookup":537637,"tcp_connection":11668316,"tls_handshake":28164512,"server_processing":57002180,"content_transfer":1385776,"name_lookup":537637,"connect":12205953,"pre_transfer":40523704,"start_transfer":97525884,"total":98911660}
❯ curl -H "Content-Type: application/json" http://localhost:8080 --data '{"url":"https://www.google.com","timeout":"1s"}'
{"status":200,"error":"","dns_lookup":594996,"tcp_connection":21546288,"tls_handshake":29708852,"server_processing":68509031,"content_transfer":2164011,"name_lookup":594996,"connect":22141284,"pre_transfer":51930152,"start_transfer":120439183,"total":122603194}
❯ curl -H "Content-Type: application/json" http://localhost:8080 --data '{"url":"https://google.com","timeout":"1s"}'
{"status":200,"error":"","dns_lookup":16124478,"tcp_connection":15921820,"tls_handshake":22315163,"server_processing":63275734,"content_transfer":4474806,"name_lookup":16124478,"connect":32046298,"pre_transfer":54467544,"start_transfer":117743278,"total":122218084}
❯ curl -H "Content-Type: application/json" http://localhost:8080 --data '{"url":"https://google.com","timeout":"1s"}'
{"status":200,"error":"","dns_lookup":911491,"tcp_connection":19486960,"tls_handshake":26556214,"server_processing":67150426,"content_transfer":7797962,"name_lookup":911491,"connect":20398451,"pre_transfer":47083548,"start_transfer":114233974,"total":122031936}
❯ curl -H "Content-Type: application/json" http://localhost:8080 --data '{"url":"https://google.com","timeout":"1s"}'
{"status":200,"error":"","dns_lookup":1273497,"tcp_connection":17381766,"tls_handshake":29717339,"server_processing":57314916,"content_transfer":6913813,"name_lookup":1273497,"connect":18655263,"pre_transfer":48479534,"start_transfer":105794450,"total":112708263}
This both cases, TLS Handshake is longer on the first request and so is the DNS lookup time. I guess I'm not understanding the underlying issue. As far as I understood, you think only the first request is verified for correct TLS certs, because it takes longer. However they are all checked for handshake but it takes less time after they cert is parsed on the server. What am I missing here in terms of the original issue?
So dns_lookup isn't a problem. The first time it's a little more, but it jumps around, and is similar the 5th call.
Now, the "tls_handshake" goes from ~163ms the first time, down to about 25ms every time after. I could be wrong, but I believe this extra time is the server parsing the certpool on the first request.
If we have a web service for returning the total time through the lifecycle, I would expect we would want the times from the 2nd+ calls, and not account for the web server parsing the certs the first time.
An example:
rootCAs, _ := x509.SystemCertPool()
if rootCAs == nil {
rootCAs = x509.NewCertPool()
}
Adding this to the top of the main func will prep the root certs on web server start, and not on the first request. After adding this, you will see every request, including the first is much closer on time, instead of being so much longer the first request.
-- Quick Note It doesn't matter on the domains you hit, only that the first request parses the root certs. hitting a completely different url on request 2 will not have this extra time.
Please let me know if I misunderstand the process a bit. I'm new to golang and am just experimenting.
Also, a tool like httpstat will return this extra latency on every request, but when used through the web server, only the first request has the extra.
I would assume we don't want to account for that extra time?
I understand your point now. I think this behavior depends on what the service is used for. If it is going to be used as a service to provide consistent reading of a website latency and its chain, then I'd agree reseting the cache or warming them up (if possible) is a good option. However if it is to be used to check website availability and the metrics provided are there to identify where the issues are (like a tracer) then adding cache reset is not the best option as it severely impacts its performance in multi-tenanted environments.
One can imagine cache reset being optional as a param during startup to cater for both cases.
Sorry if I misunderstand, and thanks for reviewing this with me. I don't see how the extra latency on the first request can be linked to the actual site at all?
For example. Web server starts, I request https://google.com to get the numbers. I get total numbers that are about 200 - 300 ms larger since its the first request. Then I check https://microsoft.com. Site is up but the total time I get is 200-300 ms LESS than what I would get, if Microsoft was the first site I checked when the web service started.
Likewise, if first request is Microsoft, second is google, then Microsoft is 200 - 300ms longer than it would be if it were the 2+ request after web server starts.
Based on this, I feel like the extra latency is just the web server prepping for the request, and not caused by the site your checking.
By cache, are you referring to the cert pool above that I posted? My intention is to have this in main() before the web server starts instead of using time from the first request to prep.
Like I said, I am super new at golang, so if I'm missing something just let me know! Appreciate it.
I changed to code to allow system cert to be added at the start (see features/tls_handshake
branch). However I can't see any difference in timing for the first and consequent calls. Furthermore, no noticeable difference before and after the change. As a matter of fact, I am not seeing the huge differences in TLS handshake durations you are observing.
I noticed when InsecureSkipVerify is set to false, only the first request is verified and takes extra time for the TLSHandshake. Every request after is the same regardless of InsecureSkipVerify value.
Are you having the same issue?