envoyproxy / nighthawk

L7 (HTTP/HTTPS/HTTP2/HTTP3) performance characterization tool
Apache License 2.0

add dedicated counters for a few common 4xx and 5xx codes #845

Open eric846 opened 2 years ago

eric846 commented 2 years ago

Can we afford to add 5 or 15 counters to help troubleshoot these specific HTTP outcomes? These would be in addition to today's catch-all http_4xx and http_5xx counters.

Usually if I saw 4xx or 5xx errors in Nighthawk counters, I would just use curl against the server directly to see what's happening, but when using a custom transport socket, that's impossible.

If the resource cost is significant, we should prioritize the most common counters.

If we can afford 15:

If we can only afford 5:

mum4k commented 2 years ago

We can certainly add more counters. We can limit the impact of this addition by hiding these changes behind a feature flag, so that we don't change the default behavior.

We could run some larger load tests to determine the impact and feasibility of this, which could help us decide whether we add 5 or 15.

@eric846 is this something you are planning to work on?

eric846 commented 2 years ago

All sounds good. (I'm not planning to work on it myself.)

eric846 commented 2 years ago

I just realized a way to reduce the effort.

We can just let the user specify a list of HTTP codes they want to break out as separate counters. Then we aren't even bound by the set of 15. The default would be an empty list, and I would probably start off with 404,500,502,503,504 myself. For debugging where performance doesn't matter, someone could try $(seq -s , 200 599).
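A minimal sketch of that idea in Python, just to make the behavior concrete. This is not Nighthawk code, and everything in it is an assumption: the function names `parse_code_list` and `count_responses` are hypothetical, and so is the per-code counter naming (`http_404` and so on), which is only modeled on the existing `http_4xx` and `http_5xx` catch-alls.

```python
from collections import Counter


def parse_code_list(spec):
    """Parse a comma-separated list such as "404,500,502" into a set of codes.

    Hypothetical helper; an empty string yields an empty set (the proposed default).
    """
    codes = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty spec or stray commas
        code = int(item)  # raises ValueError on non-numeric input
        if not 100 <= code <= 599:
            raise ValueError(f"not an HTTP status code: {code}")
        codes.add(code)
    return frozenset(codes)


def count_responses(statuses, broken_out):
    """Tally the catch-all class counters plus one counter per selected code."""
    counters = Counter()
    for status in statuses:
        counters[f"http_{status // 100}xx"] += 1  # catch-all, e.g. http_4xx
        if status in broken_out:
            counters[f"http_{status}"] += 1  # hypothetical dedicated counter name
    return counters


if __name__ == "__main__":
    selected = parse_code_list("404,500,502,503,504")
    print(dict(count_responses([200, 404, 404, 403, 503], selected)))
```

With an empty list, only the catch-all counters are touched, so the default behavior would stay the same. The cost of the extra counters scales with the number of codes the user asks for rather than with a fixed set of 5 or 15.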