Add additional metrics - Githubissues

SuperQ commented 4 years ago

Description

Add additional metrics:

Response counter with code/method labels.
Content length exceeded counter.
Failed proxy responses.
Truncated proxy responses.
Build + version info.

Example output:

# HELP camo_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which camo was built.
# TYPE camo_build_info gauge
camo_build_info{branch="more_metrics",goversion="go1.13",revision="17c52efa9e174dbcefe8a3403aa915d9fb8ca16d",version="2.0.1-18-g17c52ef"} 1
# HELP camo_proxy_content_length_exceeded_total The number of requests where the content length was exceeded.
# TYPE camo_proxy_content_length_exceeded_total counter
camo_proxy_content_length_exceeded_total 0
# HELP camo_proxy_reponses_failed_total The number of responses that failed to send to the client.
# TYPE camo_proxy_reponses_failed_total counter
camo_proxy_reponses_failed_total 0
# HELP camo_proxy_reponses_truncated_total The number of responess that were too large to send.
# TYPE camo_proxy_reponses_truncated_total counter
camo_proxy_reponses_truncated_total 0
# HELP camo_responses_total Total HTTP requests processed by the go-camo, excluding scrapes.
# TYPE camo_responses_total counter
camo_responses_total{code="404",method="get"} 1

Checklist

[x] Code compiles correctly
[ ] Created tests (if appropriate)
[x] All tests passing
[ ] Extended the README / documentation (if necessary)

SuperQ commented 4 years ago

There are a lot of return points in the proxy ServeHTTP, I wasn't exactly sure which should have additional instrumentation.

dropwhile commented 4 years ago

There are a lot of return points in the proxy ServeHTTP

Yeah, there are many ways http requests can fail.

dropwhile commented 4 years ago

hmm. I'll have to think about this one. I could see this adding additional overhead to requests (how significant?), even when the prometheus endpoint isn't desired (eg. not enabled via cli flag).

SuperQ commented 4 years ago

Prometheus counter increments are about 12 nanoseconds. So, the overhead is negligible.

SuperQ commented 4 years ago

Also, we're only using the counter for failures, which should keep the overhead to a minimum. It's generally less expensive to use Prometheus counters than it is to use logs. Which is part of why it's useful. We can get context of debug-level info, without having to log things.

dropwhile commented 4 years ago

@SuperQ I merged this in, then made a few changes.

I felt including build and branch information by default might be a bit much for some people, even though I do agree with you that if people don't want that exposed they should filter those requests out upstream.
I did add back in some support for adding that information back in at run-time (via env vars). A devops pipeline (or a docker container) workflow may be able to add those env vars based on builds, and still expose them in a useful way so folks can track partial cluster deploys and rollouts. If this isn't sufficient let me know (open a new ticket/issue) -- I am certainly open to further discussion.

SuperQ commented 4 years ago

Adding things back in via env vars is a contrived way to deal with this. The whole point of including the information in the binary is that it allows users to identify that binary, not by the env around it.

Prometheus users expect build information metrics, it's a standard thing to have.

Hiding it doesn't really do anything for security. An attacker isn't going to wait on checking the version info to try an exploit, they're just going to try anyway.

If users cares so much that they don't want the info exposed, they can build it out by setting the make variables. But the default info is expected by monitoring systems.

cactus / go-camo

Add additional metrics #41

Description

Checklist