QubitProducts / exporter_exporter

A reverse proxy designed for Prometheus exporters
Apache License 2.0
336 stars 55 forks source link

Error responses not passed through #22

Closed candlerb closed 4 years ago

candlerb commented 4 years ago

When a backend gives an error status, it appears to confuse exporter_exporter.

Here is an example using snmp_exporter. If I talk to it directly with bad arguments, it returns a 400 status code and a useful message.

root@prometheus:~# curl 'localhost:9116/snmp?target=10.12.255.1&module=WRONG'
Unknown module 'WRONG'

root@prometheus:~# curl -v 'localhost:9116/snmp?target=10.12.255.1&module=WRONG'
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9116 (#0)
> GET /snmp?target=10.12.255.1&module=WRONG HTTP/1.1
> Host: localhost:9116
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sat, 19 Oct 2019 09:11:42 GMT
< Content-Length: 23
<
Unknown module 'WRONG'
* Connection #0 to host localhost left intact

But using exporter_exporter:

modules:
  snmp:
    method: http
    http:
       port: 9116
       path: snmp

Here is what I get:

root@prometheus:~# curl 'localhost:9999/proxy?module=snmp&target=10.12.255.1&module=WRONG'
An error has occurred while serving metrics:

text format parsing error in line 1: expected float as value, got "module"

root@prometheus:~# curl -v 'localhost:9999/proxy?module=snmp&target=10.12.255.1&module=WRONG'
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9999 (#0)
> GET /proxy?module=snmp&target=10.12.255.1&module=WRONG HTTP/1.1
> Host: localhost:9999
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sat, 19 Oct 2019 09:13:07 GMT
< Content-Length: 121
<
An error has occurred while serving metrics:

text format parsing error in line 1: expected float as value, got "module"
* Connection #0 to host localhost left intact

I looks like exporter_exporter has attempted to parse the error message as a metric, and this in turn has caused a 500 internal server error.

I am not actually sure why exporter_exporter needs to parse metrics - why not pass the body through unchanged?

But in any case, when the backend returns a non-2xx code then I think that the backend response code and body should be passed through unchanged.

candlerb commented 4 years ago

Hmm, maybe this is a bad example. The fact that both exporter_exporter and snmp_exporter use a module query-string argument is confusing.

EDIT: raised separately as #23

candlerb commented 4 years ago

The point about error responses still stands. You can demonstrate it by missing a mandatory query string arg on snmp_exporter:

# curl -v localhost:9999/proxy?module=snmp
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9999 (#0)
> GET /proxy?module=snmp HTTP/1.1
> Host: localhost:9999
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sat, 19 Oct 2019 09:41:44 GMT
< Content-Length: 103
<
An error has occurred while serving metrics:

text format parsing error in line 1: invalid metric name
* Connection #0 to host localhost left intact

In this case, tcpdump shows that the response from snmp_exporter was:

HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Sat, 19 Oct 2019 09:43:05 GMT
Content-Length: 37

'target' parameter must be specified
tcolgate commented 4 years ago

Thanks for the report. We should be able to handle this better.

tcolgate commented 4 years ago

I've added support for reporting an error. Unfortunately, some of the internals mean that we cannot simply reforwarded all errors. As it stands, we should now at least report there is an error and not just report a parsing failure. Hopefully this helps a bit.

candlerb commented 4 years ago

Great, thanks for that.

candlerb commented 4 years ago

Aside: since at that point resp.Body has not been touched, presumably it would be possible to read (say) the first 1024 bytes and include it in the error message?

tcolgate commented 4 years ago

The problem is that it's promhttp.handler that calls the gatherer, so it's that that passes back the response. I could put a chunk of the Body in the error I give back to the promhttp handler, but that would be even uglier output than what we currently do.