fastly / fastly-exporter

A Prometheus exporter for the Fastly Real-time Analytics API
Apache License 2.0
99 stars 36 forks source link

Metric LastSuccessfulResponse is updated even if API returns error #161

Open findmyname666 opened 7 months ago

findmyname666 commented 7 months ago

The Fastly exporter updates the metric lastSuccessfulResponse even if the Fastly API returns an error. In our case, I noticed that the Fastly token was revoked from the log messages:

level=error component=rt.fastly.com service_id=_reducted_ status_code=403 response_ts=1712674340 err="invalid authentication" msg="token may be invalid"

The metric description suggests that the metric shouldn't be updated:

"last_successful_response", Help: "Unix timestamp of the last successful response received from the real-time stats API."

I tracked it to a specific branch of the code related to retrieving origin metrics as an example:

Related questions:

If necessary, I can take a look at the code adjustments once we agree on the next steps.

leklund commented 7 months ago

I believe the intent of the lastSuccessfulResponse was to indicate that the http request to rt.fastly.com succeeded, regardless of the response status code. If, however, this was the intended behavior I'm not sure I see a use case where you'd want the metric to be set from a 403.

I see two options:

  1. add a label to the metric for the status code. This could retain the existing metric but allow clients to filter where status=200.
  2. only Set LastSuccessfulResponse when the response status code == 200.
findmyname666 commented 6 months ago

I believe the intent of the lastSuccessfulResponse was to indicate that the http request to rt.fastly.com succeeded, regardless of the response status code. If, however, this was the intended behavior I'm not sure I see a use case where you'd want the metric to be set from a 403.

I see two options:

  1. add a label to the metric for the status code. This could retain the existing metric but allow clients to filter where status=200.
  2. only Set LastSuccessfulResponse when the response status code == 200.

I agree with your proposal. Any of those would be helpful. However if we go with the first, there is still the question if the metrics lastSuccessfulResponse should report differently for 4xx and 5xx. From my point of view 4xx and 5xx aren't successful responses for a client.