edgi-govdata-archiving / web-monitoring-diff

Tools for diffing and comparing web content. Also includes a web server that makes diffs available as an HTTP service.
https://web-monitoring-diff.readthedocs.io/
GNU General Public License v3.0

Diff server should return a 503 status for upstream 503 responses #20

Open Mr0grog opened 4 years ago

Mr0grog commented 4 years ago

The diff server usually handles upstream errors (i.e. fetching the URL for either the a or b parameter results in an error) by returning a 502 (bad gateway) status. However, if the upstream response was a 503 (service unavailable) status, we should probably return the same, including the upstream Retry-After response header, if present.

503 is often used to tell the requesting server to slow down, so it would be helpful to send that info back to the downstream server, since we are just acting as a dumb intermediary in the middle.

In fact, if an upstream response comes back as any error with a Retry-After header, we should probably send along that header to the downstream client.

(We sometimes get this status from S3 when running imports and auto analysis, so it would be nice to handle it better.)

Mr0grog commented 4 years ago

We should handle 429 (too many requests) statuses this way, too. They frequently (but not always) have a Retry-After header.
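
To make the proposal concrete, here is a rough sketch of the mapping described so far: pass 503 and 429 through as-is, turn every other upstream error into a 502, and forward Retry-After whenever the upstream error included it. The names downstream_error and PASS_THROUGH_STATUSES are made up for illustration and are not part of the existing server code.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical constant: upstream statuses that are passed through unchanged
# instead of being converted to a 502.
PASS_THROUGH_STATUSES = frozenset({429, 503})


def downstream_error(upstream_status: int,
                     upstream_headers: Mapping[str, str]) -> Tuple[int, Dict[str, str]]:
    """Pick the status and extra headers to send for an upstream error.

    Hypothetical helper, not the diff server's actual API.
    """
    # 429 and 503 are forwarded as-is; everything else becomes 502.
    status = upstream_status if upstream_status in PASS_THROUGH_STATUSES else 502

    headers: Dict[str, str] = {}
    # Header names are case-insensitive, so compare in lower case.
    for name, value in upstream_headers.items():
        if name.lower() == 'retry-after':
            # Forward the value untouched; it may be seconds or an HTTP date.
            headers['Retry-After'] = value
            break
    return status, headers
```

The Retry-After value is forwarded without parsing, since it can be either a number of seconds or an HTTP date and the downstream client is in the best position to interpret it. How this would be wired into the request handler is left out here because it depends on the existing error handling.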

Mr0grog commented 4 years ago

The relevant place in the code for this is here, in fetch_diffable_content() in web_monitoring/diff_server/server.py: https://github.com/edgi-govdata-archiving/web-monitoring-processing/blob/5220721ff3b414f653cb66510d150c1bde2b9202/web_monitoring/diff_server/server.py#L290-L306