coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Zincati / Cincinnati server errors: failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error) #1661

Closed travier closed 5 months ago

travier commented 5 months ago

Since January 24th, 2024, Zincati on my server has been logging these errors:

Jan 24 20:23:50 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error)
Jan 25 03:49:23 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error)
Jan 25 12:35:40 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error)
Jan 25 13:38:34 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error)
Jan 25 13:43:55 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 408: (unknown/generic server error)
Jan 25 13:49:31 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 408: (unknown/generic server error)
Jan 25 13:54:57 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 408: (unknown/generic server error)
Jan 25 14:00:05 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error)
Jan 25 15:56:29 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 408: (unknown/generic server error)
Jan 25 16:07:31 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 408: (unknown/generic server error)
Jan 25 16:12:50 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 408: (unknown/generic server error)
Jan 25 16:29:18 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 408: (unknown/generic server error)
Jan 25 16:45:44 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error)
Jan 25 16:50:56 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error)
Jan 25 17:01:38 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)
Jan 25 17:07:05 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)
Jan 25 17:12:21 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 408: (unknown/generic server error)
Jan 25 19:02:50 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error)
Jan 25 19:08:21 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error)
Jan 25 19:13:35 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 408: (unknown/generic server error)
Jan 25 19:18:57 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)

travier commented 5 months ago

From the Cincinnati (fcos-graph-builder) logs:

[2024-01-25T10:38:28Z INFO fcos_graph_builder] starting server (fcos-graph-builder 0.1.0)
[2024-01-27T07:30:00Z ERROR fcos_graph_builder::scraper] transient scraping failure: error sending request for url (https://builds.coreos.fedoraproject.org/prod/streams/testing/releases.json): error trying to connect: Connection reset by peer (os error 104)
[2024-01-27T07:30:00Z ERROR fcos_graph_builder::scraper] transient scraping failure: error sending request for url (https://builds.coreos.fedoraproject.org/updates/testing.json): error trying to connect: Connection reset by peer (os error 104)
[2024-01-27T13:58:29Z ERROR fcos_graph_builder::scraper] transient scraping failure: error sending request for url (https://builds.coreos.fedoraproject.org/updates/next.json): connection error: Connection reset by peer (os error 104)
[2024-01-28T06:17:46Z ERROR fcos_graph_builder::scraper] transient scraping failure: error sending request for url (https://builds.coreos.fedoraproject.org/updates/stable.json): error trying to connect: Connection reset by peer (os error 104)
[2024-01-30T06:39:28Z ERROR fcos_graph_builder::scraper] transient scraping failure: error sending request for url (https://builds.coreos.fedoraproject.org/updates/next.json): error trying to connect: Connection reset by peer (os error 104)
[2024-01-30T06:39:28Z ERROR fcos_graph_builder::scraper] transient scraping failure: error sending request for url (https://builds.coreos.fedoraproject.org/prod/streams/next/releases.json): error trying to connect: Connection reset by peer (os error 104)

travier commented 5 months ago

Made https://github.com/coreos/fedora-coreos-cincinnati/pull/95 as an option to push an update.

jlebon commented 5 months ago

If you look at the timestamps in the Cincinnati logs, they're few and far between and represent transient failures.

The Zincati logs appear more frequent but stopped after January 25th (which is also near when Cincinnati was last restarted, I think from when the cluster was updated). So presumably it succeeded in getting the graph afterwards (would need to increase verbosity to be able to tell I think). Is your node updated to the latest release?
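
One way to check the frequency argument above is to tally the Zincati failures per day and per status code. The following is only a minimal sketch: the function name `tally_errors` and the regular expression are illustrative assumptions that match the journal line layout quoted in this issue, not part of Zincati itself, and the sample holds three lines copied from the report.

```python
import re
from collections import Counter

# Assumed layout, taken from the journal lines quoted in this issue:
# "<Mon> <day> <HH:MM:SS> <host> zincati[<pid>]: [ERROR zincati::cincinnati] ... code <NNN>: ..."
LINE_RE = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ zincati\[\d+\]: "
    r"\[ERROR zincati::cincinnati\] .*server-side error, code (?P<code>\d{3}):"
)


def tally_errors(journal_text):
    """Count Zincati update-check failures per (day, HTTP status code)."""
    counts = Counter()
    for line in journal_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match["day"], int(match["code"]))] += 1
    return counts


# Three lines copied verbatim from the report above.
SAMPLE = """\
Jan 24 20:23:50 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 502: (unknown/generic server error)
Jan 25 13:43:55 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 408: (unknown/generic server error)
Jan 25 17:01:38 fcos.siosm.fr zincati[2435]: [ERROR zincati::cincinnati] failed to check Cincinnati for updates: server-side error, code 500: (unknown/generic server error)
"""

if __name__ == "__main__":
    for (day, code), count in sorted(tally_errors(SAMPLE).items()):
        print(day, code, count)
```

To run it against a real node, the output of `journalctl -u zincati.service` could be passed to `tally_errors` in place of `SAMPLE`.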

Manually curling the server seems to work fine right now at least:

$ curl -L 'https://updates.coreos.fedoraproject.org/v1/graph?basearch=x86_64&stream=testing&rollout_wariness=0'
{
  "nodes": [
    {
      "version": "30.20190716.1",
      "metadata": {
        "org.fedoraproject.coreos.releases.age_index": "0",
        "org.fedoraproject.coreos.updates.deadend": "true",
        "org.fedoraproject.coreos.scheme": "checksum",
        "org.fedoraproject.coreos.updates.deadend_reason": "https://github.com/coreos/fedora-coreos-tracker/issues/215"
      },
      "payload": "ff4803b069b5a10e5bee2f6bb0027117637559d813c2016e27d57b309dd09d6f"
    },
...
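
For a quick sanity check of a response like the one above, the JSON can be parsed and its nodes inspected. This is a rough sketch only: `deadend_nodes` is a hypothetical helper, and the sample graph contains just the single node visible in the truncated output above, so nothing beyond the `nodes`, `version` and `metadata` fields shown there is assumed.

```python
import json

# Metadata keys as they appear in the node shown above.
DEADEND_KEY = "org.fedoraproject.coreos.updates.deadend"
REASON_KEY = "org.fedoraproject.coreos.updates.deadend_reason"


def deadend_nodes(graph):
    """Return (version, reason) for every node flagged as a dead end."""
    result = []
    for node in graph.get("nodes", []):
        metadata = node.get("metadata", {})
        if metadata.get(DEADEND_KEY) == "true":
            result.append((node.get("version"), metadata.get(REASON_KEY)))
    return result


# Only the one node visible in the truncated curl output; the real
# response contains more nodes that are not reproduced here.
SAMPLE_GRAPH = json.loads("""
{
  "nodes": [
    {
      "version": "30.20190716.1",
      "metadata": {
        "org.fedoraproject.coreos.releases.age_index": "0",
        "org.fedoraproject.coreos.updates.deadend": "true",
        "org.fedoraproject.coreos.scheme": "checksum",
        "org.fedoraproject.coreos.updates.deadend_reason": "https://github.com/coreos/fedora-coreos-tracker/issues/215"
      },
      "payload": "ff4803b069b5a10e5bee2f6bb0027117637559d813c2016e27d57b309dd09d6f"
    }
  ]
}
""")

if __name__ == "__main__":
    for version, reason in deadend_nodes(SAMPLE_GRAPH):
        print(version, reason)
```

A response that parses and yields a non-empty `nodes` list is consistent with the server answering normally, though it says nothing about the earlier 502/408/500 errors.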

travier commented 5 months ago

I'm on the latest release, and after restarting Zincati its status reports that everything is OK.

Looks like a false alarm. Sorry for the noise.