cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Bug: Lots of containerd error messages on the kubernetes nodes #3504

Open vevetron opened 1 month ago

vevetron commented 1 month ago

Describe the bug: Lots of containerd error messages on the Kubernetes nodes.

I don't know if this actually impacts anything but our sanity.

time="2024-10-16T23:12:24.724957756Z" level=error msg="invalid metric type for a21c5908d2c90c6779f4ee19d1e565a549fe98ddae0be362d89a95d1d11cd082" error="<nil>"

To Reproduce: https://console.cloud.google.com/logs/query;query=resource.type%3D%22k8s_node%22%0Aresource.labels.project_id%3D%22cal-itp-data-infra%22%0Aresource.labels.location%3D%22us-west1%22%0Aresource.labels.cluster_name%3D%22data-infra-apps%22%0Aresource.labels.node_name%3D%22gke-data-infra-apps-jupyterhub-users-6aa76dbb-7w1f%22%0A--%20severity%3E%3DDEFAULT%0A--Hide%20similar%20entries%0A--%20-%2528-jsonPayload.message:*%2529%0A--End%20of%20hide%20similar%20entries%0AjsonPayload._SYSTEMD_UNIT%3D%22containerd.service%22;storageScope=project;cursorTimestamp=2024-10-16T23:12:24.728217Z;startTime=2024-10-16T23:11:00Z;endTime=2024-10-16T23:14:00Z?project=cal-itp-data-infra

Run a query similar to this in the Google Cloud Logs Explorer:

resource.type="k8s_node"
resource.labels.project_id="cal-itp-data-infra"
resource.labels.location="us-west1"
resource.labels.cluster_name="data-infra-apps"
resource.labels.node_name="gke-data-infra-apps-jupyterhub-users-6aa76dbb-7w1f"
-- severity>=DEFAULT
--Hide similar entries
-- -(-jsonPayload.message:*)
--End of hide similar entries
jsonPayload._SYSTEMD_UNIT="containerd.service"
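For checking other nodes without hand-editing the query, the same filter can be assembled programmatically. This is a minimal sketch, not something taken from the repo: `build_filter` and its parameters are hypothetical names, and the clauses are joined with an explicit `AND` rather than newlines. The optional `contains` argument uses the `:` (has) operator on `jsonPayload.MESSAGE`, which is the field that carries the containerd line in the entry shown in this issue.

```python
def build_filter(project_id, location, cluster_name,
                 node_name=None, unit="containerd.service", contains=None):
    """Build a Cloud Logging filter string for systemd-unit logs on GKE nodes.

    All argument names are illustrative; the label keys mirror the query
    in this issue.
    """
    clauses = [
        'resource.type="k8s_node"',
        f'resource.labels.project_id="{project_id}"',
        f'resource.labels.location="{location}"',
        f'resource.labels.cluster_name="{cluster_name}"',
    ]
    if node_name:
        # Omit node_name to search every node in the cluster.
        clauses.append(f'resource.labels.node_name="{node_name}"')
    clauses.append(f'jsonPayload._SYSTEMD_UNIT="{unit}"')
    if contains:
        # ":" is the substring ("has") operator in the logging query language.
        clauses.append(f'jsonPayload.MESSAGE:"{contains}"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_filter(
        "cal-itp-data-infra", "us-west1", "data-infra-apps",
        contains="invalid metric type",
    ))
```

The resulting string can be pasted into the Logs Explorer or, assuming the gcloud CLI is installed and authenticated, passed as the filter argument to `gcloud logging read` together with `--project` and `--limit`.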

And you will see lots of messages like this:

{
  "insertId": "owkf791zijrlzkuy",
  "jsonPayload": {
    "_COMM": "containerd",
    "_MACHINE_ID": "46193cd53372bdd9238869f8774a3f35",
    "_CMDLINE": "/usr/bin/containerd",
    "_STREAM_ID": "d9718a8d7e2f40efb23870778de3f4d5",
    "SYSLOG_FACILITY": "3",
    "_SYSTEMD_INVOCATION_ID": "803e3ec6fef84e4c9f7e79da1953113d",
    "_RUNTIME_SCOPE": "system",
    "_SYSTEMD_CGROUP": "/system.slice/containerd.service",
    "_SYSTEMD_UNIT": "containerd.service",
    "_BOOT_ID": "a250cd9aeaa142f38fe26b618c620262",
    "_PID": "1654",
    "_UID": "0",
    "MESSAGE": "time=\"2024-10-16T23:12:24.724957756Z\" level=error msg=\"invalid metric type for a21c5908d2c90c6779f4ee19d1e565a549fe98ddae0be362d89a95d1d11cd082\" error=\"<nil>\"",
    "_GID": "0",
    "_HOSTNAME": "gke-data-infra-apps-jupyterhub-users-6aa76dbb-7w1f",
    "SYSLOG_IDENTIFIER": "containerd",
    "PRIORITY": "6",
    "_EXE": "/usr/bin/containerd",
    "_SYSTEMD_SLICE": "system.slice",
    "_TRANSPORT": "stdout",
    "_CAP_EFFECTIVE": "1ffffffffff"
  },
  "resource": {
    "type": "k8s_node",
    "labels": {
      "project_id": "cal-itp-data-infra",
      "location": "us-west1",
      "node_name": "gke-data-infra-apps-jupyterhub-users-6aa76dbb-7w1f",
      "cluster_name": "data-infra-apps"
    }
  },
  "timestamp": "2024-10-16T23:12:24.727831Z",
  "logName": "projects/cal-itp-data-infra/logs/container-runtime",
  "receiveTimestamp": "2024-10-16T23:12:28.288023312Z"
}
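To get a sense of how much of the noise is this one message, exported entries can be grouped by their embedded containerd message. The sketch below assumes the entries are available as Python dicts shaped like the one above (for example, loaded from a JSON export). `parse_logfmt` and `summarize` are hypothetical helper names, and treating the 64-character hex string as an ID to be masked is an assumption based on this single sample.

```python
import re
import shlex
from collections import Counter

# Assumption: the trailing 64-hex-character token is an ID that differs per
# entry, so it is masked to let otherwise identical messages group together.
HEX_ID = re.compile(r"\b[0-9a-f]{64}\b")


def parse_logfmt(message):
    """Split a key=value log line (values may be double-quoted) into a dict."""
    fields = {}
    for token in shlex.split(message):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def summarize(entries):
    """Count entries by (level, message with IDs masked)."""
    counts = Counter()
    for entry in entries:
        fields = parse_logfmt(entry["jsonPayload"]["MESSAGE"])
        msg = HEX_ID.sub("<id>", fields.get("msg", ""))
        counts[(fields.get("level", "?"), msg)] += 1
    return counts


# The MESSAGE value is copied from the log entry in this issue.
sample = {
    "jsonPayload": {
        "MESSAGE": 'time="2024-10-16T23:12:24.724957756Z" level=error '
                   'msg="invalid metric type for a21c5908d2c90c6779f4ee19d1e565a5'
                   '49fe98ddae0be362d89a95d1d11cd082" error="<nil>"'
    }
}

if __name__ == "__main__":
    for (level, msg), n in summarize([sample]).most_common():
        print(n, level, msg)
```

Note that the entry carries `PRIORITY` "6" (informational in syslog terms) while the embedded line says `level=error`, so a severity filter alone may not isolate these messages; parsing `MESSAGE` as above sidesteps that.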

Expected behavior: Fewer errors in life. Fewer logs.
