BCDevOps / OpenShift4-RollOut

This is the primary board for all activities related to the roll out of OpenShift 4
Apache License 2.0

PRB0040484 AlertManager reporting hourly specific KubeAPIErrorsHigh failures #480

Closed. wmhutchison closed this issue 3 years ago

wmhutchison commented 3 years ago

Describe the issue: Once an hour, both the KPROD and SILVER clusters show KubeAPIErrorsHigh alerts for LIST requests on podmonitors, prometheusrules and servicemonitors. This appears to have started not long after we upgraded to OCP 4.5.20.
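For anyone triaging the same symptom, here is a minimal sketch of how the matching alerts could be picked out of an Alertmanager alert list. It assumes each alert is a dict with a `labels` mapping, and that the alert carries `verb` and `resource` labels as in the upstream kubernetes-mixin rule; the function name and the sample data are hypothetical and not taken from our clusters.

```python
from typing import Dict, Iterable, List

# Resources named in this issue. The label names ("alertname", "verb",
# "resource") are an assumption based on the upstream KubeAPIErrorsHigh rule.
AFFECTED_RESOURCES = {"podmonitors", "prometheusrules", "servicemonitors"}


def matching_alerts(alerts: Iterable[Dict]) -> List[Dict]:
    """Return alerts that look like the hourly LIST failures described above."""
    hits = []
    for alert in alerts:
        labels = alert.get("labels", {})
        if (
            labels.get("alertname") == "KubeAPIErrorsHigh"
            and labels.get("verb") == "LIST"
            and labels.get("resource") in AFFECTED_RESOURCES
        ):
            hits.append(alert)
    return hits


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        {"labels": {"alertname": "KubeAPIErrorsHigh", "verb": "LIST",
                    "resource": "servicemonitors"}},
        {"labels": {"alertname": "KubeAPIErrorsHigh", "verb": "GET",
                    "resource": "pods"}},
    ]
    for alert in matching_alerts(sample):
        print(alert["labels"]["resource"])
```

Filtering on verb and resource together keeps unrelated KubeAPIErrorsHigh alerts out of the picture, so only the three resources named above are reported.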

Which Sprint Goal is this issue related to?

Additional context: Red Hat case https://access.redhat.com/support/cases/#/case/02829539

Definition of done Checklist (where applicable)

wmhutchison commented 3 years ago

Preliminary web searches suggest that https://bugzilla.redhat.com/show_bug.cgi?id=18918 might apply here, in which case the permanent fix is an upgrade to OCP 4.5.22.

Opened a Red Hat case anyway and provided the must-gather for KLAB, both to get a second opinion and to ensure nothing else is amiss.

wmhutchison commented 3 years ago

Killing the Prometheus operator works as a workaround. Moving to Blocked since we need an OCP upgrade for a permanent fix.
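A rough sketch of what the workaround could look like when scripted, assuming "killing" means deleting the operator pod so that its Deployment recreates it. The helper name is hypothetical, and the label selector in the usage example is an assumption that should be checked against the pod's actual labels before running anything.

```python
import shlex
from typing import List


def restart_operator_command(namespace: str, selector: str) -> List[str]:
    """Build an `oc delete pod` command for pods matching a label selector.

    This only builds the argument list; it does not run anything.
    """
    return ["oc", "delete", "pod", "-n", namespace, "-l", selector]


if __name__ == "__main__":
    # The selector below is an assumption, not confirmed in this thread.
    cmd = restart_operator_command(
        "openshift-monitoring",
        "app.kubernetes.io/name=prometheus-operator",
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps it safe to hand to `subprocess.run` later without shell quoting problems.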

StevenBarre commented 3 years ago

Should be fixed in v4.5.22, which will be included in this quarter's patching of Silver.

StevenBarre commented 3 years ago

Silver upgraded to v4.5.31
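As a quick sanity check on the versions mentioned in this thread, a small sketch that compares a cluster version against the release expected to carry the fix. The function names are hypothetical, and it assumes plain `x.y.z` version strings with an optional leading `v`.

```python
from typing import Tuple

# Release expected to contain the fix, per the comments above.
FIXED_IN = (4, 5, 22)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v4.5.31' or '4.5.31' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def has_fix(cluster_version: str) -> bool:
    """Return True if the cluster version is at or above the fixed release."""
    return parse_version(cluster_version) >= FIXED_IN


if __name__ == "__main__":
    for version in ("4.5.20", "v4.5.31"):
        print(version, has_fix(version))
```

Comparing integer tuples avoids the string-comparison trap where "4.5.9" would sort after "4.5.22".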