INT and DEV cluster | node(s) exceed max volume count

evegufy commented 2 weeks ago

Is your support request related to a problem? Please describe.

The following error has been observed on the INT and DEV clusters, AFAIK for a couple of hours: 0/8 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had volume node affinity conflict, 6 Insufficient cpu. preemption: 0/8 nodes are available: 1 Preemption is not helpful for scheduling, 7 No preemption victims found for incoming pod.
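The scheduler message packs a per-node breakdown into one line. The following is a minimal sketch, assuming the exact message format quoted above, that splits the filtering part into reason counts. It shows that here the counts add up to all 8 nodes (1 volume-count limit, 1 volume node affinity conflict, 6 CPU), so no node was a valid target. The function name `parse_filter_reasons` is hypothetical, and in general a single node can contribute more than one reason, so the counts do not always sum to the node total.

```python
import re

# FailedScheduling message as quoted in the issue.
MESSAGE = (
    "0/8 nodes are available: 1 node(s) exceed max volume count, "
    "1 node(s) had volume node affinity conflict, 6 Insufficient cpu. "
    "preemption: 0/8 nodes are available: 1 Preemption is not helpful "
    "for scheduling, 7 No preemption victims found for incoming pod."
)


def parse_filter_reasons(message: str) -> tuple[int, dict[str, int]]:
    """Return (total node count, {reason: node count}) for the filtering part.

    Hypothetical helper; assumes the "<n>/<total> nodes are available: ..."
    layout shown above and ignores the trailing "preemption:" summary.
    """
    filtering = message.split(" preemption:")[0]
    header, _, reasons = filtering.partition(": ")
    total = int(re.match(r"(\d+)/(\d+) nodes", header).group(2))
    counts: dict[str, int] = {}
    for item in reasons.rstrip(".").split(", "):
        count, reason = item.split(" ", 1)  # leading number, then the reason text
        counts[reason] = int(count)
    return total, counts


if __name__ == "__main__":
    total, counts = parse_filter_reasons(MESSAGE)
    print(total, counts)
    print("nodes accounted for:", sum(counts.values()))
```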

Affected applications on INT which I'm aware of: BDRS, centralidp (v2), bpndiscovery service

Describe the solution you'd like

Please adjust the resources.

carslen commented 2 weeks ago

Hey Evelyn, we've increased the resources on INT and the pods have been scheduled successfully, although the cluster should have had enough resources to schedule new pods. Maybe it's because the 81 Postgres pods on INT all use the same podAntiAffinity setup (except for app.kubernetes.io/instance).
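Whether a podAntiAffinity rule keeps pods apart depends on which pods its label selector matches. The following is a minimal sketch of `matchLabels` semantics, where every listed key/value pair must be present on the pod. It illustrates why app.kubernetes.io/instance matters: with it, a selector only matches pods of one release; without it, the same selector would match all 81 Postgres pods. The label values and release names are hypothetical, and the sketch ignores topologyKey and the required vs. preferred distinction.

```python
def matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """matchLabels semantics: every selector key must exist on the pod with the same value."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# 81 Postgres pods from different releases (hypothetical instance names).
pods = [
    {
        "app.kubernetes.io/name": "postgresql",
        "app.kubernetes.io/component": "primary",
        "app.kubernetes.io/instance": f"release-{i}",
    }
    for i in range(81)
]

# Selector that includes the instance label: scoped to a single release.
with_instance = {
    "app.kubernetes.io/name": "postgresql",
    "app.kubernetes.io/component": "primary",
    "app.kubernetes.io/instance": "release-0",
}
# Same selector without the instance label: matches every Postgres pod.
without_instance = {
    "app.kubernetes.io/name": "postgresql",
    "app.kubernetes.io/component": "primary",
}

same_release = sum(matches(with_instance, p) for p in pods)
all_postgres = sum(matches(without_instance, p) for p in pods)
print(same_release, all_postgres)
```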

On DEV, we couldn't find any progressing pods. Does the error still persist on DEV?

evegufy commented 2 weeks ago


@carslen Thank you! INT seems healthy again.

I may have concluded too early that the same error was also behind the issue on DEV. I just checked again, and on DEV some applications went into CrashLoopBackOff/CreateContainerConfigError because secrets were suddenly missing.

I just manually synced the secrets, and the apps are up again.
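Containers that reference a Secret that does not exist typically fail to start with CreateContainerConfigError. The following is a minimal sketch for finding which referenced Secrets are absent. It assumes the PodSpec is available as a plain dict (for example the `.spec` of `kubectl get pod -o json`) and that you already have the list of Secret names in the namespace. It checks `env[].valueFrom.secretKeyRef`, `envFrom[].secretRef` and `volumes[].secret` and skips references marked `optional`. The helper names and Secret names are hypothetical, and this is not the tool used to sync the secrets here.

```python
def referenced_secrets(pod_spec: dict) -> set[str]:
    """Collect non-optional Secret names a PodSpec refers to (env, envFrom, volumes)."""
    names: set[str] = set()
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        for env in container.get("env", []):
            ref = env.get("valueFrom", {}).get("secretKeyRef")
            if ref and not ref.get("optional", False):
                names.add(ref["name"])
        for source in container.get("envFrom", []):
            ref = source.get("secretRef")
            if ref and not ref.get("optional", False):
                names.add(ref["name"])
    for volume in pod_spec.get("volumes", []):
        secret = volume.get("secret")
        if secret and not secret.get("optional", False):
            names.add(secret["secretName"])
    return names


def missing_secrets(pod_spec: dict, existing: list[str]) -> list[str]:
    """Secrets the pod needs that are not in the namespace, sorted for stable output."""
    return sorted(referenced_secrets(pod_spec) - set(existing))


# Hypothetical pod spec and namespace contents.
pod_spec = {
    "containers": [{
        "name": "app",
        "env": [{"name": "DB_PASSWORD",
                 "valueFrom": {"secretKeyRef": {"name": "app-db-credentials", "key": "password"}}}],
        "envFrom": [{"secretRef": {"name": "app-oauth-client"}}],
    }],
    "volumes": [{"name": "tls", "secret": {"secretName": "app-tls"}}],
}
existing = ["app-db-credentials", "app-tls"]
print(missing_secrets(pod_spec, existing))
```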

evegufy commented 2 weeks ago

Issue resolved.

eclipse-tractusx / sig-infra

INT and DEV cluster | node(s) exceed max volume count #506