[Scale out POC] secret not found in kubelet [system tenant only]

Sindica commented 3 years ago

What happened: In the local scale-out test, perf test pods can sometimes only be created on one RP.

Failed RP kubelet log:

I0325 22:11:01.116552    3687 reflector.go:218] Starting reflector *v1.Secret (0s) from object-"system"/"ftblf3-testns"/"default-token-r8stb"
I0325 22:11:01.116586    3687 reflector.go:293] ListAndWatch *v1.Secret. filter bounds []. name object-"system"/"ftblf3-testns"/"default-token-r8stb". Watch page size 0. resync period 0s
E0325 22:11:01.384617    3687 secret.go:199] Couldn't get secret system/ftblf3-testns/default-token-r8stb: secret "default-token-r8stb" not found

Successful RP kubelet log:

I0325 22:11:01.112580    3164 reflector.go:218] Starting reflector *v1.Secret (0s) from object-"system"/"ftblf3-testns"/"default-token-r8stb"
I0325 22:11:01.112615    3164 reflector.go:293] ListAndWatch *v1.Secret. filter bounds []. name object-"system"/"ftblf3-testns"/"default-token-r8stb". Watch page size 0. resync period 0s
I0325 22:11:01.355145    3164 secret.go:212] Received secret system/ftblf3-testns/default-token-r8stb containing (3) pieces of data, 2222 total bytes
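
To compare kubelet logs from several RPs faster, the two outcomes above can be pulled out mechanically. The following is only a rough sketch: it assumes the `secret.go` messages always look like the two excerpts in this issue, and the names `SecretEvent`, `ParseSecretEvent` and `ParseLog` are made up for illustration, not part of Arktos or Kubernetes.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// SecretEvent is a hypothetical record of one kubelet secret-fetch outcome.
type SecretEvent struct {
	Tenant    string
	Namespace string
	Name      string
	Found     bool
}

// Patterns modeled on the two kubelet messages quoted in this issue (assumed format).
var (
	notFoundRe = regexp.MustCompile(`Couldn't get secret ([^/\s]+)/([^/\s]+)/([^:\s]+): secret "[^"]+" not found`)
	receivedRe = regexp.MustCompile(`Received secret ([^/\s]+)/([^/\s]+)/(\S+) containing`)
)

// ParseSecretEvent extracts tenant/namespace/name from a single log line.
// The second return value is false when the line is not a secret-fetch result.
func ParseSecretEvent(line string) (SecretEvent, bool) {
	if m := notFoundRe.FindStringSubmatch(line); m != nil {
		return SecretEvent{Tenant: m[1], Namespace: m[2], Name: m[3], Found: false}, true
	}
	if m := receivedRe.FindStringSubmatch(line); m != nil {
		return SecretEvent{Tenant: m[1], Namespace: m[2], Name: m[3], Found: true}, true
	}
	return SecretEvent{}, false
}

// ParseLog scans a whole log and returns every secret-fetch event in order.
func ParseLog(log string) []SecretEvent {
	var events []SecretEvent
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if ev, ok := ParseSecretEvent(sc.Text()); ok {
			events = append(events, ev)
		}
	}
	return events
}

func main() {
	// Sample lines copied from the failed and successful RP logs above.
	log := `I0325 22:11:01.116552    3687 reflector.go:218] Starting reflector *v1.Secret (0s) from object-"system"/"ftblf3-testns"/"default-token-r8stb"
E0325 22:11:01.384617    3687 secret.go:199] Couldn't get secret system/ftblf3-testns/default-token-r8stb: secret "default-token-r8stb" not found
I0325 22:11:01.355145    3164 secret.go:212] Received secret system/ftblf3-testns/default-token-r8stb containing (3) pieces of data, 2222 total bytes`
	for _, ev := range ParseLog(log) {
		fmt.Printf("tenant=%s namespace=%s secret=%s found=%t\n", ev.Tenant, ev.Namespace, ev.Name, ev.Found)
	}
}
```

Running this over each RP's kubelet log and grouping by tenant should show quickly whether the failures are limited to the system tenant.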

What you expected to happen: Both RPs should be able to get the secret and create running pods.

How to reproduce it (as minimally and precisely as possible): ./hack/arktos-up-scale-out-poc.sh

Anything else we need to know?: The following files might need additional code changes:
- pkg/controller/volume/scheduling/scheduler_binder.go:477
- pkg/kubelet/kubelet.go
- pkg/scheduler/factory/factory.go (Refactor AggregateNodeLister)
- pkg/scheduler/nodeinfo/util.go (// TODO - set rpId in NodeInfo)
- pkg/scheduler/scheduler.go (sched.config.NodeListers[0])

Environment:

Arktos version (use kubectl version):
Cloud provider or hardware configuration:
OS (e.g: cat /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Network plugin and version (if this is a network-related bug):
Others:
Branch: https://github.com/futurewei-cloud/arktos-perftest/tree/merge-scale-out-poc-2021-0430

Sindica commented 3 years ago

This is for the system tenant. I will test with tenants arktos and zeta to see whether it happens again.

Sindica commented 3 years ago

I ran the load test locally with tenants arktos and zeta. Both were able to schedule pods onto both RPs, so I suspect this is a system-tenant-only issue. As we don't have a concrete plan for dealing with system tenant objects, I am lowering the priority.

CentaurusInfra / arktos

[Scale out POC] secret not found in kubelet [system tenant only] #1052