AliyunContainerService / gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster
Apache License 2.0

fix: gpushare concurrent map read write #197

Open swartz-k opened 1 year ago

swartz-k commented 1 year ago

This PR fixes the fatal error below, a concurrent map read and write that crashes the extender:

fatal error: concurrent map read and map write
{"level":"info","ts":"2023-01-11T09:58:45.234Z","caller":"cache/nodeinfo.go:340","msg":"info: try to find unhealthy node unhealthy-gpu-xxxxx"}
{"level":"info","ts":"2023-01-11T09:58:45.234Z","caller":"cache/nodeinfo.go:306","msg":"info: available GPU list map[0:23 1:23 2:11 3:5 4:23 5:23 6:23 7:23] before removing unhealty GPUs"}
{"level":"info","ts":"2023-01-11T09:58:45.234Z","caller":"cache/nodeinfo.go:311","msg":"info: available GPU list map[0:23 1:23 2:11 3:5 4:23 5:23 6:23 7:23] after removing unhealty GPUs"}
{"level":"info","ts":"2023-01-11T09:58:45.234Z","caller":"cache/nodeinfo.go:156","msg":"debug: AvailableGPUs: map[0:23 1:23 2:11 3:5 4:23 5:23 6:23 7:23] in node xxxx"}

goroutine 109 [running]:
github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/gpushare.(*Controller).syncPod(0xc0000c6000, {0xc01fef3fb0, 0x30})
    /Users/joker/go/src/github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/gpushare/controller.go:200 +0x40d
github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/gpushare.(*Controller).processNextWorkItem(0xc0000c6000)
    /Users/joker/go/src/github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/gpushare/controller.go:232 +0x16d
github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/gpushare.(*Controller).runWorker(0xc0004ac6a0?)
    /Users/joker/go/src/github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/gpushare/controller.go:182 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
    /Users/joker/go/pkg/mod/k8s.io/apimachinery@v0.25.4/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x1a348e0, 0xc01a328900}, 0x1, 0xc0001150e0)
    /Users/joker/go/pkg/mod/k8s.io/apimachinery@v0.25.4/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
    /Users/joker/go/pkg/mod/k8s.io/apimachinery@v0.25.4/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
    /Users/joker/go/pkg/mod/k8s.io/apimachinery@v0.25.4/pkg/util/wait/wait.go:92 +0x25
created by github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/gpushare.(*Controller).Run
    /Users/joker/go/src/github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/gpushare/controller.go:168 +0x167
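
For readers unfamiliar with this class of crash: the Go runtime aborts the process when it detects one goroutine reading a plain map while another writes to it. The trace above shows the fault surfacing in `syncPod`, which runs in a worker goroutine. The usual remedy is to guard every access to the shared map with a mutex. The sketch below illustrates that pattern only; `gpuCache`, `Set`, `Get`, and `Snapshot` are hypothetical names, not identifiers from this repository, and the actual change in this PR may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// gpuCache is a hypothetical stand-in for shared state such as an
// available-GPU map (GPU index -> free memory units).
type gpuCache struct {
	mu        sync.RWMutex
	available map[int]int
}

func newGPUCache() *gpuCache {
	return &gpuCache{available: make(map[int]int)}
}

// Set takes the write lock, so no reader or other writer can touch
// the map while it is being modified.
func (c *gpuCache) Set(id, free int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.available[id] = free
}

// Get takes the read lock; many readers may hold it at once, but
// never together with a writer.
func (c *gpuCache) Get(id int) (int, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	free, ok := c.available[id]
	return free, ok
}

// Snapshot returns a copy, so callers can iterate or log the contents
// without holding the lock and without racing against later writes.
func (c *gpuCache) Snapshot() map[int]int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make(map[int]int, len(c.available))
	for id, free := range c.available {
		out[id] = free
	}
	return out
}

func main() {
	cache := newGPUCache()
	var wg sync.WaitGroup
	// Eight goroutines read and write concurrently. With a bare map
	// this could trigger the fatal error reported above.
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				cache.Set(id, i)
				cache.Get((id + 1) % 8)
			}
		}(w)
	}
	wg.Wait()
	fmt.Println(len(cache.Snapshot())) // prints 8
}
```

Running a program like this with `go run -race` is a quick way to confirm that no unsynchronized access remains. Returning a copy from `Snapshot` matters because handing out the underlying map would let callers read it after the lock is released.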