AliyunContainerService / gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster
Apache License 2.0
1.39k stars 308 forks

gpushare-scheduler-extender http server panic #169

Closed: fullpolarfox closed this issue 2 years ago

fullpolarfox commented 2 years ago

I tried to install it; all pods and the DaemonSet start fine. But when I try to deploy a pod that must be scheduled through the extender, I get this in the extender log:

server.go:2926: http: panic serving 192.168.144.98:48082: runtime error: invalid memory address or nil pointer dereference
goroutine 89 [running]:
net/http.(*conn).serve.func1(0xc421e74640)
        /usr/local/go/src/net/http/server.go:1726 +0xd0
panic(0x1098dc0, 0x1af3170)
        /usr/local/go/src/runtime/panic.go:502 +0x229
github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/scheduler.Predicate.Handler(0x12060a2, 0x10, 0x127b7e8, 0xc420340c40, 0xc4256bc000, 0xc4254e9b90, 0x0, 0x0)
        /go/src/github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/scheduler/predicate.go:17 +0x37
github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/routes.PredicateRoute.func1(0x130e8e0, 0xc4209daee0, 0xc424fc9300, 0x0, 0x0, 0x0)
        /go/src/github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/routes/routes.go:82 +0x7b0
github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/routes.DebugLogging.func1(0x130e8e0, 0xc4209daee0, 0xc424fc9300, 0x0, 0x0, 0x0)
        /go/src/github.com/AliyunContainerService/gpushare-scheduler-extender/pkg/routes/routes.go:161 +0x197
github.com/AliyunContainerService/gpushare-scheduler-extender/vendor/github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc42201f440, 0x130e8e0, 0xc4209daee0, 0xc424fc9300)
        /go/src/github.com/AliyunContainerService/gpushare-scheduler-extender/vendor/github.com/julienschmidt/httprouter/router.go:334 +0x79c
net/http.serverHandler.ServeHTTP(0xc42502b520, 0x130e8e0, 0xc4209daee0, 0xc424fc9300)
        /usr/local/go/src/net/http/server.go:2697 +0xbc
net/http.(*conn).serve(0xc421e74640, 0x130f7a0, 0xc42229ae40)
        /usr/local/go/src/net/http/server.go:1830 +0x651
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:2798 +0x27b

and this in the pod:

Post "http://127.0.0.1:32766/gpushare-scheduler/filter": EOF

Hatuw commented 2 years ago

Same error with v1.23.4.

fullpolarfox commented 2 years ago

Found the problem: I had nodeCacheCapable: false instead of nodeCacheCapable: true in my policy config file.
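
For anyone hitting the same trace, here is a likely explanation of why this setting matters. In the Kubernetes scheduler extender API, the NodeNames field of the filter request is populated only when nodeCacheCapable is true; otherwise the scheduler sends full node objects and leaves NodeNames nil. Assuming the handler at predicate.go:17 dereferences NodeNames without a check (the trace suggests this but the thread does not confirm it), a request sent with nodeCacheCapable: false would panic exactly as above. The Go sketch below uses simplified, hypothetical stand-in types (extenderArgs, candidateNames), not the extender's real code, to show the failure mode and a nil guard that would turn the panic into an error:

```go
package main

import (
	"errors"
	"fmt"
)

// extenderArgs is a simplified, hypothetical stand-in for the scheduler's
// filter request; only the two node fields matter for this illustration.
type extenderArgs struct {
	Nodes     []string  // stand-in for full node objects, sent when nodeCacheCapable is false
	NodeNames *[]string // populated only when nodeCacheCapable is true
}

// candidateNames returns the node names from args. It returns an error
// instead of panicking when the scheduler did not send NodeNames.
func candidateNames(args extenderArgs) ([]string, error) {
	if args.NodeNames == nil {
		return nil, errors.New("NodeNames is nil: the extender expects nodeCacheCapable: true")
	}
	return *args.NodeNames, nil
}

func main() {
	// Request shape with nodeCacheCapable: false (NodeNames left unset).
	_, err := candidateNames(extenderArgs{Nodes: []string{"node-a"}})
	fmt.Println("nodeCacheCapable=false:", err)

	// Request shape with nodeCacheCapable: true.
	names := []string{"node-a", "node-b"}
	got, err := candidateNames(extenderArgs{NodeNames: &names})
	fmt.Println("nodeCacheCapable=true:", got, err)
}
```

The EOF seen on the scheduler side fits this picture: Go's net/http server recovers a handler panic, logs it, and closes the connection without writing a response.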