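The older image the project currently provides, https://hub.docker.com/r/tizhou86/paddle-on-k8s-operator, hard-codes the pull policy of the etcd container in the master pod (image m3ngyang/etcd:v3.2.1) to Always, so the operator cannot come up fully in a cluster without internet access.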
So I modified the code on the master branch, changing the etcd image pull policy from Always to IfNotPresent, and rebuilt:
go version: 1.13.6 linux/amd64
go build -o paddle-on-k8s-operator ./cmd/operator
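For readers who want to see the shape of that change, here is a minimal sketch in Go. It does not reproduce the operator's actual code: `PullPolicy` and `Container` are local stand-ins that only mirror the names in k8s.io/api/core/v1 so that the example has no external dependencies, and `relaxPullPolicy` is a hypothetical helper.

```go
package main

import "fmt"

// PullPolicy and Container are local stand-ins mirroring names from
// k8s.io/api/core/v1; they are assumptions made to keep this sketch self-contained.
type PullPolicy string

const (
	PullAlways       PullPolicy = "Always"
	PullIfNotPresent PullPolicy = "IfNotPresent"
)

type Container struct {
	Name            string
	Image           string
	ImagePullPolicy PullPolicy
}

// relaxPullPolicy (hypothetical helper) switches every container that uses the
// given image from Always to IfNotPresent and returns how many were changed.
func relaxPullPolicy(containers []Container, image string) int {
	changed := 0
	for i := range containers {
		c := &containers[i]
		if c.Image == image && c.ImagePullPolicy == PullAlways {
			c.ImagePullPolicy = PullIfNotPresent
			changed++
		}
	}
	return changed
}

func main() {
	containers := []Container{
		{Name: "etcd", Image: "m3ngyang/etcd:v3.2.1", ImagePullPolicy: PullAlways},
	}
	n := relaxPullPolicy(containers, "m3ngyang/etcd:v3.2.1")
	fmt.Println(n, containers[0].ImagePullPolicy) // prints: 1 IfNotPresent
}
```

With IfNotPresent, the kubelet only pulls when the image is missing on the node, so the etcd image must already be loaded on every node of the offline cluster.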
The image's Dockerfile is identical to the one the project provides:
FROM ubuntu:18.04
ADD paddle-on-k8s-operator /usr/local/bin
ENTRYPOINT ["/usr/local/bin/paddle-on-k8s-operator"]
After deploying the operator, it logs the following error:
E0413 10:36:30.127845 1 reflector.go:205] pkg/mod/k8s.io/client-go@5.0.1+incompatible/tools/cache/reflector.go:99: Failed to list *v1alpha1.TrainingJob: the server could not find the requested resource (get trainingjobs.paddlepaddle.org)
Looking at the project source, the problem appears to be in an asynchronous goroutine:
./cmd/operator/paddle_operator.go
Line 81: go paddleInformer.Start(stopCh)