Open sumchak1 opened 3 years ago
This looks like the following issue:
https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/390
You can give that a try.
We were able to solve that issue by creating a SAN certificate for the webhook. But after creating the FlinkCluster custom resource, no pods, services, etc. were created, and no events are shown for the custom resource. Please see below:
$ kubectl apply -f flinkoperator_v1beta1_flinksessioncluster.yaml
flinkcluster.flinkoperator.k8s.io/flinksessioncluster-sample created
[sumit@sumit flink]$
[sumit@sumit flink]$ kubectl get pods
NAME READY STATUS RESTARTS AGE
ddpstreamappdevtest-b5684688b-wz55p 0/1 ImagePullBackOff 0 12d
default-sparkoperator-667fff9765-g6trh 1/1 Running 0 12d
hive-1578926540-metastore-6dd4f78f9b-hkgns 1/1 Running 0 12d
hive-1578926540-server2-68d8685996-p5lf7 1/1 Running 0 12d
influxdbd2b0d-58bcdd89fb-xmv5m 1/1 Running 0 12d
ingress-checker-1626517800-jzn4l 0/1 Completed 0 2d4h
ingress-checker-1626604200-476v6 0/1 Completed 0 28h
ingress-checker-1626690600-mfdzt 0/1 Completed 0 4h27m
logdna-agent-4zbnv 1/1 Running 2 153d
logdna-agent-65r77 1/1 Running 14 153d
logdna-agent-6ld8l 1/1 Running 3 153d
logdna-agent-8g2ch 1/1 Running 10 153d
logdna-agent-n22dh 1/1 Running 6 153d
logdna-agent-wdj54 1/1 Running 5 153d
overprovisioning-6d695dd44c-bvlk6 0/1 Pending 0 9d
overprovisioning-6d695dd44c-cz5j7 1/1 Running 0 12d
overprovisioning-6d695dd44c-j2dt6 1/1 Running 0 5d4h
overprovisioning-6d695dd44c-nxjv5 1/1 Running 0 12d
overprovisioning-6d695dd44c-p5jf9 1/1 Running 0 6d21h
overprovisioning-6d695dd44c-qhsgq 0/1 Pending 0 5d4h
overprovisioning-autoscaler-587ff88c66-5wcd7 1/1 Running 0 12d
privingressapp-8595c5f87d-xjv9c 1/1 Running 0 12d
[sumit@sumit flink]$ kubectl get FlinkCluster
NAME AGE
flinksessioncluster-sample 32s
[sumit@sumit flink]$ kubectl describe FlinkCluster flinksessioncluster-sample
Name: flinksessioncluster-sample
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"flinkoperator.k8s.io/v1beta1","kind":"FlinkCluster","metadata":{"annotations":{},"name":"flinksessioncluster-sample","names...
API Version: flinkoperator.k8s.io/v1beta1
Kind: FlinkCluster
Metadata:
Creation Timestamp: 2021-07-19T14:57:00Z
Generation: 1
Managed Fields:
API Version: flinkoperator.k8s.io/v1beta1
Fields Type: FieldsV1
Fields V 1:
F : Metadata:
F : Annotations:
.:
F : Kubectl . Kubernetes . Io / Last - Applied - Configuration:
F : Spec:
.:
F : Env Vars:
F : Flink Properties:
.:
F : Taskmanager . Number Of Task Slots:
F : Image:
.:
F : Name:
F : Pull Policy:
F : Job Manager:
.:
F : Access Scope:
F : Ports:
.:
F : Ui:
F : Resources:
.:
F : Limits:
.:
F : Cpu:
F : Memory:
F : Security Context:
.:
F : Run As Group:
F : Run As User:
F : Task Manager:
.:
F : Replicas:
F : Resources:
.:
F : Limits:
.:
F : Cpu:
F : Memory:
F : Sidecars:
F : Volume Mounts:
F : Volumes:
Manager: kubectl
Operation: Update
Time: 2021-07-19T14:56:59Z
Resource Version: 173782236
Self Link: /apis/flinkoperator.k8s.io/v1beta1/namespaces/default/flinkclusters/flinksessioncluster-sample
UID: 951f98d1-5943-44a1-ba19-f399a1d643ba
Spec:
Env Vars:
Name: FOO
Value: bar
Flink Properties:
Taskmanager . Number Of Task Slots: 1
Image:
Name: flink:1.8.2
Pull Policy: Always
Job Manager:
Access Scope: Cluster
Memory Off Heap Min: 600M
Memory Off Heap Ratio: 25
Ports:
Blob: 6124
Query: 6125
Rpc: 6123
Ui: 8081
Replicas: 1
Resources:
Limits:
Cpu: 200m
Memory: 1Gi
Security Context:
Run As Group: 9999
Run As User: 9999
Recreate On Update: true
Task Manager:
Memory Off Heap Min: 600M
Memory Off Heap Ratio: 25
Ports:
Data: 6121
Query: 6125
Rpc: 6122
Replicas: 1
Resources:
Limits:
Cpu: 200m
Memory: 1Gi
Sidecars:
Command:
sleep
10000
Image: alpine
Name: sidecar
Resources:
Volume Mounts:
Mount Path: /cache
Name: cache-volume
Volumes:
Empty Dir:
Name: cache-volume
Events: <none>
@yan234280533 is there any update on this?
@sumchak1 I had the same problem. Check whether the manager pod shows any restarts or OOMKilled events. I increased the memory requests and limits, and once the manager started with the new resource config the cluster came up as expected. When I run kubectl top pod -n flink, the manager consumes a bit more than 30Mi of memory, so the default values do not work.
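For anyone following along, the check described above can be sketched roughly as follows. This is only a sketch: the namespace, deployment name, and `app=flink-operator` label are the ones that appear in the output further down this thread, while the container name `manager` and the two helper function names are assumptions. The memory values in the example line are placeholders, not recommended settings.

```shell
# Assumed defaults; override NS/DEPLOY to match your install.
NS="${NS:-flink-operator-system}"
DEPLOY="${DEPLOY:-flink-operator-controller-manager}"

# Print container name, restart count and last termination reason
# (e.g. OOMKilled) for every container of the operator pods.
manager_restarts() {
  kubectl get pods -n "$NS" -l app=flink-operator \
    -o jsonpath='{range .items[*].status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}

# Raise the memory request ($1) and limit ($2) on the (assumed) "manager" container.
raise_manager_memory() {
  kubectl set resources deployment "$DEPLOY" -n "$NS" -c manager \
    --requests=memory="$1" --limits=memory="$2"
}

# Example (placeholder values): raise_manager_memory 64Mi 128Mi
```

Changing the deployment's resources triggers a new rollout, so the manager pod restarts with the new limits.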
@toniiiik @yan234280533, I checked the manager logs and updated the role binding based on the error, but I still see some errors in my manager pod. Can you please help me identify what the actual issue is? Also, in the flinkcluster describe output I can see the session cluster status showing as Creating in the events section.
$ kubectl get all -n flink-operator-system
NAME READY STATUS RESTARTS AGE
pod/cert-job-q5hvp 0/1 Completed 0 6m31s
pod/flink-operator-controller-manager-848b69b444-j88t2 2/2 Running 0 5m51s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/flink-operator-controller-manager-metrics-service ClusterIP 172.21.16.17 <none> 8443/TCP 7d15h
service/flink-operator-webhook-service ClusterIP 172.21.43.122 <none> 443/TCP 7d15h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/flink-operator-controller-manager 1/1 1 1 7d15h
NAME DESIRED CURRENT READY AGE
replicaset.apps/flink-operator-controller-manager-848b69b444 1 1 1 7d15h
NAME COMPLETIONS DURATION AGE
job.batch/cert-job 1/1 5s 6m33s
[sumit@sumit flink]$
[sumit@sumit flink]$
[sumit@sumit flink]$ kubectl top pod -n flink-operator-system
NAME CPU(cores) MEMORY(bytes)
flink-operator-controller-manager-848b69b444-j88t2 2m 25Mi
[sumit@sumit flink]$ kubectl logs -n flink-operator-system -l app=flink-operator --all-containers
I0722 06:07:33.426511 1 main.go:209] Generating self signed cert as no cert is provided
I0722 06:07:33.570410 1 main.go:242] Listening securely on 0.0.0.0:8443
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90
[sumit@sumit flink]$ kubectl logs flink-operator-controller-manager-848b69b444-j88t2 -n flink-operator-system --all-containers
I0722 06:07:33.426511 1 main.go:209] Generating self signed cert as no cert is provided
I0722 06:07:33.570410 1 main.go:242] Listening securely on 0.0.0.0:8443
W0722 06:07:33.929936 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0722 06:07:35.158574 1 request.go:621] Throttling request took 1.048583353s, request: GET:https://172.21.0.1:443/apis/policy/v1beta1?timeout=32s
2021-07-22T06:07:35.209Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2021-07-22T06:07:35.209Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "flinkoperator.k8s.io/v1beta1, Kind=FlinkCluster", "path": "/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.209Z INFO controller-runtime.webhook registering webhook {"path": "/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.210Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "flinkoperator.k8s.io/v1beta1, Kind=FlinkCluster", "path": "/validate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.210Z INFO controller-runtime.webhook registering webhook {"path": "/validate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.210Z INFO setup Starting manager
2021-07-22T06:07:35.228Z INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
2021-07-22T06:07:35.228Z INFO controller-runtime.webhook.webhooks starting webhook server
2021-07-22T06:07:35.269Z INFO controller-runtime.certwatcher Updated current TLS certificate
2021-07-22T06:07:35.269Z INFO controller-runtime.controller Starting EventSource {"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:35.269Z INFO controller-runtime.webhook serving webhook server {"host": "", "port": 443}
2021-07-22T06:07:35.269Z INFO controller-runtime.certwatcher Starting certificate watcher
2021-07-22T06:07:35.408Z INFO controller-runtime.controller Starting EventSource {"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.089Z INFO controller-runtime.controller Starting EventSource {"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.190Z INFO controller-runtime.controller Starting EventSource {"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.291Z INFO controller-runtime.controller Starting EventSource {"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.591Z INFO controller-runtime.controller Starting Controller {"controller": "flinkcluster"}
2021-07-22T06:07:36.591Z INFO controller-runtime.controller Starting workers {"controller": "flinkcluster", "worker count": 1}
2021-07-22T06:07:36.591Z INFO controllers.FlinkCluster ============================================================ {"cluster": "default/flinksessioncluster-sample"}
2021-07-22T06:07:36.591Z INFO controllers.FlinkCluster ---------- 1. Observe the current state ---------- {"cluster": "default/flinksessioncluster-sample"}
2021-07-22T06:07:36.591Z INFO controllers.FlinkCluster Observed cluster {"cluster": "default/flinksessioncluster-sample", "cluster": {"kind":"FlinkCluster","apiVersion":"flinkoperator.k8s.io/v1beta1","metadata":{"name":"flinksessioncluster-sample","namespace":"default","selfLink":"/apis/flinkoperator.k8s.io/v1beta1/namespaces/default/flinkclusters/flinksessioncluster-sample","uid":"951f98d1-5943-44a1-ba19-f399a1d643ba","resourceVersion":"173782236","generation":1,"creationTimestamp":"2021-07-19T14:57:00Z","annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"flinkoperator.k8s.io/v1beta1\",\"kind\":\"FlinkCluster\",\"metadata\":{\"annotations\":{},\"name\":\"flinksessioncluster-sample\",\"namespace\":\"default\"},\"spec\":{\"envVars\":[{\"name\":\"FOO\",\"value\":\"bar\"}],\"flinkProperties\":{\"taskmanager.numberOfTaskSlots\":\"1\"},\"image\":{\"name\":\"flink:1.8.2\",\"pullPolicy\":\"Always\"},\"jobManager\":{\"accessScope\":\"Cluster\",\"ports\":{\"ui\":8081},\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}},\"securityContext\":{\"runAsGroup\":9999,\"runAsUser\":9999}},\"taskManager\":{\"replicas\":1,\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}},\"sidecars\":[{\"command\":[\"sleep\",\"10000\"],\"image\":\"alpine\",\"name\":\"sidecar\"}],\"volumeMounts\":[{\"mountPath\":\"/cache\",\"name\":\"cache-volume\"}],\"volumes\":[{\"emptyDir\":{},\"name\":\"cache-volume\"}]}}}\n"},"managedFields":[{"manager":"kubectl","operation":"Update","apiVersion":"flinkoperator.k8s.io/v1beta1","time":"2021-07-19T14:56:59Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:envVars":{},"f:flinkProperties":{".":{},"f:taskmanager.numberOfTaskSlots":{}},"f:image":{".":{},"f:name":{},"f:pullPolicy":{}},"f:jobManager":{".":{},"f:accessScope":{},"f:ports":{".":{},"f:ui":{}},"f:resources":{".":{},"f:limits":{".":{},"f:cpu":{
},"f:memory":{}}},"f:securityContext":{".":{},"f:runAsGroup":{},"f:runAsUser":{}}},"f:taskManager":{".":{},"f:replicas":{},"f:resources":{".":{},"f:limits":{".":{},"f:cpu":{},"f:memory":{}}},"f:sidecars":{},"f:volumeMounts":{},"f:volumes":{}}}}}]},"spec":{"image":{"name":"flink:1.8.2","pullPolicy":"Always"},"jobManager":{"replicas":1,"accessScope":"Cluster","ports":{"rpc":6123,"blob":6124,"query":6125,"ui":8081},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapRatio":25,"memoryOffHeapMin":"600M","securityContext":{"runAsUser":9999,"runAsGroup":9999}},"taskManager":{"replicas":1,"ports":{"data":6121,"rpc":6122,"query":6125},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapRatio":25,"memoryOffHeapMin":"600M","volumes":[{"name":"cache-volume","emptyDir":{}}],"volumeMounts":[{"name":"cache-volume","mountPath":"/cache"}],"sidecars":[{"name":"sidecar","image":"alpine","command":["sleep","10000"],"resources":{}}]},"envVars":[{"name":"FOO","value":"bar"}],"flinkProperties":{"taskmanager.numberOfTaskSlots":"1"},"recreateOnUpdate":true},"status":{"state":"","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}}}
2021-07-22T06:07:36.829Z INFO controllers.FlinkCluster Observed controllerRevisions {"cluster": "default/flinksessioncluster-sample", "controllerRevisions": "[{name: flinksessioncluster-sample-84fdb95d89, revision: 1},]"}
2021-07-22T06:07:37.568Z INFO controllers.FlinkCluster Observed configMap {"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.568Z INFO controllers.FlinkCluster Deployment not found {"cluster": "default/flinksessioncluster-sample", "component": "JobManager"}
2021-07-22T06:07:37.568Z INFO controllers.FlinkCluster Observed JobManager StatefulSet {"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.568Z INFO controllers.FlinkCluster Observed JobManager service {"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.708Z INFO controllers.FlinkCluster Observed JobManager ingress {"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.708Z INFO controllers.FlinkCluster Deployment not found {"cluster": "default/flinksessioncluster-sample", "component": "TaskManager"}
2021-07-22T06:07:37.708Z INFO controllers.FlinkCluster Observed TaskManager StatefulSet {"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.710Z INFO controllers.FlinkCluster ---------- 2. Update cluster status ---------- {"cluster": "default/flinksessioncluster-sample"}
2021-07-22T06:07:37.711Z INFO controllers.FlinkCluster Cluster state changed {"cluster": "default/flinksessioncluster-sample", "current": "", "new": "Creating"}
2021-07-22T06:07:37.728Z INFO controllers.FlinkCluster FlinkCluster revision status changed {"cluster": "default/flinksessioncluster-sample", "current": "currentRevision: , nextRevision: , collisionCount: <nil>", "new": "currentRevision: flinksessioncluster-sample-84fdb95d89-1, nextRevision: flinksessioncluster-sample-84fdb95d89-1, collisionCount: <nil>"}
2021-07-22T06:07:37.728Z INFO controllers.FlinkCluster Status changed {"cluster": "default/flinksessioncluster-sample", "old": {"state":"","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}, "new": {"state":"Creating","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}},"currentRevision":"flinksessioncluster-sample-84fdb95d89-1","nextRevision":"flinksessioncluster-sample-84fdb95d89-1"}}
2021-07-22T06:07:37.729Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"FlinkCluster","namespace":"default","name":"flinksessioncluster-sample","uid":"951f98d1-5943-44a1-ba19-f399a1d643ba","apiVersion":"flinkoperator.k8s.io/v1beta1","resourceVersion":"173782236"}, "reason": "StatusUpdate", "message": "Cluster status: Creating"}
2021-07-22T06:07:37.788Z ERROR controllers.FlinkCluster Failed to update cluster status {"cluster": "default/flinksessioncluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinksessioncluster-sample\" is invalid: [status.components.jobManagerDeployment: Required value, status.components.taskManagerDeployment: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
/root/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterHandler).reconcile
/workspace/controllers/flinkcluster_controller.go:162
github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterReconciler).Reconcile
/workspace/controllers/flinkcluster_controller.go:82
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90
2021-07-22T06:07:37.788Z ERROR controller-runtime.controller Reconciler error {"controller": "flinkcluster", "request": "default/flinksessioncluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinksessioncluster-sample\" is invalid: [status.components.jobManagerDeployment: Required value, status.components.taskManagerDeployment: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
/root/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
/root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90
$ kubectl describe flinkcluster flinksessioncluster-sample -n flink-operator-system
Name: flinksessioncluster-sample
Namespace: flink-operator-system
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"flinkoperator.k8s.io/v1beta1","kind":"FlinkCluster","metadata":{"annotations":{},"name":"flinksessioncluster-sample","names...
API Version: flinkoperator.k8s.io/v1beta1
Kind: FlinkCluster
Metadata:
Creation Timestamp: 2021-07-22T06:09:20Z
Generation: 1
Managed Fields:
API Version: flinkoperator.k8s.io/v1beta1
Fields Type: FieldsV1
Fields V 1:
F : Metadata:
F : Annotations:
.:
F : Kubectl . Kubernetes . Io / Last - Applied - Configuration:
F : Spec:
.:
F : Env Vars:
F : Flink Properties:
.:
F : Taskmanager . Number Of Task Slots:
F : Image:
.:
F : Name:
F : Pull Policy:
F : Job Manager:
.:
F : Access Scope:
F : Ports:
.:
F : Ui:
F : Resources:
.:
F : Limits:
.:
F : Cpu:
F : Memory:
F : Security Context:
.:
F : Run As Group:
F : Run As User:
F : Task Manager:
.:
F : Replicas:
F : Resources:
.:
F : Limits:
.:
F : Cpu:
F : Memory:
F : Sidecars:
F : Volume Mounts:
F : Volumes:
Manager: kubectl
Operation: Update
Time: 2021-07-22T06:09:20Z
Resource Version: 175069579
Self Link: /apis/flinkoperator.k8s.io/v1beta1/namespaces/flink-operator-system/flinkclusters/flinksessioncluster-sample
UID: 18bd7904-a582-4433-a731-23d37813b1fd
Spec:
Env Vars:
Name: FOO
Value: bar
Flink Properties:
Taskmanager . Number Of Task Slots: 1
Image:
Name: flink:1.8.2
Pull Policy: Always
Job Manager:
Access Scope: Cluster
Memory Off Heap Min: 600M
Memory Off Heap Ratio: 25
Ports:
Blob: 6124
Query: 6125
Rpc: 6123
Ui: 8081
Replicas: 1
Resources:
Limits:
Cpu: 200m
Memory: 1Gi
Security Context:
Run As Group: 9999
Run As User: 9999
Recreate On Update: true
Task Manager:
Memory Off Heap Min: 600M
Memory Off Heap Ratio: 25
Ports:
Data: 6121
Query: 6125
Rpc: 6122
Replicas: 1
Resources:
Limits:
Cpu: 200m
Memory: 1Gi
Sidecars:
Command:
sleep
10000
Image: alpine
Name: sidecar
Resources:
Volume Mounts:
Mount Path: /cache
Name: cache-volume
Volumes:
Empty Dir:
Name: cache-volume
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal StatusUpdate 48s (x16 over 3m39s) FlinkOperator Cluster status: Creating
I am using https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/config/samples/flinkoperator_v1beta1_flinksessioncluster.yaml to create the flinksessioncluster, but the log says the taskmanager and jobmanager deployments are not found.
state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}}}
2021-07-22T06:09:29.162Z INFO controllers.FlinkCluster Observed controllerRevisions {"cluster": "flink-operator-system/flinksessioncluster-sample", "controllerRevisions": "[{name: flinksessioncluster-sample-84fdb95d89, revision: 1},]"}
2021-07-22T06:09:29.162Z INFO controllers.FlinkCluster Observed configMap {"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z INFO controllers.FlinkCluster Deployment not found {"cluster": "flink-operator-system/flinksessioncluster-sample", "component": "JobManager"}
2021-07-22T06:09:29.162Z INFO controllers.FlinkCluster Observed JobManager StatefulSet {"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z INFO controllers.FlinkCluster Observed JobManager service {"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z INFO controllers.FlinkCluster Observed JobManager ingress {"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z INFO controllers.FlinkCluster Deployment not found {"cluster": "flink-operator-system/flinksessioncluster-sample", "component": "TaskManager"}
2021-07-22T06:09:29.163Z INFO controllers.FlinkCluster Observed TaskManager StatefulSet {"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.164Z INFO controllers.FlinkCluster ---------- 2. Update cluster status ---------- {"cluster": "flink-operator-system/flinksessioncluster-sample"}
2021-07-22T06:09:29.164Z INFO controllers.FlinkCluster Cluster state changed {"cluster": "flink-operator-system/flinksessioncluster-sample", "current": "", "new": "Creating"}
2021-07-22T06:09:29.164Z INFO controllers.FlinkCluster FlinkCluster revision status changed {"cluster": "flink-operator-system/flinksessioncluster-sample", "current": "currentRevision: , nextRevision: , collisionCount: <nil>", "new": "currentRevision: flinksessioncluster-sample-84fdb95d89-1, nextRevision: flinksessioncluster-sample-84fdb95d89-1, collisionCount: <nil>"}
2021-07-22T06:09:29.164Z INFO controllers.FlinkCluster Status changed {"cluster": "flink-operator-system/flinksessioncluster-sample", "old": {"state":"","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}, "new": {"state":"Creating","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}},"currentRevision":"flinksessioncluster-sample-84fdb95d89-1","nextRevision":"flinksessioncluster-sample-84fdb95d89-1"}}
2021-07-22T06:09:29.165Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"FlinkCluster","namespace":"flink-operator-system","name":"flinksessioncluster-sample","uid":"18bd7904-a582-4433-a731-23d37813b1fd","apiVersion":"flinkoperator.k8s.io/v1beta1","resourceVersion":"175069579"}, "reason": "StatusUpdate", "message": "Cluster status: Creating"}
2021-07-22T06:09:29.197Z ERROR controllers.FlinkCluster Failed to update cluster status {"cluster": "flink-operator-system/flinksessioncluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinksessioncluster-sample\" is invalid: [status.components.jobManagerDeployment: Required value, status.components.taskManagerDeployment: Required value]"}
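A small helper can pull the rejected fields out of the operator log so they can be compared with what the installed CRD requires. This is a sketch, not a confirmed diagnosis: `missing_required_fields` and `crd_required_status_components` are hypothetical names, the CRD name is derived from the resource's self link shown above, and the JSONPath assumes the CRD is served through `apiextensions.k8s.io/v1`. If the CRD lists `jobManagerDeployment` and `taskManagerDeployment` as required while the operator writes `jobManagerStatefulSet` and `taskManagerStatefulSet` (as the status in the log suggests), the CRD and the operator image may come from different versions.

```shell
# Read log text on stdin; print each distinct field that the API server
# reported as "Required value".
missing_required_fields() {
  grep -o '[A-Za-z0-9_.]*: Required value' | sed 's/: Required value$//' | sort -u
}

# Print the keys the installed CRD declares as required under
# status.components (JSONPath is an assumption, see the note above).
crd_required_status_components() {
  kubectl get crd flinkclusters.flinkoperator.k8s.io \
    -o jsonpath='{.spec.versions[*].schema.openAPIV3Schema.properties.status.properties.components.required}'
}

# Usage:
#   kubectl logs -n flink-operator-system -l app=flink-operator --all-containers \
#     | missing_required_fields
```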
@yan234280533 is there any update on this ?
Hi @mishra157, I think the community is moving to the https://github.com/spotify/flink-on-k8s-operator/ fork now (see this discussion: https://github.com/spotify/flink-on-k8s-operator/issues/82). You will probably have a better chance with that version and, if the bug is still present, you can report the issue over there.
Based on https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/356, we have tried all the mentioned steps, but the Flink session cluster installation still fails.
First we tried the steps below, and that didn't help.
Then we tried editing the config map to change the default expiry days, and that didn't help either:
| openssl x509 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out ${tmpdir}/server-cert.pem
| openssl x509 -days 3650 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out ${tmpdir}/server-cert.pem
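For comparison, here is a self-contained sketch of signing a webhook server certificate that carries a SAN and a 3650-day validity. It is not the operator's cert job script. The service and namespace names come from the `kubectl get all` output earlier in the thread; the throwaway CA, the `san.cnf` file name, the key file names, and the `cluster.local` suffix are assumptions made for illustration.

```shell
tmpdir="$(mktemp -d)"
svc="flink-operator-webhook-service"   # webhook service name from the thread
ns="flink-operator-system"

# Throwaway CA, for illustration only.
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=example-webhook-ca" \
  -keyout "$tmpdir/ca.key" -out "$tmpdir/ca.crt" 2>/dev/null

# Extension section carrying the SAN entries for the webhook service.
cat > "$tmpdir/san.cnf" <<EOF
[v3_req]
subjectAltName = DNS:${svc}.${ns}.svc, DNS:${svc}.${ns}.svc.cluster.local
EOF

# Server key and certificate signing request.
openssl req -newkey rsa:2048 -nodes -subj "/CN=${svc}.${ns}.svc" \
  -keyout "$tmpdir/server-key.pem" -out "$tmpdir/server.csr" 2>/dev/null

# Sign the request with the CA, attaching the SAN extension.
openssl x509 -req -days 3650 -in "$tmpdir/server.csr" \
  -CA "$tmpdir/ca.crt" -CAkey "$tmpdir/ca.key" -CAcreateserial \
  -extfile "$tmpdir/san.cnf" -extensions v3_req \
  -out "$tmpdir/server-cert.pem" 2>/dev/null

# Show the SAN entries of the signed certificate.
openssl x509 -in "$tmpdir/server-cert.pem" -noout -text \
  | grep -A1 'Subject Alternative Name'
```

Note that `-days` only changes the validity period. The "Required value" errors in the operator log come from validation of the FlinkCluster status, so a longer-lived certificate on its own would not be expected to clear them.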