GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

Cannot create sample FlinkCluster: "flinkjobcluster-sample" is invalid: [status.components.jobManagerStatefulSet: Required value, status.components.taskManagerStatefulSet: Required value] #393

Open fabito opened 3 years ago

fabito commented 3 years ago

After installing the operator using make deploy, I'm trying to create the sample JobCluster by running:

kubectl apply -f config/samples/flinkoperator_v1beta1_flinkjobcluster.yam

The reconciler never succeeds and the following errors are raised:

flink-operator 2021-01-12T10:09:47.351Z    DEBUG    controller-runtime.webhook.webhooks    received request    {"webhook": "/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster", "UID": "f19a204c-c96f-4180-83a1-9909cfa6fb13", "kind": "flinkoperator.k8s.io/v1beta1, Kind=FlinkCluster", "resource": {"group":"flinkoperator.k8s.io","version":"v1beta1","resource":"flinkclusters"}}
flink-operator 2021-01-12T10:09:47.370Z    INFO    webhook    default    {"name": "flinkjobcluster-sample", "original": {"kind":"FlinkCluster","apiVersion":"flinkoperator.k8s.io/v1beta1","metadata":{"name":"flinkjobcluster-sample","namespace":"default","creationTimestamp":null,"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"flinkoperator.k8s.io/v1beta1\",\"kind\":\"FlinkCluster\",\"metadata\":{\"annotations\":{},\"name\":\"flinkjobcluster-sample\",\"namespace\":\"default\"},\"spec\":{\"flinkProperties\":{\"taskmanager.numberOfTaskSlots\":\"1\"},\"image\":{\"name\":\"flink:1.8.2\"},\"job\":{\"args\":[\"--input\",\"./README.txt\"],\"className\":\"org.apache.flink.streaming.examples.wordcount.WordCount\",\"jarFile\":\"./examples/streaming/WordCount.jar\",\"parallelism\":2},\"jobManager\":{\"ports\":{\"ui\":8081},\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}}},\"taskManager\":{\"replicas\":2,\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}}}}}\n"}},"spec":{"image":{"name":"flink:1.8.2"},"jobManager":{"accessScope":"","ports":{"ui":8081},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapMin":"0"},"taskManager":{"replicas":2,"ports":{},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapMin":"0"},"job":{"jarFile":"./examples/streaming/WordCount.jar","className":"org.apache.flink.streaming.examples.wordcount.WordCount","args":["--input","./README.txt"],"parallelism":2,"restartPolicy":null,"resources":{}},"flinkProperties":{"taskmanager.numberOfTaskSlots":"1"}},"status":{"state":"","components":{"configMap":{"name":"","state":""},"jobManagerDeployment":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerDeployment":{"name":"","state":""}}}}}
flink-operator 2021-01-12T10:09:47.372Z    INFO    webhook    default    {"name": "flinkjobcluster-sample", "augmented": {"kind":"FlinkCluster","apiVersion":"flinkoperator.k8s.io/v1beta1","metadata":{"name":"flinkjobcluster-sample","namespace":"default","creationTimestamp":null,"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"flinkoperator.k8s.io/v1beta1\",\"kind\":\"FlinkCluster\",\"metadata\":{\"annotations\":{},\"name\":\"flinkjobcluster-sample\",\"namespace\":\"default\"},\"spec\":{\"flinkProperties\":{\"taskmanager.numberOfTaskSlots\":\"1\"},\"image\":{\"name\":\"flink:1.8.2\"},\"job\":{\"args\":[\"--input\",\"./README.txt\"],\"className\":\"org.apache.flink.streaming.examples.wordcount.WordCount\",\"jarFile\":\"./examples/streaming/WordCount.jar\",\"parallelism\":2},\"jobManager\":{\"ports\":{\"ui\":8081},\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}}},\"taskManager\":{\"replicas\":2,\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}}}}}\n"}},"spec":{"image":{"name":"flink:1.8.2","pullPolicy":"Always"},"jobManager":{"replicas":1,"accessScope":"Cluster","ports":{"rpc":6123,"blob":6124,"query":6125,"ui":8081},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapRatio":25,"memoryOffHeapMin":"600M"},"taskManager":{"replicas":2,"ports":{"data":6121,"rpc":6122,"query":6125},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapRatio":25,"memoryOffHeapMin":"600M"},"job":{"jarFile":"./examples/streaming/WordCount.jar","className":"org.apache.flink.streaming.examples.wordcount.WordCount","args":["--input","./README.txt"],"allowNonRestoredState":false,"parallelism":2,"noLoggingToStdout":false,"restartPolicy":"Never","cleanupPolicy":{"afterJobSucceeds":"DeleteCluster","afterJobFails":"KeepCluster","afterJobCancelled":"DeleteCluster"},"resources":{}},"flinkProperties":{"taskmanager.numberOfTaskSlots":"1"}},"status":{"state":"","components":{"configMap":{"name":"","state":""
},"jobManagerDeployment":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerDeployment":{"name":"","state":""}}}}}
flink-operator 2021-01-12T10:09:47.373Z    DEBUG    controller-runtime.webhook.webhooks    wrote response    {"webhook": "/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster", "UID": "f19a204c-c96f-4180-83a1-9909cfa6fb13", "allowed": true, "result": {}, "resultError": "got runtime.Object without object metadata: &Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:,Message:,Reason:,Details:nil,Code:200,}"}
flink-operator 2021-01-12T10:09:47.380Z    DEBUG    controller-runtime.webhook.webhooks    received request    {"webhook": "/validate-flinkoperator-k8s-io-v1beta1-flinkcluster", "UID": "f708f350-0447-4929-93f2-363f70a326c9", "kind": "flinkoperator.k8s.io/v1beta1, Kind=FlinkCluster", "resource": {"group":"flinkoperator.k8s.io","version":"v1beta1","resource":"flinkclusters"}}
flink-operator 2021-01-12T10:09:47.380Z    INFO    webhook    Validate create    {"name": "flinkjobcluster-sample"}
flink-operator 2021-01-12T10:09:47.380Z    DEBUG    controller-runtime.webhook.webhooks    wrote response    {"webhook": "/validate-flinkoperator-k8s-io-v1beta1-flinkcluster", "UID": "f708f350-0447-4929-93f2-363f70a326c9", "allowed": true, "result": {}, "resultError": "got runtime.Object without object metadata: &Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:,Message:,Reason:,Details:nil,Code:200,}"}
flink-operator 2021-01-12T10:09:47.391Z    INFO    controllers.FlinkCluster    ============================================================    {"cluster": "default/flinkjobcluster-sample"}
flink-operator 2021-01-12T10:09:47.391Z    INFO    controllers.FlinkCluster    ---------- 1. Observe the current state ----------    {"cluster": "default/flinkjobcluster-sample"}
flink-operator 2021-01-12T10:09:47.391Z    INFO    controllers.FlinkCluster    Observed cluster    {"cluster": "default/flinkjobcluster-sample", "cluster": {"kind":"FlinkCluster","apiVersion":"flinkoperator.k8s.io/v1beta1","metadata":{"name":"flinkjobcluster-sample","namespace":"default","selfLink":"/apis/flinkoperator.k8s.io/v1beta1/namespaces/default/flinkclusters/flinkjobcluster-sample","uid":"ef9e6316-7196-49ec-af30-0602e91b4126","resourceVersion":"94275592","generation":1,"creationTimestamp":"2021-01-12T10:09:47Z","annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"flinkoperator.k8s.io/v1beta1\",\"kind\":\"FlinkCluster\",\"metadata\":{\"annotations\":{},\"name\":\"flinkjobcluster-sample\",\"namespace\":\"default\"},\"spec\":{\"flinkProperties\":{\"taskmanager.numberOfTaskSlots\":\"1\"},\"image\":{\"name\":\"flink:1.8.2\"},\"job\":{\"args\":[\"--input\",\"./README.txt\"],\"className\":\"org.apache.flink.streaming.examples.wordcount.WordCount\",\"jarFile\":\"./examples/streaming/WordCount.jar\",\"parallelism\":2},\"jobManager\":{\"ports\":{\"ui\":8081},\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}}},\"taskManager\":{\"replicas\":2,\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}}}}}\n"}},"spec":{"image":{"name":"flink:1.8.2","pullPolicy":"Always"},"jobManager":{"replicas":1,"accessScope":"Cluster","ports":{"rpc":6123,"blob":6124,"query":6125,"ui":8081},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapRatio":25,"memoryOffHeapMin":"600M"},"taskManager":{"replicas":2,"ports":{"data":6121,"rpc":6122,"query":6125},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapRatio":25,"memoryOffHeapMin":"600M"},"job":{"jarFile":"./examples/streaming/WordCount.jar","className":"org.apache.flink.streaming.examples.wordcount.WordCount","args":["--input","./README.txt"],"allowNonRestoredState":false,"parallelism":2,"noLoggingToStdout":false,"restartPolicy":"Never","cleanupPolicy
":{"afterJobSucceeds":"DeleteCluster","afterJobFails":"KeepCluster","afterJobCancelled":"DeleteCluster"},"resources":{}},"flinkProperties":{"taskmanager.numberOfTaskSlots":"1"}},"status":{"state":"","components":{"configMap":{"name":"","state":""},"jobManagerDeployment":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerDeployment":{"name":"","state":""}}}}}
flink-operator 2021-01-12T10:09:47.670Z    INFO    controllers.FlinkCluster    Observed controllerRevisions    {"cluster": "default/flinkjobcluster-sample", "controllerRevisions": "[]"}
flink-operator 2021-01-12T10:09:47.971Z    INFO    controllers.FlinkCluster    Observed configMap    {"cluster": "default/flinkjobcluster-sample", "state": "nil"}
flink-operator 2021-01-12T10:09:47.971Z    INFO    controllers.FlinkCluster    Deployment not found    {"cluster": "default/flinkjobcluster-sample", "component": "JobManager"}
flink-operator 2021-01-12T10:09:47.971Z    INFO    controllers.FlinkCluster    Observed JobManager deployment    {"cluster": "default/flinkjobcluster-sample", "state": "nil"}
flink-operator 2021-01-12T10:09:47.971Z    INFO    controllers.FlinkCluster    Observed JobManager service    {"cluster": "default/flinkjobcluster-sample", "state": "nil"}
flink-operator 2021-01-12T10:09:48.171Z    INFO    controllers.FlinkCluster    Observed JobManager ingress    {"cluster": "default/flinkjobcluster-sample", "state": "nil"}
flink-operator 2021-01-12T10:09:48.171Z    INFO    controllers.FlinkCluster    Deployment not found    {"cluster": "default/flinkjobcluster-sample", "component": "TaskManager"}
flink-operator 2021-01-12T10:09:48.171Z    INFO    controllers.FlinkCluster    Observed TaskManager deployment    {"cluster": "default/flinkjobcluster-sample", "state": "nil"}
flink-operator 2021-01-12T10:09:48.171Z    INFO    controllers.FlinkCluster    Skip getting Flink job status.    {"cluster": "default/flinkjobcluster-sample", "clusterState": ""}
flink-operator 2021-01-12T10:09:48.171Z    INFO    controllers.FlinkCluster    Observed job    {"cluster": "default/flinkjobcluster-sample", "state": "nil"}
flink-operator 2021-01-12T10:09:48.177Z    INFO    controllers.FlinkCluster    ---------- 2. Update cluster status ----------    {"cluster": "default/flinkjobcluster-sample"}
flink-operator 2021-01-12T10:09:48.177Z    INFO    controllers.FlinkCluster    Cluster state changed    {"cluster": "default/flinkjobcluster-sample", "current": "", "new": "Creating"}
flink-operator 2021-01-12T10:09:48.178Z    INFO    controllers.FlinkCluster    FlinkCluster revision status changed    {"cluster": "default/flinkjobcluster-sample", "current": "currentRevision: , nextRevision: , collisionCount: <nil>", "new": "currentRevision: flinkjobcluster-sample-5d96cb58dd-1, nextRevision: flinkjobcluster-sample-5d96cb58dd-1, collisionCount: <nil>"}
flink-operator 2021-01-12T10:09:48.178Z    INFO    controllers.FlinkCluster    Status changed    {"cluster": "default/flinkjobcluster-sample", "old": {"state":"","components":{"configMap":{"name":"","state":""},"jobManagerDeployment":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerDeployment":{"name":"","state":""}}}, "new": {"state":"Creating","components":{"configMap":{"name":"","state":""},"jobManagerDeployment":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerDeployment":{"name":"","state":""}},"currentRevision":"flinkjobcluster-sample-5d96cb58dd-1","nextRevision":"flinkjobcluster-sample-5d96cb58dd-1"}}
flink-operator 2021-01-12T10:09:48.178Z    DEBUG    controller-runtime.manager.events    Normal    {"object": {"kind":"FlinkCluster","namespace":"default","name":"flinkjobcluster-sample","uid":"ef9e6316-7196-49ec-af30-0602e91b4126","apiVersion":"flinkoperator.k8s.io/v1beta1","resourceVersion":"94275592"}, "reason": "StatusUpdate", "message": "Cluster status: Creating"}
flink-operator 2021-01-12T10:09:48.187Z    ERROR    controllers.FlinkCluster    Failed to update cluster status    {"cluster": "default/flinkjobcluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinkjobcluster-sample\" is invalid: [status.components.jobManagerStatefulSet: Required value, status.components.taskManagerStatefulSet: Required value]"}
flink-operator github.com/go-logr/zapr.(*zapLogger).Error
flink-operator     /root/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
flink-operator github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterHandler).reconcile
flink-operator     /workspace/controllers/flinkcluster_controller.go:159
flink-operator github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterReconciler).Reconcile
flink-operator     /workspace/controllers/flinkcluster_controller.go:80
flink-operator sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
flink-operator     /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:256
flink-operator sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
flink-operator     /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
flink-operator sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
flink-operator     /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
flink-operator k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
flink-operator     /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
flink-operator k8s.io/apimachinery/pkg/util/wait.BackoffUntil
flink-operator     /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
flink-operator k8s.io/apimachinery/pkg/util/wait.JitterUntil
flink-operator     /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
flink-operator k8s.io/apimachinery/pkg/util/wait.Until
flink-operator     /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90
flink-operator 2021-01-12T10:09:48.187Z    ERROR    controller-runtime.controller    Reconciler error    {"controller": "flinkcluster", "request": "default/flinkjobcluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinkjobcluster-sample\" is invalid: [status.components.jobManagerStatefulSet: Required value, status.components.taskManagerStatefulSet: Required value]"}
flink-operator github.com/go-logr/zapr.(*zapLogger).Error
flink-operator     /root/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
flink-operator sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
flink-operator     /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258
flink-operator sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
flink-operator     /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
flink-operator sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
flink-operator     /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
flink-operator k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
flink-operator     /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
flink-operator k8s.io/apimachinery/pkg/util/wait.BackoffUntil
flink-operator     /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
flink-operator k8s.io/apimachinery/pkg/util/wait.JitterUntil
flink-operator     /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
flink-operator k8s.io/apimachinery/pkg/util/wait.Until
flink-operator     /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90

withlin commented 3 years ago

Same issue.

elanv commented 3 years ago

The CRD is up to date, but the official operator image doesn't seem to be. You can build your own image and deploy with it: https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/developer_guide.md

Alternatively, you can check out an earlier commit so that the FlinkCluster CRD matches the version of the official operator image.
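The mismatch described here can be sketched in a few lines of Python. This is only an illustration, not the API server's actual validation code: the required field names are taken from the error message in the log above, the status object is copied from the operator's "Status changed" log line, and the helper names `missing_required` and `format_error` are hypothetical.

```python
# Fields the installed CRD marks as required under status.components,
# taken from the "Required value" entries in the error message above.
REQUIRED_BY_CRD = {"jobManagerStatefulSet", "taskManagerStatefulSet"}

# Status the operator image tried to write, copied from the
# "Status changed" log line ("new" value, revision fields omitted).
status_from_operator = {
    "state": "Creating",
    "components": {
        "configMap": {"name": "", "state": ""},
        "jobManagerDeployment": {"name": "", "state": ""},
        "jobManagerService": {"name": "", "state": ""},
        "taskManagerDeployment": {"name": "", "state": ""},
    },
}


def missing_required(status, required):
    """Return the required component names absent from status.components, sorted."""
    present = set(status.get("components", {}))
    return sorted(required - present)


def format_error(missing):
    """Mimic the shape of the message in the log (illustrative only)."""
    parts = [f"status.components.{name}: Required value" for name in missing]
    return "[" + ", ".join(parts) + "]"


missing = missing_required(status_from_operator, REQUIRED_BY_CRD)
print(format_error(missing))
```

The status written by the image still uses the `...Deployment` component names, while the CRD on the cluster requires the `...StatefulSet` ones, so every status update is rejected until the CRD and the image come from the same commit.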

jaggaer-mtrninic commented 3 years ago

@elanv Can you give us a commit hash? I don't see any version change in the previous commit on the master branch.

duclm2609 commented 3 years ago

I have the same problem :(

jaggaer-mtrninic commented 3 years ago

@duclm2609 I solved it by checking out the commit for the last release (tag): https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/commit/f302312417ee2fe72e1034e5d1414208d6e1df91

duclm2609 commented 3 years ago

@jaggaer-mtrninic Thank you. I can deploy the sample now.

elanv commented 3 years ago

I have built a new operator image from the latest commit. You can deploy the operator with that image like this: make deploy IMG=metatronapp/flink-operator.

If you have a Docker Hub account or another Docker registry, you can build, push, and deploy your own operator image like this (replacing the IMG value with a repository you can push to): make operator-image push-operator-image deploy IMG=metatronapp/flink-operator.