GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

Flink Session Cluster Installation is failing #464

Open sumchak1 opened 3 years ago

sumchak1 commented 3 years ago

Based on https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/356, we have tried all the steps mentioned there, but the Flink session cluster installation still fails.

First we tried the steps below, which did not help:

        kubectl get job cert-job -n flink-operator-system -oyaml > cert-job.yaml
        kubectl delete job cert-job -n flink-operator-system
        kubectl apply -f cert-job.yaml

Then we tried editing the ConfigMap to change the default certificate expiry (in days), which did not help either. We changed:

| openssl x509 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out ${tmpdir}/server-cert.pem

to:

| openssl x509 -days 3650 -req -CA ca.crt -CAkey ca.key -CAcreateserial -out ${tmpdir}/server-cert.pem

k delete -f config-map-up1.yaml -n flink-operator-system
configmap "cert-configmap" deleted

k apply -f config-map-up1.yaml -n flink-operator-system
configmap/cert-configmap created

kubectl get pods -n flink-operator-system
NAME                                                 READY   STATUS    RESTARTS   AGE
flink-operator-controller-manager-848b69b444-8v9l5   2/2     Running   0          43m

k apply -f cert-job-1.yaml -n flink-operator-system
job.batch/cert-job created

kubectl get pods -n flink-operator-system
NAME                                                 READY   STATUS      RESTARTS   AGE
cert-job-lgxzt                                       0/1     Completed   0          7s
flink-operator-controller-manager-848b69b444-8v9l5   2/2     Running     0          44m

 kubectl apply -f config/samples/flinkoperator_v1beta1_flinksessioncluster.yaml
Error from server (InternalError): error when creating "config/samples/flinkoperator_v1beta1_flinksessioncluster.yaml": Internal error occurred: failed calling webhook "mflinkcluster.flinkoperator.k8s.io": Post "https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s": x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0
yan234280533 commented 3 years ago

This looks like the following issue:

https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/390

You can give it a try.
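For context on the x509 error quoted above: clients built with Go 1.15 or later reject a server certificate that carries the host name only in the Common Name field, so the webhook certificate needs a subjectAltName entry. A minimal sketch of signing such a certificate with openssl follows. The function name `make_san_cert`, the throwaway CA, and the file names are illustrative assumptions and not the operator's actual cert script; only the service DNS name in the usage comment is taken from the error message.

```shell
#!/bin/sh
# Sketch: sign a webhook serving certificate that carries a subjectAltName.
# make_san_cert, the throwaway CA and all file names are illustrative assumptions.
make_san_cert() {
  dir="$1"   # output directory
  dns="$2"   # DNS name the API server uses to reach the webhook
  (
    set -e
    cd "$dir"
    # Throwaway self-signed CA (in a real setup, reuse the existing ca.crt/ca.key).
    openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
      -subj "/CN=example-webhook-ca" -keyout ca.key -out ca.crt 2>/dev/null
    # Server key and certificate signing request.
    openssl req -newkey rsa:2048 -nodes -subj "/CN=${dns}" \
      -keyout server-key.pem -out server.csr 2>/dev/null
    # The SAN entry is what the legacy Common Name check asks for.
    printf 'subjectAltName=DNS:%s\n' "$dns" > san.ext
    openssl x509 -req -days 3650 -in server.csr \
      -CA ca.crt -CAkey ca.key -CAcreateserial \
      -extfile san.ext -out server-cert.pem 2>/dev/null
    # Print the certificate so the SAN can be checked by eye.
    openssl x509 -in server-cert.pem -noout -text
  )
}

# Usage (commented out so that sourcing this file has no side effects):
# make_san_cert "$(mktemp -d)" flink-operator-webhook-service.flink-operator-system.svc \
#   | grep -A1 'Subject Alternative Name'
```

The resulting key and certificate would still have to be put wherever the webhook server reads them from, and the CA bundle in the webhook configuration would have to match; that part depends on the installation and is not shown here.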

sumchak1 commented 3 years ago

We were able to solve the issue by creating a SAN certificate for the webhook. However, after we created the FlinkCluster custom resource, no pods, services, etc. were created, and no events are shown for the custom resource. Please see below:

$ kubectl apply -f flinkoperator_v1beta1_flinksessioncluster.yaml
flinkcluster.flinkoperator.k8s.io/flinksessioncluster-sample created
[sumit@sumit flink]$ 
[sumit@sumit flink]$ kubectl get pods 
NAME                                           READY   STATUS             RESTARTS   AGE
ddpstreamappdevtest-b5684688b-wz55p            0/1     ImagePullBackOff   0          12d
default-sparkoperator-667fff9765-g6trh         1/1     Running            0          12d
hive-1578926540-metastore-6dd4f78f9b-hkgns     1/1     Running            0          12d
hive-1578926540-server2-68d8685996-p5lf7       1/1     Running            0          12d
influxdbd2b0d-58bcdd89fb-xmv5m                 1/1     Running            0          12d
ingress-checker-1626517800-jzn4l               0/1     Completed          0          2d4h
ingress-checker-1626604200-476v6               0/1     Completed          0          28h
ingress-checker-1626690600-mfdzt               0/1     Completed          0          4h27m
logdna-agent-4zbnv                             1/1     Running            2          153d
logdna-agent-65r77                             1/1     Running            14         153d
logdna-agent-6ld8l                             1/1     Running            3          153d
logdna-agent-8g2ch                             1/1     Running            10         153d
logdna-agent-n22dh                             1/1     Running            6          153d
logdna-agent-wdj54                             1/1     Running            5          153d
overprovisioning-6d695dd44c-bvlk6              0/1     Pending            0          9d
overprovisioning-6d695dd44c-cz5j7              1/1     Running            0          12d
overprovisioning-6d695dd44c-j2dt6              1/1     Running            0          5d4h
overprovisioning-6d695dd44c-nxjv5              1/1     Running            0          12d
overprovisioning-6d695dd44c-p5jf9              1/1     Running            0          6d21h
overprovisioning-6d695dd44c-qhsgq              0/1     Pending            0          5d4h
overprovisioning-autoscaler-587ff88c66-5wcd7   1/1     Running            0          12d
privingressapp-8595c5f87d-xjv9c                1/1     Running            0          12d
[sumit@sumit flink]$ kubectl get FlinkCluster
NAME                         AGE
flinksessioncluster-sample   32s
[sumit@sumit flink]$ kubectl describe FlinkCluster flinksessioncluster-sample 
Name:         flinksessioncluster-sample
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"flinkoperator.k8s.io/v1beta1","kind":"FlinkCluster","metadata":{"annotations":{},"name":"flinksessioncluster-sample","names...
API Version:  flinkoperator.k8s.io/v1beta1
Kind:         FlinkCluster
Metadata:
  Creation Timestamp:  2021-07-19T14:57:00Z
  Generation:          1
  Managed Fields:
    API Version:  flinkoperator.k8s.io/v1beta1
    Fields Type:  FieldsV1
    Fields V 1:
      F : Metadata:
        F : Annotations:
          .:
          F : Kubectl . Kubernetes . Io / Last - Applied - Configuration:
      F : Spec:
        .:
        F : Env Vars:
        F : Flink Properties:
          .:
          F : Taskmanager . Number Of Task Slots:
        F : Image:
          .:
          F : Name:
          F : Pull Policy:
        F : Job Manager:
          .:
          F : Access Scope:
          F : Ports:
            .:
            F : Ui:
          F : Resources:
            .:
            F : Limits:
              .:
              F : Cpu:
              F : Memory:
          F : Security Context:
            .:
            F : Run As Group:
            F : Run As User:
        F : Task Manager:
          .:
          F : Replicas:
          F : Resources:
            .:
            F : Limits:
              .:
              F : Cpu:
              F : Memory:
          F : Sidecars:
          F : Volume Mounts:
          F : Volumes:
    Manager:         kubectl
    Operation:       Update
    Time:            2021-07-19T14:56:59Z
  Resource Version:  173782236
  Self Link:         /apis/flinkoperator.k8s.io/v1beta1/namespaces/default/flinkclusters/flinksessioncluster-sample
  UID:               951f98d1-5943-44a1-ba19-f399a1d643ba
Spec:
  Env Vars:
    Name:   FOO
    Value:  bar
  Flink Properties:
    Taskmanager . Number Of Task Slots:  1
  Image:
    Name:         flink:1.8.2
    Pull Policy:  Always
  Job Manager:
    Access Scope:           Cluster
    Memory Off Heap Min:    600M
    Memory Off Heap Ratio:  25
    Ports:
      Blob:    6124
      Query:   6125
      Rpc:     6123
      Ui:      8081
    Replicas:  1
    Resources:
      Limits:
        Cpu:     200m
        Memory:  1Gi
    Security Context:
      Run As Group:    9999
      Run As User:     9999
  Recreate On Update:  true
  Task Manager:
    Memory Off Heap Min:    600M
    Memory Off Heap Ratio:  25
    Ports:
      Data:    6121
      Query:   6125
      Rpc:     6122
    Replicas:  1
    Resources:
      Limits:
        Cpu:     200m
        Memory:  1Gi
    Sidecars:
      Command:
        sleep
        10000
      Image:  alpine
      Name:   sidecar
      Resources:
    Volume Mounts:
      Mount Path:  /cache
      Name:        cache-volume
    Volumes:
      Empty Dir:
      Name:  cache-volume
Events:      <none>
sumchak1 commented 3 years ago

@yan234280533 is there any update on this?

toniiiik commented 3 years ago

@sumchak1 I had the same problem. Check whether the manager pod shows restarts or OOMKilled events. I increased the memory requests and limits, and once the manager started with the new resource configuration the cluster came up as expected. When I run kubectl top pod -n flink, the manager consumes a bit more than 30Mi of memory, so the default values do not work.
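The check and the fix described above could look roughly like this. It is a sketch under assumptions: the deployment name and the `app=flink-operator` label are taken from the command output elsewhere in this thread, while the container name and the memory sizes in the usage comment are placeholders to adapt. The helper names and the `KUBECTL` variable are made up for illustration.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with a wrapper or a stub); defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# List every container of the operator pods with its restart count and the
# reason of its last termination (OOMKilled shows up here if it happened).
manager_restarts() {
  ns="$1"
  "$KUBECTL" get pods -n "$ns" -l app=flink-operator \
    -o jsonpath='{range .items[*].status.containerStatuses[*]}{.name}{" restarts="}{.restartCount}{" last="}{.lastState.terminated.reason}{"\n"}{end}'
}

# Raise the memory request and limit of one container in the manager deployment.
raise_manager_memory() {
  ns="$1"; container="$2"; request="$3"; limit="$4"
  "$KUBECTL" set resources deployment flink-operator-controller-manager \
    -n "$ns" -c "$container" \
    --requests="memory=${request}" --limits="memory=${limit}"
}

# Usage (container name and sizes are placeholders, not recommended values):
# manager_restarts flink-operator-system
# raise_manager_memory flink-operator-system <container-name> 64Mi 128Mi
```

Changing the deployment's resources triggers a new rollout, so the manager pod is replaced and reconciles the FlinkCluster again after it starts.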

sumchak1 commented 3 years ago

@toniiiik @yan234280533, I checked the manager logs and updated the role binding based on the error, but I can still see some errors in my manager pod. Can you please help me identify what the actual issue is? Also, in the FlinkCluster describe output I can see the session cluster status shown as Creating in the Events section.

$ kubectl get all -n flink-operator-system
NAME                                                     READY   STATUS      RESTARTS   AGE
pod/cert-job-q5hvp                                       0/1     Completed   0          6m31s
pod/flink-operator-controller-manager-848b69b444-j88t2   2/2     Running     0          5m51s

NAME                                                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/flink-operator-controller-manager-metrics-service   ClusterIP   172.21.16.17    <none>        8443/TCP   7d15h
service/flink-operator-webhook-service                      ClusterIP   172.21.43.122   <none>        443/TCP    7d15h

NAME                                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flink-operator-controller-manager   1/1     1            1           7d15h

NAME                                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/flink-operator-controller-manager-848b69b444   1         1         1       7d15h

NAME                 COMPLETIONS   DURATION   AGE
job.batch/cert-job   1/1           5s         6m33s
[sumit@sumit flink]$ 
[sumit@sumit flink]$ 
[sumit@sumit flink]$ kubectl top pod -n flink-operator-system
NAME                                                 CPU(cores)   MEMORY(bytes)   
flink-operator-controller-manager-848b69b444-j88t2   2m           25Mi   

[sumit@sumit flink]$ kubectl logs -n flink-operator-system -l app=flink-operator --all-containers
I0722 06:07:33.426511       1 main.go:209] Generating self signed cert as no cert is provided
I0722 06:07:33.570410       1 main.go:242] Listening securely on 0.0.0.0:8443
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
    /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90

[sumit@sumit flink]$ kubectl logs flink-operator-controller-manager-848b69b444-j88t2 -n flink-operator-system --all-containers
I0722 06:07:33.426511       1 main.go:209] Generating self signed cert as no cert is provided
I0722 06:07:33.570410       1 main.go:242] Listening securely on 0.0.0.0:8443
W0722 06:07:33.929936       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0722 06:07:35.158574       1 request.go:621] Throttling request took 1.048583353s, request: GET:https://172.21.0.1:443/apis/policy/v1beta1?timeout=32s
2021-07-22T06:07:35.209Z    INFO    controller-runtime.metrics  metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
2021-07-22T06:07:35.209Z    INFO    controller-runtime.builder  Registering a mutating webhook  {"GVK": "flinkoperator.k8s.io/v1beta1, Kind=FlinkCluster", "path": "/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.209Z    INFO    controller-runtime.webhook  registering webhook {"path": "/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.210Z    INFO    controller-runtime.builder  Registering a validating webhook    {"GVK": "flinkoperator.k8s.io/v1beta1, Kind=FlinkCluster", "path": "/validate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.210Z    INFO    controller-runtime.webhook  registering webhook {"path": "/validate-flinkoperator-k8s-io-v1beta1-flinkcluster"}
2021-07-22T06:07:35.210Z    INFO    setup   Starting manager
2021-07-22T06:07:35.228Z    INFO    controller-runtime.manager  starting metrics server {"path": "/metrics"}
2021-07-22T06:07:35.228Z    INFO    controller-runtime.webhook.webhooks starting webhook server
2021-07-22T06:07:35.269Z    INFO    controller-runtime.certwatcher  Updated current TLS certificate
2021-07-22T06:07:35.269Z    INFO    controller-runtime.controller   Starting EventSource    {"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:35.269Z    INFO    controller-runtime.webhook  serving webhook server  {"host": "", "port": 443}
2021-07-22T06:07:35.269Z    INFO    controller-runtime.certwatcher  Starting certificate watcher
2021-07-22T06:07:35.408Z    INFO    controller-runtime.controller   Starting EventSource    {"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.089Z    INFO    controller-runtime.controller   Starting EventSource    {"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.190Z    INFO    controller-runtime.controller   Starting EventSource    {"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.291Z    INFO    controller-runtime.controller   Starting EventSource    {"controller": "flinkcluster", "source": "kind source: /, Kind="}
2021-07-22T06:07:36.591Z    INFO    controller-runtime.controller   Starting Controller {"controller": "flinkcluster"}
2021-07-22T06:07:36.591Z    INFO    controller-runtime.controller   Starting workers    {"controller": "flinkcluster", "worker count": 1}
2021-07-22T06:07:36.591Z    INFO    controllers.FlinkCluster    ============================================================    {"cluster": "default/flinksessioncluster-sample"}
2021-07-22T06:07:36.591Z    INFO    controllers.FlinkCluster    ---------- 1. Observe the current state ----------  {"cluster": "default/flinksessioncluster-sample"}
2021-07-22T06:07:36.591Z    INFO    controllers.FlinkCluster    Observed cluster    {"cluster": "default/flinksessioncluster-sample", "cluster": {"kind":"FlinkCluster","apiVersion":"flinkoperator.k8s.io/v1beta1","metadata":{"name":"flinksessioncluster-sample","namespace":"default","selfLink":"/apis/flinkoperator.k8s.io/v1beta1/namespaces/default/flinkclusters/flinksessioncluster-sample","uid":"951f98d1-5943-44a1-ba19-f399a1d643ba","resourceVersion":"173782236","generation":1,"creationTimestamp":"2021-07-19T14:57:00Z","annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"flinkoperator.k8s.io/v1beta1\",\"kind\":\"FlinkCluster\",\"metadata\":{\"annotations\":{},\"name\":\"flinksessioncluster-sample\",\"namespace\":\"default\"},\"spec\":{\"envVars\":[{\"name\":\"FOO\",\"value\":\"bar\"}],\"flinkProperties\":{\"taskmanager.numberOfTaskSlots\":\"1\"},\"image\":{\"name\":\"flink:1.8.2\",\"pullPolicy\":\"Always\"},\"jobManager\":{\"accessScope\":\"Cluster\",\"ports\":{\"ui\":8081},\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}},\"securityContext\":{\"runAsGroup\":9999,\"runAsUser\":9999}},\"taskManager\":{\"replicas\":1,\"resources\":{\"limits\":{\"cpu\":\"200m\",\"memory\":\"1024Mi\"}},\"sidecars\":[{\"command\":[\"sleep\",\"10000\"],\"image\":\"alpine\",\"name\":\"sidecar\"}],\"volumeMounts\":[{\"mountPath\":\"/cache\",\"name\":\"cache-volume\"}],\"volumes\":[{\"emptyDir\":{},\"name\":\"cache-volume\"}]}}}\n"},"managedFields":[{"manager":"kubectl","operation":"Update","apiVersion":"flinkoperator.k8s.io/v1beta1","time":"2021-07-19T14:56:59Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:envVars":{},"f:flinkProperties":{".":{},"f:taskmanager.numberOfTaskSlots":{}},"f:image":{".":{},"f:name":{},"f:pullPolicy":{}},"f:jobManager":{".":{},"f:accessScope":{},"f:ports":{".":{},"f:ui":{}},"f:resources":{".":{},"f:limits":{".":
{},"f:cpu":{},"f:memory":{}}},"f:securityContext":{".":{},"f:runAsGroup":{},"f:runAsUser":{}}},"f:taskManager":{".":{},"f:replicas":{},"f:resources":{".":{},"f:limits":{".":{},"f:cpu":{},"f:memory":{}}},"f:sidecars":{},"f:volumeMounts":{},"f:volumes":{}}}}}]},"spec":{"image":{"name":"flink:1.8.2","pullPolicy":"Always"},"jobManager":{"replicas":1,"accessScope":"Cluster","ports":{"rpc":6123,"blob":6124,"query":6125,"ui":8081},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapRatio":25,"memoryOffHeapMin":"600M","securityContext":{"runAsUser":9999,"runAsGroup":9999}},"taskManager":{"replicas":1,"ports":{"data":6121,"rpc":6122,"query":6125},"resources":{"limits":{"cpu":"200m","memory":"1Gi"}},"memoryOffHeapRatio":25,"memoryOffHeapMin":"600M","volumes":[{"name":"cache-volume","emptyDir":{}}],"volumeMounts":[{"name":"cache-volume","mountPath":"/cache"}],"sidecars":[{"name":"sidecar","image":"alpine","command":["sleep","10000"],"resources":{}}]},"envVars":[{"name":"FOO","value":"bar"}],"flinkProperties":{"taskmanager.numberOfTaskSlots":"1"},"recreateOnUpdate":true},"status":{"state":"","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}}}
2021-07-22T06:07:36.829Z    INFO    controllers.FlinkCluster    Observed controllerRevisions    {"cluster": "default/flinksessioncluster-sample", "controllerRevisions": "[{name: flinksessioncluster-sample-84fdb95d89, revision: 1},]"}
2021-07-22T06:07:37.568Z    INFO    controllers.FlinkCluster    Observed configMap  {"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.568Z    INFO    controllers.FlinkCluster    Deployment not found    {"cluster": "default/flinksessioncluster-sample", "component": "JobManager"}
2021-07-22T06:07:37.568Z    INFO    controllers.FlinkCluster    Observed JobManager StatefulSet {"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.568Z    INFO    controllers.FlinkCluster    Observed JobManager service {"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.708Z    INFO    controllers.FlinkCluster    Observed JobManager ingress {"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.708Z    INFO    controllers.FlinkCluster    Deployment not found    {"cluster": "default/flinksessioncluster-sample", "component": "TaskManager"}
2021-07-22T06:07:37.708Z    INFO    controllers.FlinkCluster    Observed TaskManager StatefulSet    {"cluster": "default/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:07:37.710Z    INFO    controllers.FlinkCluster    ---------- 2. Update cluster status ----------  {"cluster": "default/flinksessioncluster-sample"}
2021-07-22T06:07:37.711Z    INFO    controllers.FlinkCluster    Cluster state changed   {"cluster": "default/flinksessioncluster-sample", "current": "", "new": "Creating"}
2021-07-22T06:07:37.728Z    INFO    controllers.FlinkCluster    FlinkCluster revision status changed    {"cluster": "default/flinksessioncluster-sample", "current": "currentRevision: , nextRevision: , collisionCount: <nil>", "new": "currentRevision: flinksessioncluster-sample-84fdb95d89-1, nextRevision: flinksessioncluster-sample-84fdb95d89-1, collisionCount: <nil>"}
2021-07-22T06:07:37.728Z    INFO    controllers.FlinkCluster    Status changed  {"cluster": "default/flinksessioncluster-sample", "old": {"state":"","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}, "new": {"state":"Creating","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}},"currentRevision":"flinksessioncluster-sample-84fdb95d89-1","nextRevision":"flinksessioncluster-sample-84fdb95d89-1"}}
2021-07-22T06:07:37.729Z    DEBUG   controller-runtime.manager.events   Normal  {"object": {"kind":"FlinkCluster","namespace":"default","name":"flinksessioncluster-sample","uid":"951f98d1-5943-44a1-ba19-f399a1d643ba","apiVersion":"flinkoperator.k8s.io/v1beta1","resourceVersion":"173782236"}, "reason": "StatusUpdate", "message": "Cluster status: Creating"}
2021-07-22T06:07:37.788Z    ERROR   controllers.FlinkCluster    Failed to update cluster status {"cluster": "default/flinksessioncluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinksessioncluster-sample\" is invalid: [status.components.jobManagerDeployment: Required value, status.components.taskManagerDeployment: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
    /root/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterHandler).reconcile
    /workspace/controllers/flinkcluster_controller.go:162
github.com/googlecloudplatform/flink-operator/controllers.(*FlinkClusterReconciler).Reconcile
    /workspace/controllers/flinkcluster_controller.go:82
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
    /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90
2021-07-22T06:07:37.788Z    ERROR   controller-runtime.controller   Reconciler error    {"controller": "flinkcluster", "request": "default/flinksessioncluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinksessioncluster-sample\" is invalid: [status.components.jobManagerDeployment: Required value, status.components.taskManagerDeployment: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
    /root/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
    /root/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.6.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
    /root/go/pkg/mod/k8s.io/apimachinery@v0.18.3/pkg/util/wait/wait.go:90

$ kubectl describe flinkcluster flinksessioncluster-sample -n flink-operator-system
Name:         flinksessioncluster-sample
Namespace:    flink-operator-system
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"flinkoperator.k8s.io/v1beta1","kind":"FlinkCluster","metadata":{"annotations":{},"name":"flinksessioncluster-sample","names...
API Version:  flinkoperator.k8s.io/v1beta1
Kind:         FlinkCluster
Metadata:
  Creation Timestamp:  2021-07-22T06:09:20Z
  Generation:          1
  Managed Fields:
    API Version:  flinkoperator.k8s.io/v1beta1
    Fields Type:  FieldsV1
    Fields V 1:
      F : Metadata:
        F : Annotations:
          .:
          F : Kubectl . Kubernetes . Io / Last - Applied - Configuration:
      F : Spec:
        .:
        F : Env Vars:
        F : Flink Properties:
          .:
          F : Taskmanager . Number Of Task Slots:
        F : Image:
          .:
          F : Name:
          F : Pull Policy:
        F : Job Manager:
          .:
          F : Access Scope:
          F : Ports:
            .:
            F : Ui:
          F : Resources:
            .:
            F : Limits:
              .:
              F : Cpu:
              F : Memory:
          F : Security Context:
            .:
            F : Run As Group:
            F : Run As User:
        F : Task Manager:
          .:
          F : Replicas:
          F : Resources:
            .:
            F : Limits:
              .:
              F : Cpu:
              F : Memory:
          F : Sidecars:
          F : Volume Mounts:
          F : Volumes:
    Manager:         kubectl
    Operation:       Update
    Time:            2021-07-22T06:09:20Z
  Resource Version:  175069579
  Self Link:         /apis/flinkoperator.k8s.io/v1beta1/namespaces/flink-operator-system/flinkclusters/flinksessioncluster-sample
  UID:               18bd7904-a582-4433-a731-23d37813b1fd
Spec:
  Env Vars:
    Name:   FOO
    Value:  bar
  Flink Properties:
    Taskmanager . Number Of Task Slots:  1
  Image:
    Name:         flink:1.8.2
    Pull Policy:  Always
  Job Manager:
    Access Scope:           Cluster
    Memory Off Heap Min:    600M
    Memory Off Heap Ratio:  25
    Ports:
      Blob:    6124
      Query:   6125
      Rpc:     6123
      Ui:      8081
    Replicas:  1
    Resources:
      Limits:
        Cpu:     200m
        Memory:  1Gi
    Security Context:
      Run As Group:    9999
      Run As User:     9999
  Recreate On Update:  true
  Task Manager:
    Memory Off Heap Min:    600M
    Memory Off Heap Ratio:  25
    Ports:
      Data:    6121
      Query:   6125
      Rpc:     6122
    Replicas:  1
    Resources:
      Limits:
        Cpu:     200m
        Memory:  1Gi
    Sidecars:
      Command:
        sleep
        10000
      Image:  alpine
      Name:   sidecar
      Resources:
    Volume Mounts:
      Mount Path:  /cache
      Name:        cache-volume
    Volumes:
      Empty Dir:
      Name:  cache-volume
Events:
  Type    Reason        Age                   From           Message
  ----    ------        ----                  ----           -------
  Normal  StatusUpdate  48s (x16 over 3m39s)  FlinkOperator  Cluster status: Creating
sumchak1 commented 3 years ago

I am using https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/config/samples/flinkoperator_v1beta1_flinksessioncluster.yaml to create the Flink session cluster, but the log says the TaskManager and JobManager deployments are not found.

state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}}}
2021-07-22T06:09:29.162Z    INFO    controllers.FlinkCluster    Observed controllerRevisions    {"cluster": "flink-operator-system/flinksessioncluster-sample", "controllerRevisions": "[{name: flinksessioncluster-sample-84fdb95d89, revision: 1},]"}
2021-07-22T06:09:29.162Z    INFO    controllers.FlinkCluster    Observed configMap  {"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z    INFO    controllers.FlinkCluster    Deployment not found    {"cluster": "flink-operator-system/flinksessioncluster-sample", "component": "JobManager"}
2021-07-22T06:09:29.162Z    INFO    controllers.FlinkCluster    Observed JobManager StatefulSet {"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z    INFO    controllers.FlinkCluster    Observed JobManager service {"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z    INFO    controllers.FlinkCluster    Observed JobManager ingress {"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.162Z    INFO    controllers.FlinkCluster    Deployment not found    {"cluster": "flink-operator-system/flinksessioncluster-sample", "component": "TaskManager"}
2021-07-22T06:09:29.163Z    INFO    controllers.FlinkCluster    Observed TaskManager StatefulSet    {"cluster": "flink-operator-system/flinksessioncluster-sample", "state": "nil"}
2021-07-22T06:09:29.164Z    INFO    controllers.FlinkCluster    ---------- 2. Update cluster status ----------  {"cluster": "flink-operator-system/flinksessioncluster-sample"}
2021-07-22T06:09:29.164Z    INFO    controllers.FlinkCluster    Cluster state changed   {"cluster": "flink-operator-system/flinksessioncluster-sample", "current": "", "new": "Creating"}
2021-07-22T06:09:29.164Z    INFO    controllers.FlinkCluster    FlinkCluster revision status changed    {"cluster": "flink-operator-system/flinksessioncluster-sample", "current": "currentRevision: , nextRevision: , collisionCount: <nil>", "new": "currentRevision: flinksessioncluster-sample-84fdb95d89-1, nextRevision: flinksessioncluster-sample-84fdb95d89-1, collisionCount: <nil>"}
2021-07-22T06:09:29.164Z    INFO    controllers.FlinkCluster    Status changed  {"cluster": "flink-operator-system/flinksessioncluster-sample", "old": {"state":"","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}}}, "new": {"state":"Creating","components":{"configMap":{"name":"","state":""},"jobManagerStatefulSet":{"name":"","state":""},"jobManagerService":{"name":"","state":""},"taskManagerStatefulSet":{"name":"","state":""}},"currentRevision":"flinksessioncluster-sample-84fdb95d89-1","nextRevision":"flinksessioncluster-sample-84fdb95d89-1"}}
2021-07-22T06:09:29.165Z    DEBUG   controller-runtime.manager.events   Normal  {"object": {"kind":"FlinkCluster","namespace":"flink-operator-system","name":"flinksessioncluster-sample","uid":"18bd7904-a582-4433-a731-23d37813b1fd","apiVersion":"flinkoperator.k8s.io/v1beta1","resourceVersion":"175069579"}, "reason": "StatusUpdate", "message": "Cluster status: Creating"}
2021-07-22T06:09:29.197Z    ERROR   controllers.FlinkCluster    Failed to update cluster status {"cluster": "flink-operator-system/flinksessioncluster-sample", "error": "FlinkCluster.flinkoperator.k8s.io \"flinksessioncluster-sample\" is invalid: [status.components.jobManagerDeployment: Required value, status.components.taskManagerDeployment: Required value]"}
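One possible reading of the error above, offered as a guess rather than a confirmed diagnosis: the installed CRD still requires `status.components.jobManagerDeployment` and `status.components.taskManagerDeployment`, while this operator build reports `jobManagerStatefulSet` and `taskManagerStatefulSet` in the status it tries to write. That would point to a CRD and an operator image from different versions. A small sketch for checking which component names the installed CRD mentions (the helper name `crd_component_names` is made up for illustration):

```shell
#!/bin/sh
# Reads a CRD manifest on stdin and prints the distinct status component
# names (Deployment or StatefulSet flavoured) that appear in its schema.
crd_component_names() {
  grep -oE '(jobManager|taskManager)(Deployment|StatefulSet)' | sort -u
}

# Usage against the live cluster (CRD name is <plural>.<group>):
# kubectl get crd flinkclusters.flinkoperator.k8s.io -o yaml | crd_component_names
```

If only the Deployment names show up, re-applying the CRD that ships with the operator version in use would be the first thing to try.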
mishra157 commented 3 years ago

@yan234280533 is there any update on this?

sv3ndk commented 3 years ago

Hi @mishra157, I think the community is moving to the https://github.com/spotify/flink-on-k8s-operator/ fork now (see this discussion: https://github.com/spotify/flink-on-k8s-operator/issues/82). You will probably have a better chance with that version; if the bug is still present there, report the issue over there.