GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

Failed calling webhook: context deadline exceeded on AWS EKS #339

Open acesir opened 4 years ago

acesir commented 4 years ago

We are experiencing the error below when attempting to create the Flink job cluster using the sample provided in the repo. Our Flink deployment with the operator and job cluster works fine on Azure AKS, but the error occurs on AWS EKS.

Error from server (InternalError): error when creating "flink-on-k8s-operator-flink-operator-0.2.0/helm-chart/flink-job-cluster/flink-job-cluster.yaml": Internal error occurred: failed calling webhook "mflinkcluster.flinkoperator.k8s.io": Post https://flink-operator-webhook-service.flink-operator-system.svc:443/mutate-flinkoperator-k8s-io-v1beta1-flinkcluster?timeout=30s: context deadline exceeded
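To narrow down which in-cluster endpoint the API server could not reach, the message can be split into its parts. The sketch below is only an illustration: `parse_webhook_error` is a hypothetical helper, not part of the operator or of kubectl, and it assumes the message has the shape shown above and that the host follows the usual `<service>.<namespace>.svc` pattern. It does not diagnose why the call timed out.

```python
import re
from urllib.parse import urlsplit, parse_qs


def parse_webhook_error(message):
    """Pull the webhook name, Service, namespace, port, path and timeout
    out of a "failed calling webhook" message.

    Hypothetical helper: assumes the message shape shown in this issue and
    a host of the form <service>.<namespace>.svc.
    """
    # The URL may or may not be quoted, depending on the client version.
    match = re.search(
        r'failed calling webhook "([^"]+)": Post "?([^\s"]+)"?: ', message
    )
    if match is None:
        raise ValueError("not a recognised webhook failure message")

    webhook, url = match.groups()
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    if len(labels) < 3 or labels[2] != "svc":
        raise ValueError("host is not of the form <service>.<namespace>.svc")

    # The API server passes its call timeout as a query parameter.
    timeout = parse_qs(parts.query).get("timeout", [None])[0]

    return {
        "webhook": webhook,
        "service": labels[0],
        "namespace": labels[1],
        "port": parts.port,
        "path": parts.path,
        "timeout": timeout,
    }
```

For the error above, this points at the Service `flink-operator-webhook-service` in namespace `flink-operator-system` on port 443, called with a 30s timeout. That Service and the pods behind it are the place to start checking reachability from the EKS control plane.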

The job-cluster YAML:

apiVersion: flinkoperator.k8s.io/v1beta1
kind: FlinkCluster
metadata:
  name: flink-job-cluster
  labels:
    app: flink-job-cluster
    chart: flink-job-cluster-0.1.51
    release: flink-job-cluster
spec:
  image:
    name: "flink:1.9.3"
  envVars:
    - name: HADOOP_CLASSPATH
      value: /opt/flink/opt/flink-metrics-prometheus-1.9.3.jar    
  jobManager:
    accessScope: Cluster
    ports:
      ui: 8081
    extraPorts:
      - containerPort: 9249
        name: prom    
    resources:
      limits:
        cpu: 200m
        memory: 1024Mi
    podAnnotations:
      fluentbit.io/parser: foo
  taskManager:
    replicas: 2
    extraPorts:
      - containerPort: 9249
        name: prom
        protocol: TCP  
    resources:
      limits:
        cpu: 200m
        memory: 1024Mi
    podAnnotations:
      fluentbit.io/parser: foo
  job:
    jarFile: ./examples/streaming/WordCount.jar
    className: org.apache.flink.streaming.examples.wordcount.WordCount
    args: ["--input", "./README.txt"]
    parallelism: 
    restartPolicy: Never 
    podAnnotations:
      fluentbit.io/parser: foo
  flinkProperties:
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    taskmanager.numberOfTaskSlots: "1"

We are using flink-operator 0.2.0 and EKS 1.17.

swagy-tarun commented 3 years ago

Hi, did you find a resolution to this issue?

emmanuelCarre commented 2 years ago

Hello,

This issue may have the same origin as #399. See https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/399#issuecomment-1206193790.