GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

savepointDir only supported gs://? we want to use s3:// #395

Open kaohaonan6666 opened 3 years ago

kaohaonan6666 commented 3 years ago

How can we use s3:// as our savepointDir? How do we solve this problem?

guanjieshen commented 3 years ago

Also wondering if wasb:// would be supported as well?

youngwookim commented 3 years ago

@kaohaonan6666 You can write savepoints to an S3 bucket by using the s3a:// prefix, like the following:

  job:
    jarFile: /opt/flink-job.jar
    savepointsDir: s3a://mybucket/flink/savepoints
    autoSavepointSeconds: 360

(snip)

  flinkProperties:
    # for s3 access
    s3.access-key: "YOUR-ACCESS-KEY"
    s3.secret-key: "YOUR-SECRET-KEY"

Also, make sure that your Docker image contains the jars needed to access S3 buckets. See https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/images/flink/docker/Dockerfile

I've revised the Dockerfile to include the jars needed to access AWS S3, like the following:

(snip)

# s3
ARG FLINK_S3_HADOOP_JAR_NAME=flink-s3-fs-hadoop-1.11.2.jar
ARG FLINK_S3_HADOOP_JAR_URI=https://repo1.maven.org/maven2/org/apache/flink/flink-s3-fs-hadoop/1.11.2/${FLINK_S3_HADOOP_JAR_NAME}

RUN echo "Downloading ${FLINK_S3_HADOOP_JAR_URI}" && \
  wget -q -O /opt/flink/lib/${FLINK_S3_HADOOP_JAR_NAME} ${FLINK_S3_HADOOP_JAR_URI}

Hope this helps.

kaohaonan6666 commented 3 years ago

We want to start a job from an existing savepoint rather than take a new savepoint. We found that fromSavepoint can help with this, but it seems to support only gs://.

youngwookim commented 3 years ago

@kaohaonan6666 IMO, a Flink savepoint is just a path where snapshot images are saved on an HCFS (Hadoop-compatible file system). I believe that if gs:// works, then others like s3a:// or hdfs:// should work too.

kaohaonan6666 commented 3 years ago

We have tried s3:// and hdfs://, but neither works. We just want to confirm whether they can be used or not.

youngwookim commented 3 years ago

@kaohaonan6666 I'm not sure whether that is a bug in flink-operator, but basically Flink supports the well-known object stores for storing savepoints, checkpoints, etc. Ref.: https://ci.apache.org/projects/flink/flink-docs-stable/deployment/filesystems/ So you should double-check that your particular Docker image has the libraries and configuration required for the fs scheme. That is, the default Docker image for flink-operator works fine with GCS, but if you want to use AWS or Azure, you should customize your Docker image and Flink properties accordingly.

shashken commented 3 years ago

This is not related to the operator.

kaohaonan6666 commented 3 years ago

We solved the problem. The CRD field is fromSavepoint, but the doc says fromSavePoint. We checked the source code and used 'p' instead of 'P', and then it succeeded. Just a slip!
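
For anyone landing here later, a minimal sketch of a job spec that restores from a savepoint on S3, combining the field spelling found above with the s3a:// setup from earlier in the thread. The bucket name and the savepoint path placeholder are hypothetical, and this assumes the image already contains the S3 filesystem jar and the s3 credentials are set in flinkProperties:

```yaml
  job:
    jarFile: /opt/flink-job.jar
    # Note the lowercase "p": fromSavepoint, not fromSavePoint.
    # The path below is a placeholder; point it at an existing savepoint.
    fromSavepoint: s3a://mybucket/flink/savepoints/<your-savepoint-directory>
    savepointsDir: s3a://mybucket/flink/savepoints
    autoSavepointSeconds: 360
```

If the field name is misspelled, the savepoint may simply not be picked up, which can look like the scheme is unsupported.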