GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0

How to enforce SSL/TLS everywhere through this operator? #309

Open a-roberts opened 4 years ago

a-roberts commented 4 years ago

Hey everyone, I've been trying this operator successfully on OpenShift after making a few small changes and applying a workaround (https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/288) in order to use Flink 1.11.

Now I'd like to check that I can use SSL/TLS everywhere as per https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html. I had a look through https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/3352bf51c0d3167ba87a626cf5d6ef37753b8c57/docs/crd_v1alpha1.md and I noticed there's useTLS for the Ingress endpoint (I assume for external access, so perhaps securing the Flink UI?) but I don't see anything for internal communications.

Is it possible to achieve this through the operator and, if so, how? I don't see it listed as a supported feature in the main README, but I am thinking it would be done through an override here in the FlinkCluster CR:

spec:
  flinkProperties:

I'm wondering if anyone's done this before. I'll have a try anyway and see what happens, but I couldn't find any documentation on this for the operator itself (lemme know if I've missed something please), hence my curiosity in case it's something not yet available.

Thanks!

Update: you can do it. Make the keystore, truststore, etc. up front, then create a secret and mount it in. I don't mind any of these values being known (just testing on my laptop).

kind: FlinkCluster
metadata:
  name: tls-flink-cluster-1-11
spec:
  jobManager:
    volumeMounts:
      - name: flink-secret-volume
        mountPath: /etc/flink-secrets
    volumes:
    - name: flink-secret-volume
      secret:
        secretName: flink-tls-secret
    accessScope: Cluster
    resources:
      limits:
        memory: 600Mi
        cpu: "1.0"
  taskManager:
    volumeMounts:
      - name: flink-secret-volume
        mountPath: /etc/flink-secrets
    volumes:
    - name: flink-secret-volume
      secret:
        secretName: flink-tls-secret
    replicas: 1
    resources:
      limits:
        memory: 1Gi
        cpu: "1.0"
  image:
    name: flink:scala_2.12-java8
  flinkProperties:
    # https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html is helpful for this part.
    web.submit.enable: "false"
    taskmanager.numberOfTaskSlots: "1"
    jobmanager.heap.size: ""                # set empty value (only for Flink version 1.11 or above)
    jobmanager.memory.process.size: 1gb   # job manager memory limit  (only for Flink version 1.11 or above)
    taskmanager.heap.size: ""               # set empty value
    taskmanager.memory.process.size: 1gb    # task manager memory limit
    security.ssl.internal.enabled: "true"
    security.ssl.internal.keystore: /etc/flink-secrets/internal-keystore.p12
    security.ssl.internal.truststore: /etc/flink-secrets/internal-keystore.p12
    security.ssl.internal.keystore-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
    security.ssl.internal.truststore-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
    security.ssl.internal.key-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
    security.ssl.rest.enabled: "true"
    security.ssl.rest.keystore: /etc/flink-secrets/rest-keystore.p12
    security.ssl.rest.truststore: /etc/flink-secrets/ca-truststore.p12
    security.ssl.rest.keystore-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
    security.ssl.rest.truststore-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
    security.ssl.rest.key-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
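For anyone wanting to reproduce the "make the keystore up front" step, here is a minimal sketch for the internal keystore only. It assumes `keytool` from a JDK is on the PATH; the function name `make_internal_keystore` and its arguments are hypothetical, and the `keytool` flags follow the example in the Flink SSL docs linked above. Because the CR above points both `security.ssl.internal.keystore` and `security.ssl.internal.truststore` at the same file, one self-signed PKCS12 store is enough for that part. The REST keystore and CA truststore need a CA-signing step that is not shown here.

```shell
#!/bin/sh
# Sketch only: create a self-signed PKCS12 keystore for Flink's internal SSL.
# make_internal_keystore is a hypothetical helper, not part of the operator.
make_internal_keystore() {
  out_dir="$1"    # directory to write into, e.g. ./certs
  password="$2"   # store password (keytool requires at least 6 characters);
                  # with PKCS12 the key password is the same as the store password
  mkdir -p "$out_dir" || return 1
  keytool -genkeypair \
    -alias flink.internal \
    -keystore "$out_dir/internal-keystore.p12" \
    -storetype PKCS12 \
    -dname "CN=flink.internal" \
    -storepass "$password" \
    -keyalg RSA \
    -keysize 4096
}
```

Calling `make_internal_keystore ./certs "$(cat ./certs/store-password.txt)"` would produce the `internal-keystore.p12` file referenced in the properties above, assuming the password file already exists.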

I made the files upfront and have them in a secret with the following format:

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: flink-tls-secret
data:
  ca-keystore.p12: $(cat ./certs/ca-keystore.p12 | base64 | tr -d '\n')
  ca-truststore.p12: $(cat ./certs/ca-truststore.p12 | base64 | tr -d '\n')
  internal-keystore.p12: $(cat ./certs/internal-keystore.p12 | base64 | tr -d '\n')
  rest-keystore.p12: $(cat ./certs/rest-keystore.p12 | base64 | tr -d '\n')
  store-password.txt: $(cat ./certs/store-password.txt | base64 | tr -d '\n')
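The `$(cat ... | base64 | tr -d '\n')` placeholders only expand when the template passes through a shell. A minimal sketch of one way to render it, assuming all the files live in a single directory: `render_secret` is a hypothetical helper name, and it emits one `data` entry per file, base64-encoded on a single line.

```shell
#!/bin/sh
# Sketch only: print a Secret manifest with one data entry per file in a directory.
# render_secret is a hypothetical helper, not part of the operator.
render_secret() {
  secret_name="$1"   # e.g. flink-tls-secret
  cert_dir="$2"      # e.g. ./certs
  printf 'apiVersion: v1\nkind: Secret\ntype: Opaque\nmetadata:\n  name: %s\ndata:\n' "$secret_name"
  for f in "$cert_dir"/*; do
    [ -f "$f" ] || continue
    # tr strips the line wrapping that some base64 implementations add
    printf '  %s: %s\n' "$(basename "$f")" "$(base64 < "$f" | tr -d '\n')"
  done
}
```

The output can be piped to the cluster with `render_secret flink-tls-secret ./certs | kubectl apply -f -`. If a hand-written manifest is not needed, `kubectl create secret generic flink-tls-secret --from-file=./certs` should give an equivalent secret.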

functicons commented 4 years ago

Thanks for your question! The operator has no first class support for SSL/TLS. If you have successfully configured it through flinkProperties, it would be nice if you can share your experience by adding a section to the user guide. Thank you!

a-roberts commented 4 years ago

Thanks @functicons, good to know! I'll be happy to share what a colleague at IBM and I have at the moment. We're currently trying to submit a job to the Job Manager (by port-forwarding and doing a normal flink run) and seeing problems though, so while it may be secure*, it's not so useful yet without good docs.

a-roberts commented 4 years ago

@SparkX120, a colleague at IBM, has suggested the following as an improvement to the CR, instead of needing to specify all of the options. Worth mentioning here, I think:

FlinkCluster:
 metadata:
  name: my-cluster
 spec:
  ha:
   enabled: true
  tls:
   enabled: true
   existingTlsSecret: secret-name
  taskManager:
   replicas: 4
   memory: 4Gi
   taskSlots: 8

chrispatmore commented 3 years ago

Just in terms of simplicity, I find specifying that something is enabled as

tls: {}

rather than

tls:
  enabled: true

is cleaner and easier to add. Then, if someone wants access to additional configuration, they open up the section. So

  tls:
   existingTlsSecret: secret-name
   renewal:
   dnsNames:
   etc..

The same can be done with ha.

Additionally, when referencing a secret, there is a Kubernetes standard. So

tlsSecret:
  secretName: my-secret

or

tlsKey:
  secretName: my-secret
  key: tls.key
tlsCert:
  secretName: my-secret
  key: tls.crt

shashken commented 3 years ago

You don't always want to work with kubernetes secrets for this, you can use a vault for the certificate/passphrase. It can be achieved using either:

and the needed flink configuration.

I think issue #383 is important, but I'm not sure the "tls" config is required here; maybe an example for a cluster with SSL can be enough.
Either way, I think it's important to keep the possibility to support all possible certificate gathering solutions. What do you think? @a-roberts @chrispatmore @EnriqueL8 @functicons

a-roberts commented 3 years ago

You don't always want to work with kubernetes secrets for this, you can use a vault for the certificate/passphrase. It can be achieved using either:

and the needed flink configuration.

I think issue #383 is important, but I'm not sure the "tls" config is required here; maybe an example for a cluster with SSL can be enough. Either way, I think it's important to keep the possibility to support all possible certificate gathering solutions. What do you think? @a-roberts @chrispatmore @EnriqueL8 @functicons

Great feedback and suggestions, so...

init container (that downloads the certificate from the vault)

I actually tried this approach, but the environment I am working in is OpenShift with OLM, and the way I coded this init container approach caused problems (since I was modifying things at runtime for my own deployment spec, which is managed by OLM).

I've updated one of my posts above just to mention what I eventually got working - making the secret upfront and mounting it in through a volume, with the needed Flink configuration.

Either way, I think it's important to keep the possibility to support all possible certificate gathering solutions.

Absolutely, having a maintained set of examples, using our existing CR definitions, would be really helpful.

chrispatmore commented 3 years ago

I agree, supporting multiple ways of configuring TLS is important. One such way could be integrating with https://cert-manager.io/, which is becoming a popular way of managing certificates in Kubernetes. But it is by no means the only or necessarily always the "best" way.

For me, what would be nice is first-class support for enabling and configuring TLS in a user-friendly way, such that, for example, I could specify tls: {} and have TLS turned on everywhere with self-signed certificates. Or I could expand that section and specify where and how the cluster should retrieve its certificates.

guruprasathT commented 3 years ago

@functicons

Thanks for your question! The operator has no first class support for SSL/TLS. If you have successfully configured it through flinkProperties, it would be nice if you can share your experience by adding a section to the user guide. Thank you!

https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/controllers/flinkclient/http_client.go

https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/0310df76d6e2128cd5d2bc51fae4e842d370c463/controllers/flinkcluster_submit_job_script.go#L61

This is important because any k8s user who knows the clusterIP will be able to submit a job from any other container within the same k8s cluster namespace, even if we restrict access to the Flink web UI using ingress authentication. Do you have any suggestions for this?