apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
https://heron.apache.org/
Apache License 2.0

Add support for Persistent Volumes for stateful storage #3723

Closed nicknezis closed 2 years ago

nicknezis commented 3 years ago

We would like to add a set of submit parameters for specifying PersistentVolumeClaims and mount points, similar to the feature found in Spark (described here: https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes).

surahman commented 3 years ago

This should be fairly straightforward to implement for the CLI. We can use KubernetesController.getConfigItemsByPrefix to collect all the relevant options from the config. As per our discussions, should we be using dynamic provisioning?

Suggested commands, based on the Spark example, are below. Is there anything else we should add?

--config-property heron.kubernetes.persistentVolumeClaim.[volume name].options.claimName=OnDemand
--config-property heron.kubernetes.persistentVolumeClaim.[volume name].options.storageClass=gp
--config-property heron.kubernetes.persistentVolumeClaim.[volume name].options.sizeLimit=500Gi
--config-property heron.kubernetes.persistentVolumeClaim.[volume name].mount.path=/data
--config-property heron.kubernetes.persistentVolumeClaim.[volume name].mount.readOnly=false

I can start work on this, but I can only take it as far as the point where it needs to be wired into #3710. I will then need to rebase onto that PR and wire it in. At that point I will also need to clean up the test suite and perform some other general merge-conflict-like clean-up operations.
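To make the prefix-based collection idea concrete, here is a rough Python sketch (not Heron's Java code, and not the behaviour of KubernetesController.getConfigItemsByPrefix itself). It assumes the key layout from the suggested commands, `<prefix><volume name>.<option>`; the volume name `data` and the function name `group_by_volume` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict

# Prefix taken from the suggested commands; treat it as an assumption.
PVC_PREFIX = "heron.kubernetes.persistentVolumeClaim."


def group_by_volume(config: Dict[str, str],
                    prefix: str = PVC_PREFIX) -> Dict[str, Dict[str, str]]:
    """Collect every key under `prefix` and group the options by volume name."""
    volumes: Dict[str, Dict[str, str]] = defaultdict(dict)
    for key, value in config.items():
        if not key.startswith(prefix):
            continue  # unrelated config property
        # The remainder looks like "<volume name>.<option path>".
        volume_name, _, option = key[len(prefix):].partition(".")
        if not volume_name or not option:
            raise ValueError(f"malformed volume option: {key}")
        volumes[volume_name][option] = value
    return dict(volumes)


# Hypothetical input, as it might look after parsing --config-property flags.
example = {
    "heron.kubernetes.persistentVolumeClaim.data.options.claimName": "OnDemand",
    "heron.kubernetes.persistentVolumeClaim.data.options.sizeLimit": "500Gi",
    "heron.kubernetes.persistentVolumeClaim.data.mount.path": "/data",
    "heron.topology.name": "unrelated",
}
grouped = group_by_volume(example)
```

Grouping by volume name first means each volume can later be validated and assembled on its own, and malformed keys fail early at submit time.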

surahman commented 3 years ago

An idea that I had is a workflow where users put all their K8s configs, including the pod template, into a directory and then load them into a ConfigMap. The configs users wish to have loaded into the containers are then provided using --config-property.
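As a rough illustration of the directory-to-ConfigMap step (a Python sketch only, similar in spirit to what `kubectl create configmap --from-file=<dir>` does), each regular file in the directory becomes one entry under `data`. The function name and the ConfigMap name used in the usage note are hypothetical.

```python
from pathlib import Path
from typing import Dict


def config_map_from_dir(name: str, directory: str) -> Dict[str, object]:
    """Build a ConfigMap manifest whose keys are the file names in `directory`."""
    data = {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).iterdir())
        if path.is_file()  # sub-directories are skipped
    }
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": data,
    }
```

A call such as `config_map_from_dir("heron-configs", "./k8s-configs")` would return a dictionary that can be serialized to YAML or JSON and applied to the cluster.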

surahman commented 3 years ago

Looking through the Spark documentation there seem to be the following options supported:

There is a multitude of options available on the K8s API.

surahman commented 3 years ago

I have the PVC assembly part of the PR completed and I am now working on wiring all this up to make sure it works correctly with custom Pod Templates.

Commands:

--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.claimName=nameOfVolumeClaim
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.storageClassName=storageClassNameOfChoice
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.accessModes=comma,separated,list
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.sizeLimit=555Gi
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.volumeMode=volumeModeOfChoice
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.path=path/to/mount
--config-property heron.kubernetes.volumes.persistentVolumeClaim.volumeNameOfChoice.subPath=sub/path/to/mount

These will generate the following PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nameOfVolumeClaim
spec:
  volumeName: volumeNameOfChoice
  accessModes:
    - comma
    - separated
    - list
  volumeMode: volumeModeOfChoice
  resources:
    requests:
      storage: 555Gi
  storageClassName: storageClassNameOfChoice
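For readers who want to see the mapping from options to manifest spelled out, here is a minimal Python sketch that reproduces the structure of the PVC above from a dictionary of options. It is an illustration under assumptions, not the code in the PR: `build_pvc` is a made-up name, and `path` and `subPath` are deliberately ignored here because they belong to the container mount rather than the claim.

```python
from typing import Dict


def build_pvc(volume_name: str, options: Dict[str, str]) -> Dict[str, object]:
    """Assemble a PersistentVolumeClaim manifest from per-volume options."""
    spec: Dict[str, object] = {"volumeName": volume_name}
    if "accessModes" in options:
        # The CLI value is a comma-separated list.
        spec["accessModes"] = [
            mode.strip() for mode in options["accessModes"].split(",") if mode.strip()
        ]
    if "volumeMode" in options:
        spec["volumeMode"] = options["volumeMode"]
    if "sizeLimit" in options:
        spec["resources"] = {"requests": {"storage": options["sizeLimit"]}}
    if "storageClassName" in options:
        spec["storageClassName"] = options["storageClassName"]
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": options["claimName"]},  # claimName is required
        "spec": spec,
    }


pvc = build_pvc(
    "volumeNameOfChoice",
    {
        "claimName": "nameOfVolumeClaim",
        "storageClassName": "storageClassNameOfChoice",
        "accessModes": "comma,separated,list",
        "sizeLimit": "555Gi",
        "volumeMode": "volumeModeOfChoice",
        "path": "path/to/mount",
    },
)
```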

Entries will be made in the Pod for a Volume and in the executor container for the VolumeMount with the path as well as the subPath, as required.
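A small Python sketch of what those two entries could look like, using the standard Kubernetes field names (`persistentVolumeClaim.claimName` on the Volume; `mountPath` and `subPath` on the VolumeMount). The helper names are hypothetical and this is not the PR's Java code.

```python
from typing import Dict


def build_volume(volume_name: str, claim_name: str) -> Dict[str, object]:
    """Pod-level Volume entry that references the claim by name."""
    return {
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": claim_name},
    }


def build_volume_mount(volume_name: str, options: Dict[str, str]) -> Dict[str, str]:
    """Container-level VolumeMount entry; subPath is added only when supplied."""
    mount = {"name": volume_name, "mountPath": options["path"]}
    if options.get("subPath"):
        mount["subPath"] = options["subPath"]
    return mount


opts = {"path": "path/to/mount", "subPath": "sub/path/to/mount"}
volume = build_volume("volumeNameOfChoice", "nameOfVolumeClaim")
mount = build_volume_mount("volumeNameOfChoice", opts)
```

The Volume and the VolumeMount share the same `name`, which is how Kubernetes links the container mount to the Pod-level volume.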

The commands above are all that I have added for now, but the code is designed so that a new PVC property can be added easily: you add an enum value for the property, and then add an entry to the switch statement that applies it to the actual PVC. This should make things more maintainable and significantly more extensible.
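The enum-plus-switch design can be sketched in Python as follows, with a dictionary of handlers standing in for the switch statement. This only illustrates the pattern; the option subset, the names `PvcOption` and `apply_options`, and the handler functions are assumptions rather than the identifiers used in the PR.

```python
from enum import Enum
from typing import Callable, Dict


class PvcOption(Enum):
    """Supported PVC properties; a new property starts with a new member here."""
    STORAGE_CLASS_NAME = "storageClassName"
    ACCESS_MODES = "accessModes"
    SIZE_LIMIT = "sizeLimit"


def _set_storage_class(spec: dict, value: str) -> None:
    spec["storageClassName"] = value


def _set_access_modes(spec: dict, value: str) -> None:
    spec["accessModes"] = [m.strip() for m in value.split(",") if m.strip()]


def _set_size_limit(spec: dict, value: str) -> None:
    spec["resources"] = {"requests": {"storage": value}}


# Plays the role of the switch statement: one entry per enum member.
HANDLERS: Dict[PvcOption, Callable[[dict, str], None]] = {
    PvcOption.STORAGE_CLASS_NAME: _set_storage_class,
    PvcOption.ACCESS_MODES: _set_access_modes,
    PvcOption.SIZE_LIMIT: _set_size_limit,
}


def apply_options(options: Dict[str, str]) -> dict:
    """Apply each option to a PVC spec, rejecting unsupported option names."""
    spec: dict = {}
    for key, value in options.items():
        option = PvcOption(key)  # raises ValueError for an unknown option
        HANDLERS[option](spec, value)
    return spec
```

Because unknown keys fail at the enum lookup, a typo in a `--config-property` name surfaces as an error instead of being silently dropped.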