getindata / kedro-kubeflow

Kedro Plugin to support running workflows on Kubeflow Pipelines
https://kedro-kubeflow.readthedocs.io
Apache License 2.0

Storage: kubeflow kubernetes pvc (persistent volumes) and mounted storage in Azure #215

Open cpereir1 opened 1 year ago

cpereir1 commented 1 year ago

Hi, I have an Azure deployment of Kubeflow in a Kubernetes cluster. I would like to use Kedro and kedro-kubeflow to abstract away the difficulty of creating Kubeflow pipelines. I am wondering about a few things:

Thank you so much!

marrrcin commented 1 year ago

Hi @cpereir1, sorry for the late response:

How does kedro, or kedro kubeflow, interact with, for example, a locally downloaded dataset that kubeflow mounted into a local volume?

You can just read it from disk like any other local file. Our plugin also mounts a data volume under /home/kedro/datavolume, as defined here: https://github.com/getindata/kedro-kubeflow/blob/ab709867507bbe61ee6ae52a9a95986173a344dd/kedro_kubeflow/generators/pod_per_node_pipeline_generator.py#L153
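To illustrate "just read it from disk", here is a minimal sketch of a function that a Kedro node could call to load a CSV file from the mounted volume. Only the /home/kedro/datavolume path comes from the plugin; the function name load_rows and the file name in the usage note are hypothetical and only serve as an example.

```python
import csv
from pathlib import Path

# Mount point taken from the reply above. Anything placed under it is
# an assumption about your own data layout, not something the plugin creates.
DATA_VOLUME = Path("/home/kedro/datavolume")


def load_rows(filename, base_dir=DATA_VOLUME):
    """Read a CSV file from the mounted volume and return its rows as dicts."""
    with open(Path(base_dir) / filename, newline="") as f:
        return list(csv.DictReader(f))
```

Inside a pod you would call something like load_rows("train.csv"), where train.csv is a hypothetical file on the volume. Pointing a Kedro catalog entry at the same path should work as well, since to the process it is an ordinary local file.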

How can a kedro node use GPUs to run, for example, a training operation?

See https://github.com/getindata/kedro-kubeflow/pull/202 - we're going to release it soon. You can build the plugin from the develop branch to use it right away.

How can a kedro node execute, for example, a training pipeline that consists of multiple nodes and dependencies within it?

Node dependencies from Kedro are automatically translated to dependencies between nodes in KFP, so a multi-node training pipeline keeps its execution order when it runs on Kubeflow.
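As a rough illustration of the idea (this is not the plugin's actual code), Kedro links nodes through the dataset names they consume and produce, and a step-to-step dependency graph can be derived from that. The node names, dataset names and the derive_dependencies helper below are all hypothetical.

```python
def derive_dependencies(nodes):
    """Map each node name to the sorted names of nodes whose outputs it consumes."""
    # Record which node produces each dataset.
    producers = {}
    for name, spec in nodes.items():
        for out in spec["outputs"]:
            producers[out] = name
    # A node depends on the producer of each of its inputs; inputs with no
    # producer (e.g. raw data already on disk) create no dependency.
    return {
        name: sorted({producers[i] for i in spec["inputs"] if i in producers})
        for name, spec in nodes.items()
    }


# Hypothetical three-step training pipeline.
nodes = {
    "preprocess": {"inputs": ["raw_data"], "outputs": ["features"]},
    "train": {"inputs": ["features"], "outputs": ["model"]},
    "evaluate": {"inputs": ["model", "features"], "outputs": ["metrics"]},
}
deps = derive_dependencies(nodes)
```

Here deps maps train to ["preprocess"] and evaluate to ["preprocess", "train"], which is the kind of ordering you would expect between the corresponding steps in KFP.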

Where/how does kedro kubeflow generate the kubeflow pipeline.yaml required by Kubeflow to execute pipelines?

It's generated in the directory from which you're running your kedro pipeline. Check out the kedro kubeflow compile command.