elyra-ai / elyra

Elyra extends JupyterLab with an AI-centric approach.
https://elyra.readthedocs.io/en/stable/
Apache License 2.0
1.85k stars 343 forks

Generic node executions are not cached by Kubeflow Pipelines #2717

Open ptitzler opened 2 years ago

ptitzler commented 2 years ago

Describe the issue

While investigating options for supporting partial pipeline execution, I noticed that nodes implemented using generic components are not cached by Kubeflow Pipelines. This is an unintentional side effect of an environment variable that we pass to the container. In a nutshell, this variable contains a run id that is constant across all nodes in the same pipeline run but unique to each run. Because the value changes with every run, a node's container specification never matches that of an earlier run, so Kubeflow Pipelines never finds a cached execution to reuse.
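To illustrate the effect, here is a minimal sketch, assuming the cache key is a hash over the node's container specification including its environment variables. It is not Kubeflow Pipelines' actual key computation. The variable name `EXAMPLE_RUN_ID`, the image, and the command are hypothetical and were chosen only for this example.

```python
import hashlib
import json

# Hypothetical name for the per-run environment variable; the issue does not name it.
RUN_ID_ENV = "EXAMPLE_RUN_ID"


def cache_key(container_spec: dict, ignore_env: frozenset = frozenset()) -> str:
    """Return a SHA-256 fingerprint of a container spec (illustrative only)."""
    env = {k: v for k, v in container_spec.get("env", {}).items() if k not in ignore_env}
    normalized = {**container_spec, "env": env}
    payload = json.dumps(normalized, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def node_spec(run_id: str) -> dict:
    """Build a toy spec for one generic node; only the run id varies between runs."""
    return {
        "image": "example.org/notebook-runner:1.0",  # hypothetical image
        "command": ["python", "run_notebook.py", "load_data.ipynb"],  # hypothetical command
        "env": {RUN_ID_ENV: run_id},
    }


first_run = node_spec("run-0001")
second_run = node_spec("run-0002")

# The run id is part of the fingerprint, so the two runs never share a key.
keys_differ = cache_key(first_run) != cache_key(second_run)

# Leaving the run id out of the fingerprint makes the keys identical.
ignored = frozenset({RUN_ID_ENV})
keys_match = cache_key(first_run, ignored) == cache_key(second_run, ignored)

print(keys_differ, keys_match)
```

In this toy model the node definition is identical in both runs, yet the keys only match once the run id is excluded from the fingerprint. That is the trade-off at stake: the run id is useful to the node, but it makes every execution look new to the cache.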

Caching can have significant benefits with respect to resource usage and performance. Imagine, if you will, a notebook that downloads a data set archive, extracts it, and performs some processing. If the archive content doesn't change at the source, downloading and processing it again is completely unnecessary because the produced outputs are identical across runs.

I've created this issue to discuss how we should deal with caching in general:

ptitzler commented 2 years ago

Did some more digging but haven't been able to identify a straightforward solution that would support providing a unique run id without losing the benefits of node output caching. Since there have not been any user reports that the lack of caching support for generic components poses an issue, no action will be taken at this time.

akfmdl commented 1 year ago

I have the same issue!

akfmdl commented 1 year ago

I think this issue occurs when using the Kubeflow Pipelines editor.