argoproj / argo-workflows

Workflow Engine for Kubernetes

https://argo-workflows.readthedocs.io/

Apache License 2.0


Memoization Storage #3587

Open alexec opened 4 years ago

alexec commented 4 years ago

Summary

Memoization is a feature that allows users to run workflows faster by avoiding repeating work that has already been done.

Currently memoization uses a Kubernetes ConfigMap for storage. This will not scale to a large number of entries, and it requires elevated RBAC permissions. Instead, we should provide the option to store these entries in an alternative database.

Motivation

Large workflows.

Proposal

Options:

Use the database.
Use any artifact storage.

See #944
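Both options amount to putting memoization behind a storage abstraction, so the controller does not care where entries live. The Go sketch below illustrates that idea under stated assumptions: `Entry`, `Store`, `memoryStore`, and `runMemoized` are hypothetical names, not Argo's actual API, and the in-memory map only stands in for a real database or artifact backend.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Entry is a hypothetical cache record: the outputs of a previously
// completed node, stored under the user-supplied memoization key.
type Entry struct {
	NodeID  string
	Outputs map[string]string
}

// Store is a hypothetical storage abstraction. A ConfigMap, a SQL
// database, or an artifact repository could each implement it.
type Store interface {
	Load(key string) (*Entry, error) // returns (nil, nil) on a cache miss
	Save(key string, e Entry) error
}

// memoryStore is an illustrative in-process backend standing in for a database.
type memoryStore struct {
	mu      sync.Mutex
	entries map[string]Entry
}

func newMemoryStore() *memoryStore {
	return &memoryStore{entries: map[string]Entry{}}
}

func (m *memoryStore) Load(key string) (*Entry, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	e, ok := m.entries[key]
	if !ok {
		return nil, nil
	}
	return &e, nil
}

func (m *memoryStore) Save(key string, e Entry) error {
	if key == "" {
		return errors.New("empty memoization key")
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.entries[key] = e
	return nil
}

// runMemoized returns cached outputs on a hit and only calls run on a miss,
// saving the fresh outputs so later nodes with the same key can reuse them.
func runMemoized(s Store, key, nodeID string, run func() map[string]string) (map[string]string, bool, error) {
	if e, err := s.Load(key); err != nil {
		return nil, false, err
	} else if e != nil {
		return e.Outputs, true, nil
	}
	out := run()
	if err := s.Save(key, Entry{NodeID: nodeID, Outputs: out}); err != nil {
		return nil, false, err
	}
	return out, false, nil
}

func main() {
	var store Store = newMemoryStore()
	calls := 0
	step := func() map[string]string {
		calls++
		return map[string]string{"result": "42"}
	}
	_, hit1, _ := runMemoized(store, "hello", "node-1", step)
	out, hit2, _ := runMemoized(store, "hello", "node-2", step)
	fmt.Println(hit1, hit2, out["result"], calls)
}
```

Because `runMemoized` only depends on the `Store` interface, swapping the ConfigMap for a database or artifact-backed store would not change the calling code.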

Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

Ark-kun commented 3 years ago

I think we can use the artifact drivers to store the caching metadata. We can store each cache entry as an artifact using the same artifact location configuration, for example at s3://<some_bucket>/artifacts/<cache_key>/cache_entries.yaml. P.S. There are some benefits to allowing multiple entries for the same cache_key: even with exactly the same inputs, a volatile component can produce different results, and in some scenarios all of them should be cached.
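A minimal sketch of the layout proposed above, assuming an S3-style object store: all entries for one cache key live in a single object at `<prefix>/<cache_key>/cache_entries.yaml`, and new results are appended instead of overwriting earlier ones. The `bucket` map stands in for an artifact driver, and `CacheEntry`, `appendEntry`, and `loadEntries` are hypothetical names. JSON is used instead of YAML only to stay within the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
	"time"
)

// CacheEntry is a hypothetical record; several may exist per cache key.
type CacheEntry struct {
	NodeID    string            `json:"nodeID"`
	Outputs   map[string]string `json:"outputs"`
	CreatedAt time.Time         `json:"createdAt"`
}

// bucket stands in for an S3-compatible bucket: object key -> object bytes.
// A real implementation would go through the artifact drivers instead.
type bucket map[string][]byte

// entriesObjectKey builds "<prefix>/<cacheKey>/cache_entries.yaml".
func entriesObjectKey(prefix, cacheKey string) string {
	return path.Join(prefix, cacheKey, "cache_entries.yaml")
}

// appendEntry reads the existing entry list (if any), appends one entry,
// and writes the whole list back. JSON is, for practical purposes, also
// valid YAML 1.2, so the file keeps its .yaml name in this sketch.
func appendEntry(b bucket, prefix, cacheKey string, e CacheEntry) error {
	objKey := entriesObjectKey(prefix, cacheKey)
	var entries []CacheEntry
	if raw, ok := b[objKey]; ok {
		if err := json.Unmarshal(raw, &entries); err != nil {
			return err
		}
	}
	entries = append(entries, e)
	raw, err := json.Marshal(entries)
	if err != nil {
		return err
	}
	b[objKey] = raw
	return nil
}

// loadEntries returns every entry stored for cacheKey, or nil on a miss.
func loadEntries(b bucket, prefix, cacheKey string) ([]CacheEntry, error) {
	raw, ok := b[entriesObjectKey(prefix, cacheKey)]
	if !ok {
		return nil, nil
	}
	var entries []CacheEntry
	err := json.Unmarshal(raw, &entries)
	return entries, err
}

func main() {
	b := bucket{}
	now := time.Now().UTC()
	_ = appendEntry(b, "artifacts", "abc123", CacheEntry{NodeID: "n1", Outputs: map[string]string{"r": "1"}, CreatedAt: now})
	_ = appendEntry(b, "artifacts", "abc123", CacheEntry{NodeID: "n2", Outputs: map[string]string{"r": "2"}, CreatedAt: now})
	entries, _ := loadEntries(b, "artifacts", "abc123")
	fmt.Println(entriesObjectKey("artifacts", "abc123"), len(entries))
}
```

The read-modify-write in `appendEntry` is not safe with concurrent writers, since object stores generally offer no atomic append. Writing one object per entry under the `<cache_key>/` prefix and listing that prefix on lookup is one possible way around this.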

alexec commented 3 years ago

Interesting idea. We just need storage and this is a good option.

mkjpryor-stfc commented 3 years ago

This is required for caching large outputs, because etcd places a limit on the maximum size of a ConfigMap. Piggybacking on the artifact storage sounds feasible to me.

lowc1012 commented 3 years ago

Hi, is anybody working on this issue? I'm interested in working on it. Could I take it forward?

leonharetd commented 2 years ago

I'm interested in this. I want to try it.

attreyee-muk commented 2 years ago

I would like to contribute to this project for GSoC 2022. Can you please give me some more details on this?

sarabala1979 commented 2 years ago


Here is the current memoization implementation document https://github.com/argoproj/argo-workflows/blob/master/docs/memoization.md

attreyee-muk commented 2 years ago

Okay, thank you.

alexec commented 2 years ago

If you'd like to do this as part of GSoC, you'll need to sign up here:

https://summerofcode.withgoogle.com

GSoC does not start for several months, so if you're instead looking to make an impact today, and don't need the benefits of GSoC (see their website for the details), then mentoring might be the right approach for you.

attreyee-muk commented 2 years ago

@alexec The applications for participants open in April, right? I'm actually a bit new to all of this.

terrytangyuan commented 2 years ago

@a-muk Please take a look at the links available in https://github.com/argoproj/argo-workflows/blob/master/docs/mentoring.md#how-to-participate-google-summer-of-code

attreyee-muk commented 2 years ago

Thank you @terrytangyuan

Mostafa-wael commented 2 years ago

How can I apply for this idea for GSoC? Is there any communication channel with the mentors?

sudhanshu456 commented 2 years ago

@alexec Hey, can you please help me understand how I should get into mentoring? I've been working with Argo Workflows for a year.