Open alexec opened 4 years ago
I think we can use the Artifact drivers to store the caching metadata.
We can store the cache entry as an artifact using the same artifact location configuration, for example at `s3://<some_bucket>/artifacts/<cache_key>/cache_entries.yaml`.
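Here is a minimal sketch of that layout in Python. The `ArtifactDriver` protocol, the `InMemoryDriver` stand-in for S3, and the helper names are hypothetical and are not Argo's artifact driver API. Entries are serialized as JSON, which YAML 1.2 parsers also accept, so the sketch needs only the standard library.

```python
import json
from typing import Dict, List, Protocol


class ArtifactDriver(Protocol):
    """Hypothetical minimal driver interface (not Argo's actual API)."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryDriver:
    """Stands in for an S3-backed driver so the sketch runs locally."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        # Assumption: a missing object surfaces as KeyError.
        return self.objects[key]


def cache_entries_key(cache_key: str, prefix: str = "artifacts") -> str:
    # Mirrors the suggested layout: <prefix>/<cache_key>/cache_entries.yaml
    return f"{prefix}/{cache_key}/cache_entries.yaml"


def save_entries(driver: ArtifactDriver, cache_key: str, entries: List[dict]) -> None:
    # JSON output is valid YAML 1.2, which keeps this sketch stdlib-only.
    driver.put(cache_entries_key(cache_key), json.dumps(entries).encode("utf-8"))


def load_entries(driver: ArtifactDriver, cache_key: str) -> List[dict]:
    try:
        raw = driver.get(cache_entries_key(cache_key))
    except KeyError:
        return []  # cache miss: nothing stored for this key yet
    return json.loads(raw.decode("utf-8"))
```

A real driver would translate its own "object not found" error into the miss case.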
P.S. There are some benefits to allowing multiple entries for the same cache_key: even with exactly the same inputs, a volatile component can produce different results, and in some scenarios all of them should be cached.
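Keeping several entries per `cache_key` means appending rather than overwriting, plus some rule for choosing an entry on lookup. The sketch below assumes a "newest entry no older than `max_age` seconds" rule. That policy, the `CacheEntry` fields, and the `MultiEntryCache` class are hypothetical illustrations, not something this thread has settled on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CacheEntry:
    node_id: str              # hypothetical: the workflow node that produced the outputs
    outputs: Dict[str, str]
    created_at: float         # Unix timestamp in seconds


@dataclass
class MultiEntryCache:
    """Keeps every entry recorded for a cache_key instead of overwriting."""

    entries: Dict[str, List[CacheEntry]] = field(default_factory=dict)

    def add(self, cache_key: str, entry: CacheEntry) -> None:
        self.entries.setdefault(cache_key, []).append(entry)

    def all_entries(self, cache_key: str) -> List[CacheEntry]:
        return list(self.entries.get(cache_key, []))

    def latest(self, cache_key: str, now: float, max_age: float) -> Optional[CacheEntry]:
        # Assumed policy: return the newest entry not older than max_age seconds.
        fresh = [e for e in self.entries.get(cache_key, []) if now - e.created_at <= max_age]
        return max(fresh, key=lambda e: e.created_at) if fresh else None
```

Callers that need every variant can use `all_entries`, while a step that just wants a cache hit can use `latest`.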
Interesting idea. We just need storage and this is a good option.
This is required for caching large outputs because etcd places a limit on the maximum size of a configmap. Piggy-backing on the artifact storage sounds like it should be feasible to me.
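Kubernetes documents a 1 MiB limit on the data stored in a single ConfigMap. One hypothetical way to piggyback on artifact storage is to route entries by size: keep an entry in the ConfigMap while it still fits, and fall back to the artifact repository otherwise. The function name and return values are illustrative, and the byte count is an approximation that ignores key names and object metadata.

```python
import json

# Kubernetes: the data stored in a ConfigMap cannot exceed 1 MiB.
CONFIGMAP_MAX_BYTES = 1024 * 1024


def choose_backend(current_configmap_bytes: int, entry: dict) -> str:
    """Return "configmap" if the entry still fits, else "artifact".

    current_configmap_bytes approximates the size of the data already
    stored in the memoization ConfigMap.
    """
    entry_bytes = len(json.dumps(entry).encode("utf-8"))
    if current_configmap_bytes + entry_bytes <= CONFIGMAP_MAX_BYTES:
        return "configmap"
    return "artifact"
```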
Hi, is anybody working on this issue? I'm interested in working on it. Could I take it forward?
I'm interested in this. I'd like to try it.
I would like to contribute to this project for GSOC 2022. Can you please give me some more details on this?
Here is the current memoization implementation document https://github.com/argoproj/argo-workflows/blob/master/docs/memoization.md
Okay. Thank you.
If you'd like to do this as part of GSoC, you'll need to sign up here:
https://summerofcode.withgoogle.com
GSoC does not start for several months, so if you're instead looking to make an impact today, and don't need the benefits of GSoC (see their website for the details), then mentoring might be the right approach for you.
@alexec The applications for participants will open in April, right? I'm actually a bit new to all of this.
@a-muk Please take a look at the links available in https://github.com/argoproj/argo-workflows/blob/master/docs/mentoring.md#how-to-participate-google-summer-of-code
Thank you @terrytangyuan
How can I apply for this idea for GSoC? Is there a communication channel with the mentors?
@alexec Hey, can you please help me understand how I should get into mentoring? I've been working with Argo Workflows for a year.
Summary
Memoization is a feature that allows users to run workflows faster by avoiding repeating work that has already been done.
Currently, memoization uses a Kubernetes ConfigMap for storage. This will not scale to a large number of entries, and it requires elevated RBAC permissions. Instead, we should provide the option to store these entries in an alternative database.
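A rough sketch of a database-backed store, using SQLite from the Python standard library as a stand-in for whatever database might eventually be supported. The table schema, `SQLiteMemoStore`, `run_memoized`, and the `max_age` freshness check are all hypothetical; this is not how Argo implements memoization today.

```python
import json
import sqlite3
import time
from typing import Callable, Optional


class SQLiteMemoStore:
    """Hypothetical memoization store backed by a SQL database (SQLite here)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache_entries ("
            " cache_key TEXT NOT NULL,"
            " outputs TEXT NOT NULL,"
            " created_at REAL NOT NULL)"
        )
        # Index lookups by key so many entries stay cheap to query.
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_cache_key ON cache_entries (cache_key)"
        )

    def save(self, cache_key: str, outputs: dict, created_at: Optional[float] = None) -> None:
        ts = time.time() if created_at is None else created_at
        with self.conn:  # commits the INSERT on success
            self.conn.execute(
                "INSERT INTO cache_entries (cache_key, outputs, created_at) VALUES (?, ?, ?)",
                (cache_key, json.dumps(outputs), ts),
            )

    def load(self, cache_key: str, max_age: float, now: Optional[float] = None) -> Optional[dict]:
        ts = time.time() if now is None else now
        # Newest entry for this key that is no older than max_age seconds.
        row = self.conn.execute(
            "SELECT outputs FROM cache_entries"
            " WHERE cache_key = ? AND created_at >= ?"
            " ORDER BY created_at DESC LIMIT 1",
            (cache_key, ts - max_age),
        ).fetchone()
        return json.loads(row[0]) if row else None


def run_memoized(store: SQLiteMemoStore, cache_key: str, max_age: float,
                 step: Callable[[], dict]) -> dict:
    """Return cached outputs on a hit; otherwise run step() and record its outputs."""
    cached = store.load(cache_key, max_age)
    if cached is not None:
        return cached
    outputs = step()
    store.save(cache_key, outputs)
    return outputs
```

In this design each entry is a row, so capacity is bounded by the database rather than by a single ConfigMap object, and access is governed by database credentials instead of RBAC permissions on ConfigMaps.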
Motivation
Large workflows.
Proposal
Options:
See #944
Message from the maintainers:
If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.