argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Support Step/Task output caching and referencing in other workflows #2157

Open sarabala1979 opened 4 years ago

sarabala1979 commented 4 years ago

Summary

Cached step/task outputs (parameters or artifacts) can be referenced from another workflow to avoid re-executing the same step, which saves time and resources.

Similar issue #944

Motivation

In ETL and ML use cases, some steps/tasks produce the same output across workflows when given the same input. If Argo could cache the output of those steps, it could be referenced from another workflow: execution of a cached step/task would be skipped and the cached output used instead.

Proposal

  1. The template will have a cachable flag:

     - name: gen-number-list
       cachable: true
       script:
         image: python:alpine3.6
         command: [python]
         source: |
           import json
           import sys
           json.dump([i for i in range(20, 31)], sys.stdout)
  2. Create a new CRD which will hold the node status of the latest succeeded template execution:

     apiVersion: argoproj.io/v1alpha1
     kind: CachedNodeStatus
     metadata:
       name: retry-to-completion
       namespace: argo
       labels:
         lastExecution: "02/04/2020 19:30"
     spec:
       boundaryID: steps-6c4tm
       displayName: hello1
       finishedAt: "2020-02-04T06:22:28Z"
       id: steps-6c4tm-1651667224
       inputs:
         parameters:
         - name: message
           value: hello1
       message: 'failed to save outputs: Failed to establish pod watch: unknown (get pods)'
       name: steps-6c4tm[0].hello1
       phase: Error
       startedAt: "2020-02-04T06:22:09Z"
       templateName: whalesay
       type: Pod
  3. Cache reference: the consuming template sets a fetchFromCache flag (a combined sketch follows this list):

     - name: gen-number-list
       fetchFromCache: true
       script:
         image: python:alpine3.6
         command: [python]
         source: |
           import json
           import sys
           json.dump([i for i in range(20, 31)], sys.stdout)
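Putting the proposal together, here is a minimal end-to-end sketch of a producer and a consumer workflow, assuming the proposed cachable and fetchFromCache fields and a controller that matches cached nodes on template name plus inputs. None of these fields exist in Argo today; all names are illustrative.

  # Producer workflow: runs the template; under the proposal the controller
  # would persist this node's status/outputs in a CachedNodeStatus object.
  apiVersion: argoproj.io/v1alpha1
  kind: Workflow
  metadata:
    generateName: gen-numbers-producer-
  spec:
    entrypoint: gen-number-list
    templates:
    - name: gen-number-list
      cachable: true                 # proposed flag, not existing API
      script:
        image: python:alpine3.6
        command: [python]
        source: |
          import json
          import sys
          json.dump([i for i in range(20, 31)], sys.stdout)
  ---
  # Consumer workflow: under the proposal the controller would skip execution
  # and substitute the cached outputs when a matching CachedNodeStatus exists.
  apiVersion: argoproj.io/v1alpha1
  kind: Workflow
  metadata:
    generateName: gen-numbers-consumer-
  spec:
    entrypoint: gen-number-list
    templates:
    - name: gen-number-list
      fetchFromCache: true           # proposed flag, not existing API
      script:
        image: python:alpine3.6
        command: [python]
        source: |
          import json
          import sys
          json.dump([i for i in range(20, 31)], sys.stdout)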
TekTimmy commented 4 years ago

I agree it's not a bad idea, but with Argo you are responsible for the data flow: you copy the results of a step into S3 and let all dependent steps copy the data back. I use Amazon EKS, where EBS volumes are restricted to "ReadWriteOnce" access, which means a volume can be mounted on one node only.

What could be possible technically is separation and aggregation of artifacts. Separation would mean copying data from one volume to many other volumes, and aggregation would mean copying from many volumes to one. This would allow a single step to produce results that are processed in parallel by the next step without using S3 buckets in between. With EKS there is still the restriction that an EBS volume is tied to its Availability Zone (AZ), which is a problem for aggregation when the volumes have been created in different AZs.
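For reference, a minimal sketch of the S3 hand-off described above, using Argo's existing input/output artifacts. The endpoint, bucket, and key are placeholders, and credentials are omitted (they would normally come from accessKeySecret/secretKeySecret or the cluster's default artifact repository).

  apiVersion: argoproj.io/v1alpha1
  kind: Workflow
  metadata:
    generateName: s3-handoff-
  spec:
    entrypoint: main
    templates:
    - name: main
      steps:
      - - name: produce
          template: produce
      - - name: consume
          template: consume
    - name: produce
      script:
        image: python:alpine3.6
        command: [python]
        source: |
          import json
          json.dump([i for i in range(20, 31)], open("/tmp/result.json", "w"))
      outputs:
        artifacts:
        - name: result
          path: /tmp/result.json
          s3:
            endpoint: s3.amazonaws.com            # placeholder endpoint
            bucket: my-artifact-bucket            # placeholder bucket
            key: cache/gen-number-list/result.json
    - name: consume
      inputs:
        artifacts:
        - name: result
          path: /tmp/result.json
          s3:
            endpoint: s3.amazonaws.com
            bucket: my-artifact-bucket
            key: cache/gen-number-list/result.json
      script:
        image: python:alpine3.6
        command: [python]
        source: |
          import json
          print(json.load(open("/tmp/result.json")))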