Open vicaire opened 6 years ago
/cc @jlewi
If I understand your question correctly, the sidecar ensures that the specified files are stored to a specific location in the artifact repository, and that specific files are fetched to a specific location in the container. Without a sidecar it would not be possible to do this as configuration. It would be up to the step logic to do this.
For example, if step 1 writes `a.csv`, `b.csv`, and `tmp.csv` to `/output/` in the container, we may only want `a.csv` and `b.csv` stored as artifacts. Step 2 may only require `b.csv`. Furthermore, the step 2 container may expect the input file to be named `input.csv`, so a rename is required. The sidecar does this without requiring the step to perform that logic.
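As a sketch of how this declarative selection and renaming might look, here is a hypothetical Argo workflow fragment (image names are placeholders). Only the artifacts listed under `outputs` are uploaded, and step 2's `path` renames the fetched file:

```yaml
templates:
- name: step1
  container:
    image: example/step1        # hypothetical image
  outputs:
    artifacts:
    - name: a
      path: /output/a.csv       # uploaded to the artifact repository
    - name: b
      path: /output/b.csv       # uploaded to the artifact repository
    # /output/tmp.csv is not listed, so it is never uploaded
- name: step2
  inputs:
    artifacts:
    - name: b                   # fetched from step 1's output
      path: /input/input.csv    # placed under the name step 2 expects
  container:
    image: example/step2        # hypothetical image
```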
It would also be possible for steps to modify/delete the artifact of another step. That removes what I believe to be a key feature of any workflow/pipeline manager, which is data provenance.
> It would also be possible for steps to modify/delete the artifact of another step.
The inputs volume can be mounted in read-only mode.
> the sidecar ensures that the specified files are stored to a specific location in the artifact repository, and that specific files are fetched to a specific location in the container. Without a sidecar it would not be possible to do this as configuration.
You can mount any inputs/outputs volume subpath to any container location.
E.g. for `task3`, which uses artifacts from `task1` and `task2`:
- Mount `<repository>/workflow1/task1/outputs/output1/` to `/io/inputs/input1/` in read-only mode
- Mount `<repository>/workflow1/task2/outputs/output1/` to `/io/inputs/input2/` in read-only mode
- Mount `<repository>/workflow1/task3/outputs/output1/` to `/io/outputs/output2/` for writing
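Assuming the repository tree lives on a shared PersistentVolumeClaim (here named `artifact-repo`, a hypothetical name), these mounts could be expressed with `subPath` and `readOnly` in a pod spec along these lines:

```yaml
# Hypothetical pod spec fragment for task3.
containers:
- name: task3
  image: example/task3                          # hypothetical image
  volumeMounts:
  - name: repo
    subPath: workflow1/task1/outputs/output1    # task1's output
    mountPath: /io/inputs/input1
    readOnly: true                              # inputs cannot be modified
  - name: repo
    subPath: workflow1/task2/outputs/output1    # task2's output
    mountPath: /io/inputs/input2
    readOnly: true
  - name: repo
    subPath: workflow1/task3/outputs/output1    # task3's own output dir
    mountPath: /io/outputs/output2              # writable
volumes:
- name: repo
  persistentVolumeClaim:
    claimName: artifact-repo                    # hypothetical PVC name
```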
> container may expect the input file to be named `input.csv`
Ideally, containers should only use paths received from the command line arguments.
@wookasz
Let me reformulate a bit. Instead of the artifact repository being GCS/S3/Minio, would it be possible to have an option to store the data in a volume?
Given the large number of volume implementations (NFS, GCP Cloud Filer, etc.), it seems that this would support a large number of use cases beyond object stores.
@IronPan @paveldournov
@hongye-sun
FEATURE REQUEST: Volumes Instead of Sidecars to upload/download data to the default Artifact Repository
Hi, I was wondering why Argo decided to use a sidecar to download/upload data to GCS/S3/etc when using the Default Artifact Repository.
Did we consider using the Volume abstraction in Kubernetes? It looks like there are volume types for many kinds of storage, so implementing a new kind of volume would make it easy to add a new storage backend for the Default Artifact Repository.
https://ai.intel.com/kubernetes-volume-controller-kvc-data-management-tailored-for-machine-learning-workloads-in-kubernetes/
https://kubernetes.io/docs/concepts/storage/volumes/