argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Configure artifacts using URIs, e.g. `s3://my-bucket/my-key` #4349

alexec commented 3 years ago

Summary

It might be convenient and more compact to represent artifacts by URIs, a bit like go-getter, but with the option to put as well as get.
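As a rough sketch of the idea (the `uri` field is proposed, not existing; the bucket and key are illustrative), today's explicit form:

```yaml
artifacts:
- name: input
  path: /tmp/input
  s3:
    bucket: my-bucket
    key: my-key
```

could collapse to:

```yaml
artifacts:
- name: input
  path: /tmp/input
  uri: s3://my-bucket/my-key
```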

Use Cases

Add more repositories.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

alexec commented 3 years ago

We'd also need:

- How do you do secrets?

Ark-kun commented 3 years ago

> go-getter

Does go-getter support canonical storage URIs, e.g. `gs://`, `s3://`, `http://`, etc.?

> How do you do secrets?

Maybe we can have a configmap that specifies authentication for all URI schemes. The configmap could be per-cluster or specified per-workflow (like `ArtifactRepositoryRef`).

In the future we could also make it possible to specify auth/secrets per-bucket.
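Such a configmap might look something like this; the layout is only a sketch, and the secret names are illustrative (the field names are borrowed from today's artifact repository config):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: artifact-uri-auth  # hypothetical name
data:
  # one entry per URI scheme
  s3: |
    accessKeySecret:
      name: my-s3-creds
      key: accessKey
    secretKeySecret:
      name: my-s3-creds
      key: secretKey
  gs: |
    serviceAccountKeySecret:
      name: my-gcs-creds
      key: serviceAccountKey
```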

P.S. I do not have a big stake in this. The scenario that would be useful to me is:

```yaml
name: downstream
inputs:
  parameters:
  - name: input1-uri
  artifacts:
  - name: input1
    path: ...
    uri: "{{inputs.parameters.input1-uri}}"
```

```yaml
name: dag-task-1
template: downstream
arguments:
  parameters:
  - name: input1-uri
    value: "{{tasks.upstream.outputs.parameters.some-uri}}"
```
steve-marmalade commented 3 years ago

I have an argo step that generates a list of GCS URIs of the form gs://my-bucket/path/to/file that I'd like to map over. It would be convenient to pass these URIs directly to the gcs artifact handler, rather than having to post-process them into bucket and key. I found this issue while searching to see whether this is supported, and figured I'd post my use case in the hope that it's helpful. Thanks!
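For concreteness, a hypothetical fan-out using the proposed `uri` field together with `withParam` (the template and parameter names are illustrative, reusing the `downstream` template sketched above):

```yaml
- name: fan-out
  dag:
    tasks:
    - name: generate
      template: list-uris  # emits a JSON list like ["gs://my-bucket/a", "gs://my-bucket/b"]
    - name: process
      template: downstream
      dependencies: [generate]
      withParam: "{{tasks.generate.outputs.result}}"
      arguments:
        parameters:
        - name: input1-uri
          value: "{{item}}"
```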

alexec commented 2 years ago

I'm planning on doing some work in the near future that would be made much easier with URNs for artifacts, so that we can load/save an artifact given only its identifier.

Let's look at some URNs:

| Repo | Example | Notes |
| --- | --- | --- |
| Artifactory | URL | |
| Git | `https://github.com/argoproj/argo-workflows.git` or `git@github.com:argoproj/argo-workflows.git` | SSH and HTTPS URLs are different |
| HDFS | `hdfs://hadoopNS/data/users.csv` | |
| HTTP | URL | |
| GCS | `//www.googleapis.com/storage/v1/bucket/foo.zip` | |
| OSS | `https://BucketName.Endpoint/Object?SignatureParameters` | |
| Raw | `base64(value)` | Not sure how to represent this for large values |
| S3 | `https://s3-eu-west-1.amazonaws.com/bucket/foo` | There are several variants |
alexec commented 2 years ago

Currently, you need the workflow to find the artifact. We could restrict this to workflows using artifactRepositoryRef (i.e. a named configmap).
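For reference, this is how a workflow opts into a named repository today; the configmap name and key below are just examples:

```yaml
spec:
  artifactRepositoryRef:
    configMap: artifact-repositories  # ConfigMap in the workflow's namespace
    key: my-repository                # which entry in the ConfigMap to use
```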

Auth URN:

```
artifactRepositoryRef:v1:{namespace}:{name}:{key}
workflow:v1:{namespace}:{name}:{nodeID}:input:{artifactName}
workflow:v1:{namespace}:{name}:{nodeID}:output:{artifactName}
```

Artifact URN:

```
artifact:v1:s3:{endpoint}:{bucket}:{key}
artifact:v1:git:{repo}:{branch}
```
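Concretely (all values illustrative), a single S3 artifact plus the auth URN used to resolve its credentials might look like:

```
artifact:v1:s3:s3.amazonaws.com:my-bucket:my-key
artifactRepositoryRef:v1:argo:artifact-repositories:default
```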

Dataflow gets this wrong; secrets should not go in the URN:

https://github.com/argoproj-labs/argo-dataflow/blob/11251f5806ed8bfbd9e6b5017259c19b006118ca/docs/META.md