argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

workflow ignores `inputs.artifacts.s3.bucket` and uses the bucket from `artifactRepositoryRef.configMap` #12564

Open Anton-Sagurov opened 10 months ago

Anton-Sagurov commented 10 months ago

### What happened/what did you expect to happen?

We are using a ClusterWorkflowTemplate to download artifacts from an S3 bucket, unzip them, and prepare them for use by other workflows. The S3 bucket holding the artifacts we want to consume is not the same bucket we use to store the logs and output artifacts of the workflows:

  1. argo-workflows-111111111111-eu-central-1 - Bucket where we store input artifacts (the one we specify in the WorkflowTemplate)
  2. argo-workflows-222222222222-eu-central-1 - Bucket that we use to store logs and outputs (it is specified in the ConfigMap)

The artifactRepositoryRef.configMap:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: workflow-artifacts
    data:
      columns: |-
        - name: ClusterID
          type: label
          key: cloud/cluster_id
      cloud-artifacts: |
        s3:
          endpoint: s3.amazonaws.com
          bucket: argo-workflows-222222222222-eu-central-1
          region: eu-central-1
          insecure: false
          keyFormat: "artifacts/wkf-support/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{workflow.name}}/{{pod.name}}"
          useSDKCreds: true
          encryptionOptions:
            enableEncryption: true

The init container tries to download the artifact from the wrong S3 bucket (argo-workflows-222222222222-eu-central-1):

time="2024-01-23T06:15:41.916Z" level=info msg="Getting file from s3" bucket=argo-workflows-222222222222-eu-central-1 endpoint=s3.amazonaws.com key=ansible/fetch-logs/latest/ansible.zip path=/argo/inputs/artifacts/archive.tmp


I think this is a bug, because it contradicts this workflow example: https://github.com/argoproj/argo-workflows/blob/main/examples/input-artifact-s3.yaml#L26

Logs from init container:

❯ kb logs -f -n wkf-support fetch-artifact-24g99 -c init
time="2024-01-23T06:15:41.733Z" level=info msg="Starting Workflow Executor" version=v3.5.3
time="2024-01-23T06:15:41.744Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2024-01-23T06:15:41.744Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=wkf-support podName=fetch-artifact-24g99 templateName=main version="&Version{Version:v3.5.3,BuildDate:2024-01-11T02:24:40Z,GitCommit:0fdf74511d4671cf0c8c334aa2d90ecd61c5acce,GitTag:v3.5.3,GitTreeState:clean,GoVersion:go1.21.6,Compiler:gc,Platform:linux/amd64,}"
time="2024-01-23T06:15:41.822Z" level=info msg="Loading script source to /argo/staging/script"
time="2024-01-23T06:15:41.822Z" level=info msg="Start loading input artifacts..."
time="2024-01-23T06:15:41.822Z" level=info msg="Downloading artifact: archive"
time="2024-01-23T06:15:41.822Z" level=info msg="S3 Load path: /argo/inputs/artifacts/archive.tmp, key: ansible/fetch-logs/latest/ansible.zip"
time="2024-01-23T06:15:41.833Z" level=info msg="Creating minio client using AWS SDK credentials"
time="2024-01-23T06:15:41.916Z" level=info msg="Getting file from s3" bucket=argo-workflows-222222222222-eu-central-1 endpoint=s3.amazonaws.com key=ansible/fetch-logs/latest/ansible.zip path=/argo/inputs/artifacts/archive.tmp
---------START-HTTP---------
HEAD /ansible/fetch-logs/latest/ansible.zip HTTP/1.1
Host: argo-workflows-734708892259-eu-central-1.s3.dualstack.eu-central-1.amazonaws.com
User-Agent: MinIO (linux; amd64) minio-go/v7.0.65
Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20240123/eu-central-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=REDACTED
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240123T061541Z
X-Amz-Security-Token: REDACTED

HTTP/1.1 403 Forbidden
Connection: close
Content-Type: application/xml
Date: Tue, 23 Jan 2024 06:15:41 GMT
Server: AmazonS3
X-Amz-Id-2: nmBVncP50yChVfULmLa+zfl3yno5PouUeWKsdeM8oeL9n5PnVJruvfemHhMPuZi6QqME+rpitKQ=
X-Amz-Request-Id: Z828D6DHZAKZKPDS
---------END-HTTP---------
time="2024-01-23T06:15:41.964Z" level=warning msg="Non-transient error: Access Denied."
time="2024-01-23T06:15:41.964Z" level=info msg="Load artifact" artifactName=archive duration=142.248041ms error="failed to get file: Access Denied." key=ansible/fetch-logs/latest/ansible.zip
time="2024-01-23T06:15:41.965Z" level=error msg="executor error: artifact archive failed to load: failed to get file: Access Denied."
time="2024-01-23T06:15:41.965Z" level=info msg="Alloc=11272 TotalAlloc=16628 Sys=23397 NumGC=4 Goroutines=8"


### Version

3.5.3

### Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

```YAML
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  annotations:
  labels:
    workflows.argoproj.io/controller-instanceid: argoworkflow
  name: fetch-artifact 
spec:
  archiveLogs: true
  arguments: {}
  artifactGC:
    strategy: Never
  entrypoint: main
  parallelism: 25
  artifactRepositoryRef:
    configMap: workflow-artifacts
  podGC:
    strategy: OnWorkflowCompletion
  serviceAccountName: cwft-cloud
  templates:
  - name: main
    inputs:
      artifacts:
      - name: archive 
        path: /tmp/archive.zip
        s3:
          bucket: argo-workflows-1111111111-eu-central-1 
          key: ansible/fetch-logs/latest/ansible.zip
    script:
      command:
      - sh
      image: joshkeegan/zip:latest
      imagePullPolicy: Always
      name: unbox-archive 
      resources: {}
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
          - ALL
        readOnlyRootFilesystem: false
        runAsNonRoot: true
        runAsUser: 1000
      source: |
        # show input artifacts
        ls -al /tmp/archive.zip
        cd /tmp
        unzip ./archive.zip
        ls -al ./

  ttlStrategy:
    secondsAfterCompletion: 43200
    secondsAfterFailure: 302400
    secondsAfterSuccess: 43200
  workflowMetadata:
    labels:
      workflows.argoproj.io/archive-strategy: always
      workflows.argoproj.io/controller-instanceid: argoworkflow
```

### Logs from the workflow controller

❯ kubectl logs -n argoworkflow deploy/workflow-controller | grep fetch-artifact-24g99
time="2024-01-23T07:13:41.880Z" level=info msg="Processing workflow" namespace=wkf-support workflow=fetch-artifact-24g99
time="2024-01-23T07:13:41.881Z" level=info msg="Task-result reconciliation" namespace=wkf-support numObjs=0 workflow=fetch-artifact-24g99
time="2024-01-23T07:13:41.881Z" level=info msg="Pod failed: Error (exit code 1): artifact archive failed to load: failed to get file: Access Denied." displayName=fetch-artifact-24g99 namespace=wkf-support pod=fetch-artifact-24g99 templateName=main workflow=fetch-artifact-24g99
time="2024-01-23T07:13:41.881Z" level=info msg="marking node as failed since init container has non-zero exit code" namespace=wkf-support new.phase=Failed workflow=fetch-artifact-24g99
time="2024-01-23T07:13:41.881Z" level=info msg="node unchanged" namespace=wkf-support nodeID=fetch-artifact-24g99 workflow=fetch-artifact-24g99
time="2024-01-23T07:13:41.881Z" level=info msg="workflow suspended" namespace=wkf-support workflow=fetch-artifact-24g99
time="2024-01-23T07:13:41.941Z" level=info msg="Workflow update successful" namespace=wkf-support phase=Running resourceVersion=172367813 workflow=fetch-artifact-24g99
time="2024-01-23T07:13:51.444Z" level=info msg="Processing workflow" namespace=wkf-support workflow=fetch-artifact-24g99
time="2024-01-23T07:13:51.445Z" level=info msg="Task-result reconciliation" namespace=wkf-support numObjs=0 workflow=fetch-artifact-24g99
time="2024-01-23T07:13:51.445Z" level=info msg="Pod failed: Error (exit code 1): artifact archive failed to load: failed to get file: Access Denied." displayName=fetch-artifact-24g99 namespace=wkf-support pod=fetch-artifact-24g99 templateName=main workflow=fetch-artifact-24g99
time="2024-01-23T07:13:51.445Z" level=info msg="marking node as failed since init container has non-zero exit code" namespace=wkf-support new.phase=Failed workflow=fetch-artifact-24g99
time="2024-01-23T07:13:51.445Z" level=info msg="node unchanged" namespace=wkf-support nodeID=fetch-artifact-24g99 workflow=fetch-artifact-24g99
time="2024-01-23T07:13:51.445Z" level=info msg="workflow suspended" namespace=wkf-support workflow=fetch-artifact-24g99
time="2024-01-23T07:13:51.459Z" level=info msg="Workflow update successful" namespace=wkf-support phase=Running resourceVersion=172367813 workflow=fetch-artifact-24g99
time="2024-01-23T07:33:51.444Z" level=info msg="Processing workflow" namespace=wkf-support workflow=fetch-artifact-24g99
time="2024-01-23T07:33:51.445Z" level=info msg="Task-result reconciliation" namespace=wkf-support numObjs=0 workflow=fetch-artifact-24g99
time="2024-01-23T07:33:51.445Z" level=info msg="Pod failed: Error (exit code 1): artifact archive failed to load: failed to get file: Access Denied." displayName=fetch-artifact-24g99 namespace=wkf-support pod=fetch-artifact-24g99 templateName=main workflow=fetch-artifact-24g99
time="2024-01-23T07:33:51.445Z" level=info msg="marking node as failed since init container has non-zero exit code" namespace=wkf-support new.phase=Failed workflow=fetch-artifact-24g99
time="2024-01-23T07:33:51.445Z" level=info msg="node unchanged" namespace=wkf-support nodeID=fetch-artifact-24g99 workflow=fetch-artifact-24g99
time="2024-01-23T07:33:51.445Z" level=info msg="workflow suspended" namespace=wkf-support workflow=fetch-artifact-24g99
time="2024-01-23T07:33:51.470Z" level=info msg="Workflow update successful" namespace=wkf-support phase=Running resourceVersion=172367813 workflow=fetch-artifact-24g99
time="2024-01-23T07:53:51.445Z" level=info msg="Processing workflow" namespace=wkf-support workflow=fetch-artifact-24g99
time="2024-01-23T07:53:51.446Z" level=info msg="Task-result reconciliation" namespace=wkf-support numObjs=0 workflow=fetch-artifact-24g99
time="2024-01-23T07:53:51.446Z" level=info msg="Pod failed: Error (exit code 1): artifact archive failed to load: failed to get file: Access Denied." displayName=fetch-artifact-24g99 namespace=wkf-support pod=fetch-artifact-24g99 templateName=main workflow=fetch-artifact-24g99
time="2024-01-23T07:53:51.446Z" level=info msg="marking node as failed since init container has non-zero exit code" namespace=wkf-support new.phase=Failed workflow=fetch-artifact-24g99
time="2024-01-23T07:53:51.446Z" level=info msg="node unchanged" namespace=wkf-support nodeID=fetch-artifact-24g99 workflow=fetch-artifact-24g99
time="2024-01-23T07:53:51.446Z" level=info msg="workflow suspended" namespace=wkf-support workflow=fetch-artifact-24g99
time="2024-01-23T07:53:51.458Z" level=info msg="Workflow update successful" namespace=wkf-support phase=Running resourceVersion=172367813 workflow=fetch-artifact-24g99

### Logs from your workflow's wait container

❯ kubectl logs -n wkf-support -c wait -l workflows.argoproj.io/workflow=fetch-artifact-24g99,workflow.argoproj.io/phase!=Succeeded
Error from server (BadRequest): container "wait" in pod "fetch-artifact-24g99" is waiting to start: PodInitializing

Anton-Sagurov commented 10 months ago

This makes the workflow behave as if it used Key-Only Artifacts, even though the S3 bucket is explicitly specified.
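
For comparison, here is a minimal sketch of what a key-only input artifact looks like, reusing the artifact name, path, and key from the template above. It assumes the key-only form described in the Argo Workflows documentation, where only `key` is set and the remaining location settings come from the configured artifact repository. The behaviour reported in this issue matches that form, even though the template also sets `bucket`.

```yaml
# Key-only form (sketch): only `key` is set, so the endpoint, bucket,
# and credentials are expected to come from the artifact repository
# referenced by artifactRepositoryRef.
inputs:
  artifacts:
  - name: archive
    path: /tmp/archive.zip
    s3:
      key: ansible/fetch-logs/latest/ansible.zip
```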

ljyanesm commented 9 months ago

@Anton-Sagurov

I think if you add the endpoint, it works ok.

        s3:
          endpoint: s3.amazonaws.com
          bucket: argo-workflows-1111111111-eu-central-1 
          key: ansible/fetch-logs/latest/ansible.zip
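
Placed in the context of the reproduction template, the suggestion above might look like the sketch below. The `endpoint`, `bucket`, and `key` values come from this thread. The `region` and `useSDKCreds` fields are assumptions copied from the ConfigMap, on the guess that an artifact with an explicit location no longer inherits them from the repository. Adjust or drop them to match your setup.

```yaml
inputs:
  artifacts:
  - name: archive
    path: /tmp/archive.zip
    s3:
      # Reported fix: with the endpoint set, the artifact's own bucket is used.
      endpoint: s3.amazonaws.com
      bucket: argo-workflows-1111111111-eu-central-1
      key: ansible/fetch-logs/latest/ansible.zip
      # Assumptions, mirrored from the ConfigMap; not confirmed in this thread.
      region: eu-central-1
      useSDKCreds: true
```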

ljyanesm commented 7 months ago

@agilgur5

We ran into this same issue, and what was posted above was the solution. It was not trivial to work out that the missing endpoint was the cause of the unexpected bucket.

I can think of two options for addressing this issue:

I believe a combination of the two is the better outcome: clear docs on how to handle S3 artifacts, including troubleshooting the endpoint/bucket problem above, and, if the configmap provides a default endpoint, allowing the bucket to be changed without specifying the endpoint.

tooptoop4 commented 3 weeks ago

Although this concerns inputs rather than archived logs, it is similar to https://github.com/argoproj/argo-workflows/issues/12727 in that the configmap takes precedence.