awslabs / kubeflow-manifests

KubeFlow on AWS
https://awslabs.github.io/kubeflow-manifests/
Apache License 2.0

Access denied to S3 bucket with s3-bucket-ssl-requests-only bucket policy #225

Closed: psulowsk closed this issue 2 years ago

psulowsk commented 2 years ago

Describe the bug: When I add the s3-bucket-ssl-requests-only bucket policy to an S3 bucket, Kubeflow pods lose access to the bucket and an Access Denied message is returned.

Steps To Reproduce

  1. Install Kubeflow with an S3 bucket.
  2. Create any pipeline that retrieves data from the S3 bucket.
  3. Add the following S3 bucket policy:
    {
      "Id": "ExamplePolicy",
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowSSLRequestsOnly",
          "Action": "s3:*",
          "Effect": "Deny",
          "Resource": [
            "arn:aws:s3:::my_bucket_name",
            "arn:aws:s3:::my_bucket_name/*"
          ],
          "Condition": {
            "Bool": {
              "aws:SecureTransport": "false"
            }
          },
          "Principal": "*"
        }
      ]
    }
  4. Try to run this pipeline.

Expected behavior: Kubeflow should successfully retrieve data from the S3 bucket.

Screenshots: (image attached)
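
For context, the effect of the AllowSSLRequestsOnly policy can be reproduced outside Kubeflow with plain boto3; a minimal sketch, using the placeholder bucket and key names from this report:

import boto3
from botocore.exceptions import ClientError

bucket, key = "my_bucket_name", "titanic_input_data/train.csv"  # placeholders

# HTTPS client: aws:SecureTransport evaluates to "true", so the Deny
# statement does not match and the request succeeds (assuming IAM allows it).
boto3.client("s3", use_ssl=True).get_object(Bucket=bucket, Key=key)

# Plain-HTTP client: aws:SecureTransport is "false", the Deny statement
# matches, and S3 returns 403 Access Denied regardless of IAM permissions.
try:
    boto3.client("s3", use_ssl=False).get_object(Bucket=bucket, Key=key)
except ClientError as err:
    print(err.response["Error"]["Code"])  # "AccessDenied"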

goswamig commented 2 years ago

@psulowsk thanks for reporting this. Can you share details on which Kubeflow pods received the Access Denied error?

psulowsk commented 2 years ago

Details for the pod that failed:

kind: Pod
apiVersion: v1
metadata:
  name: titanic-model-mxsrr-2738330753
  namespace: kubeflow-user-example-com
  selfLink: >-
    /api/v1/namespaces/kubeflow-user-example-com/pods/titanic-model-mxsrr-2738330753
  uid: 9d163e1b-5d42-4e3b-b927-d802519615b7
  resourceVersion: '39478'
  creationTimestamp: '2022-05-17T12:44:29Z'
  labels:
    pipeline/runid: 21ac1a32-e3c4-4aa8-9fd5-975bb71d8782
    pipelines.kubeflow.org/cache_enabled: 'true'
    pipelines.kubeflow.org/cache_id: ''
    pipelines.kubeflow.org/enable_caching: 'true'
    pipelines.kubeflow.org/kfp_sdk_version: 1.8.11
    pipelines.kubeflow.org/metadata_context_id: '1'
    pipelines.kubeflow.org/metadata_execution_id: '1'
    pipelines.kubeflow.org/metadata_written: 'true'
    pipelines.kubeflow.org/pipeline-sdk-type: kfp
    workflows.argoproj.io/completed: 'true'
    workflows.argoproj.io/workflow: titanic-model-mxsrr
  annotations:
    kubernetes.io/psp: eks.privileged
    pipelines.kubeflow.org/component_ref: '{}'
    pipelines.kubeflow.org/component_spec: >-
      {"implementation": {"container": {"args": ["--output-train",
      {"outputPath": "output_train"}, "--output-test", {"outputPath":
      "output_test"}], "command": ["sh", "-c", "(PIP_DISABLE_PIP_VERSION_CHECK=1
      python3 -m pip install --quiet --no-warn-script-location 'boto3' 'pandas'
      || PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet
      --no-warn-script-location 'boto3' 'pandas' --user) && \"$0\" \"$@\"",
      "sh", "-ec", "program_path=$(mktemp)\nprintf \"%s\" \"$0\" >
      \"$program_path\"\npython3 -u \"$program_path\" \"$@\"\n", "def
      _make_parent_dirs_and_return_path(file_path: str):\n    import os\n   
      os.makedirs(os.path.dirname(file_path), exist_ok=True)\n    return
      file_path\n\ndef read_csv(output_train,\n             output_test):  
      \n    import boto3\n    import pandas as pd\n\n    # Set up
      connection\n    s3_client = boto3.client('s3')\n    bucket_name =
      \"kubeflow-bucket\" \n    train_file =
      \"titanic_input_data/train.csv\"\n    test_file =
      \"titanic_input_data/test.csv\"\n\n    # Download train file\n    response
      = s3_client.get_object(Bucket=bucket_name, Key=train_file)\n    status =
      response.get(\"ResponseMetadata\", {}).get(\"HTTPStatusCode\")\n\n    if
      status == 200:\n        print(f\"Successful S3 get_object response. Status
      - {status}. {train_file} downloaded.\")\n        df_train =
      pd.read_csv(response.get(\"Body\"))\n    else:\n       
      print(f\"Unsuccessful S3 get_object response. Status - {status}\")\n\n   
      # Download test file\n    response =
      s3_client.get_object(Bucket=bucket_name, Key=test_file)\n    status =
      response.get(\"ResponseMetadata\", {}).get(\"HTTPStatusCode\")\n\n    if
      status == 200:\n        print(f\"Successful S3 get_object response. Status
      - {status}. {test_file} downloaded.\")\n        df_test =
      pd.read_csv(response.get(\"Body\"))\n    else:\n       
      print(f\"Unsuccessful S3 get_object response. Status - {status}\")\n\n   
      df_train.to_csv(output_train, index=True, header=True)\n   
      df_test.to_csv(output_test, index=True, header=True)\n\nimport
      argparse\n_parser = argparse.ArgumentParser(prog='Read csv',
      description='')\n_parser.add_argument(\"--output-train\",
      dest=\"output_train\", type=_make_parent_dirs_and_return_path,
      required=True,
      default=argparse.SUPPRESS)\n_parser.add_argument(\"--output-test\",
      dest=\"output_test\", type=_make_parent_dirs_and_return_path,
      required=True, default=argparse.SUPPRESS)\n_parsed_args =
      vars(_parser.parse_args())\n\n_outputs = read_csv(**_parsed_args)\n"],
      "image": "python:3.7"}}, "name": "Read csv", "outputs": [{"name":
      "output_train", "type": "CSV"}, {"name": "output_test", "type": "CSV"}]}
    pipelines.kubeflow.org/execution_cache_key: c2ee9595ab37372588aee3b944752bd51658f843a682c60b9f431729c61cc9a1
    pipelines.kubeflow.org/metadata_input_artifact_ids: '[]'
    pipelines.kubeflow.org/metadata_output_artifact_ids: '[]'
    sidecar.istio.io/inject: 'false'
    workflows.argoproj.io/node-name: titanic-model-mxsrr.read-csv
    workflows.argoproj.io/outputs: >-
      {"artifacts":[{"name":"read-csv-output_test","path":"/tmp/outputs/output_test/data"},{"name":"read-csv-output_train","path":"/tmp/outputs/output_train/data"}]}
    workflows.argoproj.io/template: >-
      {"name":"read-csv","inputs":{},"outputs":{"artifacts":[{"name":"read-csv-output_test","path":"/tmp/outputs/output_test/data"},{"name":"read-csv-output_train","path":"/tmp/outputs/output_train/data"}]},"metadata":{"annotations":{"pipelines.kubeflow.org/component_ref":"{}","pipelines.kubeflow.org/component_spec":"{\"implementation\":
      {\"container\": {\"args\": [\"--output-train\", {\"outputPath\":
      \"output_train\"}, \"--output-test\", {\"outputPath\": \"output_test\"}],
      \"command\": [\"sh\", \"-c\", \"(PIP_DISABLE_PIP_VERSION_CHECK=1 python3
      -m pip install --quiet --no-warn-script-location 'boto3' 'pandas' ||
      PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet
      --no-warn-script-location 'boto3' 'pandas' --user) \u0026\u0026 \\\"$0\\\"
      \\\"$@\\\"\", \"sh\", \"-ec\", \"program_path=$(mktemp)\\nprintf
      \\\"%s\\\" \\\"$0\\\" \u003e \\\"$program_path\\\"\\npython3 -u
      \\\"$program_path\\\" \\\"$@\\\"\\n\", \"def
      _make_parent_dirs_and_return_path(file_path: str):\\n    import os\\n   
      os.makedirs(os.path.dirname(file_path), exist_ok=True)\\n    return
      file_path\\n\\ndef read_csv(output_train,\\n             output_test):  
      \\n    import boto3\\n    import pandas as pd\\n\\n    # Set up
      connection\\n    s3_client = boto3.client('s3')\\n    bucket_name =
      \\\"kubeflow-bucket\\\" \\n    train_file =
      \\\"titanic_input_data/train.csv\\\"\\n    test_file =
      \\\"titanic_input_data/test.csv\\\"\\n\\n    # Download train file\\n   
      response = s3_client.get_object(Bucket=bucket_name, Key=train_file)\\n   
      status = response.get(\\\"ResponseMetadata\\\",
      {}).get(\\\"HTTPStatusCode\\\")\\n\\n    if status == 200:\\n       
      print(f\\\"Successful S3 get_object response. Status - {status}.
      {train_file} downloaded.\\\")\\n        df_train =
      pd.read_csv(response.get(\\\"Body\\\"))\\n    else:\\n       
      print(f\\\"Unsuccessful S3 get_object response. Status -
      {status}\\\")\\n\\n    # Download test file\\n    response =
      s3_client.get_object(Bucket=bucket_name, Key=test_file)\\n    status =
      response.get(\\\"ResponseMetadata\\\",
      {}).get(\\\"HTTPStatusCode\\\")\\n\\n    if status == 200:\\n       
      print(f\\\"Successful S3 get_object response. Status - {status}.
      {test_file} downloaded.\\\")\\n        df_test =
      pd.read_csv(response.get(\\\"Body\\\"))\\n    else:\\n       
      print(f\\\"Unsuccessful S3 get_object response. Status -
      {status}\\\")\\n\\n    df_train.to_csv(output_train, index=True,
      header=True)\\n    df_test.to_csv(output_test, index=True,
      header=True)\\n\\nimport argparse\\n_parser =
      argparse.ArgumentParser(prog='Read csv',
      description='')\\n_parser.add_argument(\\\"--output-train\\\",
      dest=\\\"output_train\\\", type=_make_parent_dirs_and_return_path,
      required=True,
      default=argparse.SUPPRESS)\\n_parser.add_argument(\\\"--output-test\\\",
      dest=\\\"output_test\\\", type=_make_parent_dirs_and_return_path,
      required=True, default=argparse.SUPPRESS)\\n_parsed_args =
      vars(_parser.parse_args())\\n\\n_outputs = read_csv(**_parsed_args)\\n\"],
      \"image\": \"python:3.7\"}}, \"name\": \"Read csv\", \"outputs\":
      [{\"name\": \"output_train\", \"type\": \"CSV\"}, {\"name\":
      \"output_test\", \"type\":
      \"CSV\"}]}","sidecar.istio.io/inject":"false"},"labels":{"pipelines.kubeflow.org/cache_enabled":"true","pipelines.kubeflow.org/enable_caching":"true","pipelines.kubeflow.org/kfp_sdk_version":"1.8.11","pipelines.kubeflow.org/pipeline-sdk-type":"kfp"}},"container":{"name":"","image":"python:3.7","command":["sh","-c","(PIP_DISABLE_PIP_VERSION_CHECK=1
      python3 -m pip install --quiet --no-warn-script-location 'boto3' 'pandas'
      || PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet
      --no-warn-script-location 'boto3' 'pandas' --user) \u0026\u0026 \"$0\"
      \"$@\"","sh","-ec","program_path=$(mktemp)\nprintf \"%s\" \"$0\" \u003e
      \"$program_path\"\npython3 -u \"$program_path\" \"$@\"\n","def
      _make_parent_dirs_and_return_path(file_path: str):\n    import os\n   
      os.makedirs(os.path.dirname(file_path), exist_ok=True)\n    return
      file_path\n\ndef read_csv(output_train,\n             output_test):  
      \n    import boto3\n    import pandas as pd\n\n    # Set up
      connection\n    s3_client = boto3.client('s3')\n    bucket_name =
      \"kubeflow-bucket\" \n    train_file =
      \"titanic_input_data/train.csv\"\n    test_file =
      \"titanic_input_data/test.csv\"\n\n    # Download train file\n    response
      = s3_client.get_object(Bucket=bucket_name, Key=train_file)\n    status =
      response.get(\"ResponseMetadata\", {}).get(\"HTTPStatusCode\")\n\n    if
      status == 200:\n        print(f\"Successful S3 get_object response. Status
      - {status}. {train_file} downloaded.\")\n        df_train =
      pd.read_csv(response.get(\"Body\"))\n    else:\n       
      print(f\"Unsuccessful S3 get_object response. Status - {status}\")\n\n   
      # Download test file\n    response =
      s3_client.get_object(Bucket=bucket_name, Key=test_file)\n    status =
      response.get(\"ResponseMetadata\", {}).get(\"HTTPStatusCode\")\n\n    if
      status == 200:\n        print(f\"Successful S3 get_object response. Status
      - {status}. {test_file} downloaded.\")\n        df_test =
      pd.read_csv(response.get(\"Body\"))\n    else:\n       
      print(f\"Unsuccessful S3 get_object response. Status - {status}\")\n\n   
      df_train.to_csv(output_train, index=True, header=True)\n   
      df_test.to_csv(output_test, index=True, header=True)\n\nimport
      argparse\n_parser = argparse.ArgumentParser(prog='Read csv',
      description='')\n_parser.add_argument(\"--output-train\",
      dest=\"output_train\", type=_make_parent_dirs_and_return_path,
      required=True,
      default=argparse.SUPPRESS)\n_parser.add_argument(\"--output-test\",
      dest=\"output_test\", type=_make_parent_dirs_and_return_path,
      required=True, default=argparse.SUPPRESS)\n_parsed_args =
      vars(_parser.parse_args())\n\n_outputs =
      read_csv(**_parsed_args)\n"],"args":["--output-train","/tmp/outputs/output_train/data","--output-test","/tmp/outputs/output_test/data"],"resources":{}},"archiveLocation":{"archiveLogs":true,"s3":{"endpoint":"s3.amazonaws.com","bucket":"kubeflow-bucket","insecure":true,"accessKeySecret":{"name":"mlpipeline-minio-artifact","key":"accesskey"},"secretKeySecret":{"name":"mlpipeline-minio-artifact","key":"secretkey"},"key":"artifacts/titanic-model-mxsrr/titanic-model-mxsrr-2738330753"}}}
  ownerReferences:
    - apiVersion: argoproj.io/v1alpha1
      kind: Workflow
      name: titanic-model-mxsrr
      uid: b96b6b12-2651-461c-ac47-03dcba0f77dc
      controller: true
      blockOwnerDeletion: true
  managedFields:
    - manager: workflow-controller
      operation: Update
      apiVersion: v1
      time: '2022-05-17T12:44:29Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:pipelines.kubeflow.org/component_ref': {}
            'f:pipelines.kubeflow.org/component_spec': {}
            'f:sidecar.istio.io/inject': {}
            'f:workflows.argoproj.io/node-name': {}
            'f:workflows.argoproj.io/template': {}
          'f:labels':
            .: {}
            'f:pipeline/runid': {}
            'f:pipelines.kubeflow.org/cache_enabled': {}
            'f:pipelines.kubeflow.org/enable_caching': {}
            'f:pipelines.kubeflow.org/kfp_sdk_version': {}
            'f:pipelines.kubeflow.org/pipeline-sdk-type': {}
            'f:workflows.argoproj.io/completed': {}
            'f:workflows.argoproj.io/workflow': {}
          'f:ownerReferences':
            .: {}
            'k:{"uid":"b96b6b12-2651-461c-ac47-03dcba0f77dc"}':
              .: {}
              'f:apiVersion': {}
              'f:blockOwnerDeletion': {}
              'f:controller': {}
              'f:kind': {}
              'f:name': {}
              'f:uid': {}
        'f:spec':
          'f:containers':
            'k:{"name":"main"}':
              .: {}
              'f:args': {}
              'f:command': {}
              'f:env':
                .: {}
                'k:{"name":"ARGO_CONTAINER_NAME"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_INCLUDE_SCRIPT_OUTPUT"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
              'f:image': {}
              'f:imagePullPolicy': {}
              'f:name': {}
              'f:resources': {}
              'f:terminationMessagePath': {}
              'f:terminationMessagePolicy': {}
            'k:{"name":"wait"}':
              .: {}
              'f:command': {}
              'f:env':
                .: {}
                'k:{"name":"ARGO_CONTAINER_NAME"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_CONTAINER_RUNTIME_EXECUTOR"}':
                  .: {}
                  'f:name': {}
                'k:{"name":"ARGO_INCLUDE_SCRIPT_OUTPUT"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_POD_NAME"}':
                  .: {}
                  'f:name': {}
                  'f:valueFrom':
                    .: {}
                    'f:fieldRef':
                      .: {}
                      'f:apiVersion': {}
                      'f:fieldPath': {}
                'k:{"name":"GODEBUG"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
              'f:image': {}
              'f:imagePullPolicy': {}
              'f:name': {}
              'f:resources': {}
              'f:terminationMessagePath': {}
              'f:terminationMessagePolicy': {}
              'f:volumeMounts':
                .: {}
                'k:{"mountPath":"/argo/podmetadata"}':
                  .: {}
                  'f:mountPath': {}
                  'f:name': {}
                'k:{"mountPath":"/argo/secret/mlpipeline-minio-artifact"}':
                  .: {}
                  'f:mountPath': {}
                  'f:name': {}
                  'f:readOnly': {}
                'k:{"mountPath":"/var/run/docker.sock"}':
                  .: {}
                  'f:mountPath': {}
                  'f:name': {}
                  'f:readOnly': {}
          'f:dnsPolicy': {}
          'f:enableServiceLinks': {}
          'f:restartPolicy': {}
          'f:schedulerName': {}
          'f:securityContext': {}
          'f:serviceAccount': {}
          'f:serviceAccountName': {}
          'f:terminationGracePeriodSeconds': {}
          'f:volumes':
            .: {}
            'k:{"name":"docker-sock"}':
              .: {}
              'f:hostPath':
                .: {}
                'f:path': {}
                'f:type': {}
              'f:name': {}
            'k:{"name":"mlpipeline-minio-artifact"}':
              .: {}
              'f:name': {}
              'f:secret':
                .: {}
                'f:defaultMode': {}
                'f:items': {}
                'f:secretName': {}
            'k:{"name":"podmetadata"}':
              .: {}
              'f:downwardAPI':
                .: {}
                'f:defaultMode': {}
                'f:items': {}
              'f:name': {}
    - manager: Swagger-Codegen
      operation: Update
      apiVersion: v1
      time: '2022-05-17T12:44:54Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:pipelines.kubeflow.org/metadata_input_artifact_ids': {}
            'f:pipelines.kubeflow.org/metadata_output_artifact_ids': {}
          'f:labels':
            'f:pipelines.kubeflow.org/metadata_context_id': {}
            'f:pipelines.kubeflow.org/metadata_execution_id': {}
            'f:pipelines.kubeflow.org/metadata_written': {}
    - manager: argoexec
      operation: Update
      apiVersion: v1
      time: '2022-05-17T12:44:54Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:workflows.argoproj.io/outputs': {}
    - manager: kubelet
      operation: Update
      apiVersion: v1
      time: '2022-05-17T12:44:54Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:conditions':
            'k:{"type":"ContainersReady"}':
              .: {}
              'f:lastProbeTime': {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Initialized"}':
              .: {}
              'f:lastProbeTime': {}
              'f:lastTransitionTime': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Ready"}':
              .: {}
              'f:lastProbeTime': {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
          'f:containerStatuses': {}
          'f:hostIP': {}
          'f:phase': {}
          'f:podIP': {}
          'f:podIPs':
            .: {}
            'k:{"ip":"192.168.134.78"}':
              .: {}
              'f:ip': {}
          'f:startTime': {}
status:
  phase: Failed
  conditions:
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2022-05-17T12:44:30Z'
    - type: Ready
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2022-05-17T12:44:54Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [wait main]'
    - type: ContainersReady
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2022-05-17T12:44:54Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [wait main]'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2022-05-17T12:44:30Z'
  hostIP: 192.168.152.91
  podIP: 192.168.134.78
  podIPs:
    - ip: 192.168.134.78
  startTime: '2022-05-17T12:44:30Z'
  containerStatuses:
    - name: main
      state:
        terminated:
          exitCode: 0
          reason: Completed
          startedAt: '2022-05-17T12:44:40Z'
          finishedAt: '2022-05-17T12:44:53Z'
          containerID: >-
            docker://098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c
      lastState: {}
      ready: false
      restartCount: 0
      image: 'python:3.7'
      imageID: >-
        docker-pullable://python@sha256:0c89239a53e76e6e53364279d122fc714fb70e463dc620cd3752193fef621a61
      containerID: >-
        docker://098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c
      started: false
    - name: wait
      state:
        terminated:
          exitCode: 1
          reason: Error
          message: 'failed to put file: Access Denied'
          startedAt: '2022-05-17T12:44:40Z'
          finishedAt: '2022-05-17T12:44:54Z'
          containerID: >-
            docker://f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c
      lastState: {}
      ready: false
      restartCount: 0
      image: 'gcr.io/ml-pipeline/argoexec:v3.1.6-patch-license-compliance'
      imageID: >-
        docker-pullable://gcr.io/ml-pipeline/argoexec@sha256:44cf8455a51aa5b961d1a86f65e39adf5ffca9bdcd33a745c3b79f430b7439e0
      containerID: >-
        docker://f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c
      started: false
  qosClass: BestEffort
spec:
  volumes:
    - name: podmetadata
      downwardAPI:
        items:
          - path: annotations
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.annotations
        defaultMode: 420
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
        type: Socket
    - name: mlpipeline-minio-artifact
      secret:
        secretName: mlpipeline-minio-artifact
        items:
          - key: accesskey
            path: accesskey
          - key: secretkey
            path: secretkey
        defaultMode: 420
    - name: default-editor-token-fhjc7
      secret:
        secretName: default-editor-token-fhjc7
        defaultMode: 420
  containers:
    - name: wait
      image: 'gcr.io/ml-pipeline/argoexec:v3.1.6-patch-license-compliance'
      command:
        - argoexec
        - wait
        - '--loglevel'
        - info
      env:
        - name: ARGO_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: ARGO_CONTAINER_RUNTIME_EXECUTOR
        - name: GODEBUG
          value: x509ignoreCN=0
        - name: ARGO_CONTAINER_NAME
          value: wait
        - name: ARGO_INCLUDE_SCRIPT_OUTPUT
          value: 'false'
      resources: {}
      volumeMounts:
        - name: podmetadata
          mountPath: /argo/podmetadata
        - name: docker-sock
          readOnly: true
          mountPath: /var/run/docker.sock
        - name: mlpipeline-minio-artifact
          readOnly: true
          mountPath: /argo/secret/mlpipeline-minio-artifact
        - name: default-editor-token-fhjc7
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
    - name: main
      image: 'python:3.7'
      command:
        - sh
        - '-c'
        - >-
          (PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet
          --no-warn-script-location 'boto3' 'pandas' ||
          PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet
          --no-warn-script-location 'boto3' 'pandas' --user) && "$0" "$@"
        - sh
        - '-ec'
        - |
          program_path=$(mktemp)
          printf "%s" "$0" > "$program_path"
          python3 -u "$program_path" "$@"
        - >
          def _make_parent_dirs_and_return_path(file_path: str):
              import os
              os.makedirs(os.path.dirname(file_path), exist_ok=True)
              return file_path

          def read_csv(output_train,
                       output_test):   
              import boto3
              import pandas as pd

              # Set up connection
              s3_client = boto3.client('s3')
              bucket_name = "kubeflow-bucket" 
              train_file = "titanic_input_data/train.csv"
              test_file = "titanic_input_data/test.csv"

              # Download train file
              response = s3_client.get_object(Bucket=bucket_name, Key=train_file)
              status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")

              if status == 200:
                  print(f"Successful S3 get_object response. Status - {status}. {train_file} downloaded.")
                  df_train = pd.read_csv(response.get("Body"))
              else:
                  print(f"Unsuccessful S3 get_object response. Status - {status}")

              # Download test file
              response = s3_client.get_object(Bucket=bucket_name, Key=test_file)
              status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")

              if status == 200:
                  print(f"Successful S3 get_object response. Status - {status}. {test_file} downloaded.")
                  df_test = pd.read_csv(response.get("Body"))
              else:
                  print(f"Unsuccessful S3 get_object response. Status - {status}")

              df_train.to_csv(output_train, index=True, header=True)
              df_test.to_csv(output_test, index=True, header=True)

          import argparse

          _parser = argparse.ArgumentParser(prog='Read csv', description='')

          _parser.add_argument("--output-train", dest="output_train",
          type=_make_parent_dirs_and_return_path, required=True,
          default=argparse.SUPPRESS)

          _parser.add_argument("--output-test", dest="output_test",
          type=_make_parent_dirs_and_return_path, required=True,
          default=argparse.SUPPRESS)

          _parsed_args = vars(_parser.parse_args())

          _outputs = read_csv(**_parsed_args)
      args:
        - '--output-train'
        - /tmp/outputs/output_train/data
        - '--output-test'
        - /tmp/outputs/output_test/data
      env:
        - name: ARGO_CONTAINER_NAME
          value: main
        - name: ARGO_INCLUDE_SCRIPT_OUTPUT
          value: 'false'
      resources: {}
      volumeMounts:
        - name: default-editor-token-fhjc7
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Never
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: default-editor
  serviceAccount: default-editor
  nodeName: ip-192-168-152-91.us-west-2.compute.internal
  securityContext: {}
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority

Additionally, I am pasting the logs for the wait container of the pod, which returns this 'Access Denied' message:

time="2022-05-17T12:44:40.050Z" level=info msg="Starting Workflow Executor" executorType= version=v3.1.6-patch
time="2022-05-17T12:44:40.053Z" level=info msg="Creating a docker executor"
time="2022-05-17T12:44:40.053Z" level=info msg="Executor initialized" includeScriptOutput=false namespace=kubeflow-user-example-com podName=titanic-model-mxsrr-2738330753 template="{\"name\":\"read-csv\",\"inputs\":{},\"outputs\":{\"artifacts\":[{\"name\":\"read-csv-output_test\",\"path\":\"/tmp/outputs/output_test/data\"},{\"name\":\"read-csv-output_train\",\"path\":\"/tmp/outputs/output_train/data\"}]},\"metadata\":{\"annotations\":{\"pipelines.kubeflow.org/component_ref\":\"{}\",\"pipelines.kubeflow.org/component_spec\":\"{\\\"implementation\\\": {\\\"container\\\": {\\\"args\\\": [\\\"--output-train\\\", {\\\"outputPath\\\": \\\"output_train\\\"}, \\\"--output-test\\\", {\\\"outputPath\\\": \\\"output_test\\\"}], \\\"command\\\": [\\\"sh\\\", \\\"-c\\\", \\\"(PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'boto3' 'pandas' || PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'boto3' 'pandas' --user) \\u0026\\u0026 \\\\\\\"$0\\\\\\\" \\\\\\\"$@\\\\\\\"\\\", \\\"sh\\\", \\\"-ec\\\", \\\"program_path=$(mktemp)\\\\nprintf \\\\\\\"%s\\\\\\\" \\\\\\\"$0\\\\\\\" \\u003e \\\\\\\"$program_path\\\\\\\"\\\\npython3 -u \\\\\\\"$program_path\\\\\\\" \\\\\\\"$@\\\\\\\"\\\\n\\\", \\\"def _make_parent_dirs_and_return_path(file_path: str):\\\\n    import os\\\\n    os.makedirs(os.path.dirname(file_path), exist_ok=True)\\\\n    return file_path\\\\n\\\\ndef read_csv(output_train,\\\\n             output_test):   \\\\n    import boto3\\\\n    import pandas as pd\\\\n\\\\n    # Set up connection\\\\n    s3_client = boto3.client('s3')\\\\n    bucket_name = \\\\\\\"kubeflow-bucket\\\\\\\" \\\\n    train_file = \\\\\\\"titanic_input_data/train.csv\\\\\\\"\\\\n    test_file = \\\\\\\"titanic_input_data/test.csv\\\\\\\"\\\\n\\\\n    # Download train file\\\\n    response = s3_client.get_object(Bucket=bucket_name, Key=train_file)\\\\n    status = response.get(\\\\\\\"ResponseMetadata\\\\\\\", {}).get(\\\\\\\"HTTPStatusCode\\\\\\\")\\\\n\\\\n    if status == 200:\\\\n        print(f\\\\\\\"Successful S3 get_object response. Status - {status}. {train_file} downloaded.\\\\\\\")\\\\n        df_train = pd.read_csv(response.get(\\\\\\\"Body\\\\\\\"))\\\\n    else:\\\\n        print(f\\\\\\\"Unsuccessful S3 get_object response. Status - {status}\\\\\\\")\\\\n\\\\n    # Download test file\\\\n    response = s3_client.get_object(Bucket=bucket_name, Key=test_file)\\\\n    status = response.get(\\\\\\\"ResponseMetadata\\\\\\\", {}).get(\\\\\\\"HTTPStatusCode\\\\\\\")\\\\n\\\\n    if status == 200:\\\\n        print(f\\\\\\\"Successful S3 get_object response. Status - {status}. {test_file} downloaded.\\\\\\\")\\\\n        df_test = pd.read_csv(response.get(\\\\\\\"Body\\\\\\\"))\\\\n    else:\\\\n        print(f\\\\\\\"Unsuccessful S3 get_object response. 
Status - {status}\\\\\\\")\\\\n\\\\n    df_train.to_csv(output_train, index=True, header=True)\\\\n    df_test.to_csv(output_test, index=True, header=True)\\\\n\\\\nimport argparse\\\\n_parser = argparse.ArgumentParser(prog='Read csv', description='')\\\\n_parser.add_argument(\\\\\\\"--output-train\\\\\\\", dest=\\\\\\\"output_train\\\\\\\", type=_make_parent_dirs_and_return_path, required=True, default=argparse.SUPPRESS)\\\\n_parser.add_argument(\\\\\\\"--output-test\\\\\\\", dest=\\\\\\\"output_test\\\\\\\", type=_make_parent_dirs_and_return_path, required=True, default=argparse.SUPPRESS)\\\\n_parsed_args = vars(_parser.parse_args())\\\\n\\\\n_outputs = read_csv(**_parsed_args)\\\\n\\\"], \\\"image\\\": \\\"python:3.7\\\"}}, \\\"name\\\": \\\"Read csv\\\", \\\"outputs\\\": [{\\\"name\\\": \\\"output_train\\\", \\\"type\\\": \\\"CSV\\\"}, {\\\"name\\\": \\\"output_test\\\", \\\"type\\\": \\\"CSV\\\"}]}\",\"sidecar.istio.io/inject\":\"false\"},\"labels\":{\"pipelines.kubeflow.org/cache_enabled\":\"true\",\"pipelines.kubeflow.org/enable_caching\":\"true\",\"pipelines.kubeflow.org/kfp_sdk_version\":\"1.8.11\",\"pipelines.kubeflow.org/pipeline-sdk-type\":\"kfp\"}},\"container\":{\"name\":\"\",\"image\":\"python:3.7\",\"command\":[\"sh\",\"-c\",\"(PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'boto3' 'pandas' || PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'boto3' 'pandas' --user) \\u0026\\u0026 \\\"$0\\\" \\\"$@\\\"\",\"sh\",\"-ec\",\"program_path=$(mktemp)\\nprintf \\\"%s\\\" \\\"$0\\\" \\u003e \\\"$program_path\\\"\\npython3 -u \\\"$program_path\\\" \\\"$@\\\"\\n\",\"def _make_parent_dirs_and_return_path(file_path: str):\\n    import os\\n    os.makedirs(os.path.dirname(file_path), exist_ok=True)\\n    return file_path\\n\\ndef read_csv(output_train,\\n             output_test):   \\n    import boto3\\n    import pandas as pd\\n\\n    # Set up connection\\n    s3_client = boto3.client('s3')\\n    bucket_name = \\\"kubeflow-bucket\\\" \\n    train_file = \\\"titanic_input_data/train.csv\\\"\\n    test_file = \\\"titanic_input_data/test.csv\\\"\\n\\n    # Download train file\\n    response = s3_client.get_object(Bucket=bucket_name, Key=train_file)\\n    status = response.get(\\\"ResponseMetadata\\\", {}).get(\\\"HTTPStatusCode\\\")\\n\\n    if status == 200:\\n        print(f\\\"Successful S3 get_object response. Status - {status}. {train_file} downloaded.\\\")\\n        df_train = pd.read_csv(response.get(\\\"Body\\\"))\\n    else:\\n        print(f\\\"Unsuccessful S3 get_object response. Status - {status}\\\")\\n\\n    # Download test file\\n    response = s3_client.get_object(Bucket=bucket_name, Key=test_file)\\n    status = response.get(\\\"ResponseMetadata\\\", {}).get(\\\"HTTPStatusCode\\\")\\n\\n    if status == 200:\\n        print(f\\\"Successful S3 get_object response. Status - {status}. {test_file} downloaded.\\\")\\n        df_test = pd.read_csv(response.get(\\\"Body\\\"))\\n    else:\\n        print(f\\\"Unsuccessful S3 get_object response. 
Status - {status}\\\")\\n\\n    df_train.to_csv(output_train, index=True, header=True)\\n    df_test.to_csv(output_test, index=True, header=True)\\n\\nimport argparse\\n_parser = argparse.ArgumentParser(prog='Read csv', description='')\\n_parser.add_argument(\\\"--output-train\\\", dest=\\\"output_train\\\", type=_make_parent_dirs_and_return_path, required=True, default=argparse.SUPPRESS)\\n_parser.add_argument(\\\"--output-test\\\", dest=\\\"output_test\\\", type=_make_parent_dirs_and_return_path, required=True, default=argparse.SUPPRESS)\\n_parsed_args = vars(_parser.parse_args())\\n\\n_outputs = read_csv(**_parsed_args)\\n\"],\"args\":[\"--output-train\",\"/tmp/outputs/output_train/data\",\"--output-test\",\"/tmp/outputs/output_test/data\"],\"resources\":{}},\"archiveLocation\":{\"archiveLogs\":true,\"s3\":{\"endpoint\":\"s3.amazonaws.com\",\"bucket\":\"kubeflow-bucket\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/titanic-model-mxsrr/titanic-model-mxsrr-2738330753\"}}}" version="&Version{Version:v3.1.6-patch,BuildDate:2021-08-18T12:50:41Z,GitCommit:9c47963b66061143735843db27977dbf9b4cbbf4,GitTag:v3.1.6-patch,GitTreeState:clean,GoVersion:go1.15.7,Compiler:gc,Platform:linux/amd64,}"
time="2022-05-17T12:44:40.053Z" level=info msg="Starting annotations monitor"
time="2022-05-17T12:44:40.054Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:40.054Z" level=info msg="Starting deadline monitor"
time="2022-05-17T12:44:40.097Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Created {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:40.097Z" level=info msg="mapped container name \"main\" to container ID \"098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c\" (created at 2022-05-17 12:44:40 +0000 UTC, status Created)"
time="2022-05-17T12:44:40.097Z" level=info msg="mapped container name \"wait\" to container ID \"f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c\" (created at 2022-05-17 12:44:37 +0000 UTC, status Up)"
time="2022-05-17T12:44:41.054Z" level=info msg="docker wait 098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c"
time="2022-05-17T12:44:41.098Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:41.168Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:42.168Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:42.206Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:43.206Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:43.234Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:44.234Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:44.260Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:45.261Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:45.292Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:46.292Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:46.323Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:47.323Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:47.356Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:48.356Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:48.391Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:49.391Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:49.417Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:50.054Z" level=info msg="/argo/podmetadata/annotations updated"
time="2022-05-17T12:44:50.417Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:50.445Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:51.445Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:51.473Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:52.474Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:52.510Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:53.510Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:53.535Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Up {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:53.718Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:53.743Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Exited {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:53.743Z" level=info msg="Main container completed"
time="2022-05-17T12:44:53.743Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"time="2022-05-17T12:44:53.743Z" level=info msg="Saving logs"
time="2022-05-17T12:44:53.743Z" level=info msg="[docker logs 098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c]"
time="2022-05-17T12:44:53.766Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: artifacts/titanic-model-mxsrr/titanic-model-mxsrr-2738330753/main.log"
time="2022-05-17T12:44:53.766Z" level=info msg="Creating minio client s3.amazonaws.com using static credentials"
time="2022-05-17T12:44:53.766Z" level=info msg="Saving from /tmp/argo/outputs/logs/main.log to s3 (endpoint: s3.amazonaws.com, bucket: kubeflow-bucket, key: artifacts/titanic-model-mxsrr/titanic-model-mxsrr-2738330753/main.log)"
time="2022-05-17T12:44:53.978Z" level=error msg="executor error: failed to put file: Access Denied"
time="2022-05-17T12:44:53.978Z" level=info msg="No output parameters"
time="2022-05-17T12:44:53.978Z" level=info msg="Saving output artifacts"
time="2022-05-17T12:44:53.979Z" level=info msg="Staging artifact: read-csv-output_test"
time="2022-05-17T12:44:53.979Z" level=info msg="Copying /tmp/outputs/output_test/data from container base image layer to /tmp/argo/outputs/artifacts/read-csv-output_test.tgz"
time="2022-05-17T12:44:53.979Z" level=info msg="Archiving main:/tmp/outputs/output_test/data to /tmp/argo/outputs/artifacts/read-csv-output_test.tgz"
time="2022-05-17T12:44:53.979Z" level=info msg="sh -c docker cp -a 098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c:/tmp/outputs/output_test/data - | gzip  > /tmp/argo/outputs/artifacts/read-csv-output_test.tgz"
time="2022-05-17T12:44:54.052Z" level=info msg="Archiving completed"
time="2022-05-17T12:44:54.052Z" level=info msg="S3 Save path: /tmp/argo/outputs/artifacts/read-csv-output_test.tgz, key: artifacts/titanic-model-mxsrr/titanic-model-mxsrr-2738330753/read-csv-output_test.tgz"
time="2022-05-17T12:44:54.052Z" level=info msg="Creating minio client s3.amazonaws.com using static credentials"
time="2022-05-17T12:44:54.052Z" level=info msg="Saving from /tmp/argo/outputs/artifacts/read-csv-output_test.tgz to s3 (endpoint: s3.amazonaws.com, bucket: kubeflow-bucket, key: artifacts/titanic-model-mxsrr/titanic-model-mxsrr-2738330753/read-csv-output_test.tgz)"
time="2022-05-17T12:44:54.267Z" level=error msg="executor error: failed to put file: Access Denied"
time="2022-05-17T12:44:54.267Z" level=info msg="Annotating pod with output"
time="2022-05-17T12:44:54.318Z" level=info msg="Patch pods 200"
time="2022-05-17T12:44:54.325Z" level=info msg="docker ps --all --no-trunc --format={{.Status}}|{{.Label \"io.kubernetes.container.name\"}}|{{.ID}}|{{.CreatedAt}} --filter=label=io.kubernetes.pod.namespace=kubeflow-user-example-com --filter=label=io.kubernetes.pod.name=titanic-model-mxsrr-2738330753"
time="2022-05-17T12:44:54.372Z" level=info msg="listed containers" containers="map[main:{098bcdb8709228ab5e591310b8490a74c5241a77bfcf526964e05759712cc25c Exited {0 63788388280 <nil>}} wait:{f6e9f2f88b71c52af5afe4d6b0434dbaf4b311174f88362fa71398191792225c Up {0 63788388277 <nil>}}]"
time="2022-05-17T12:44:54.372Z" level=info msg="Killing sidecars []"
time="2022-05-17T12:44:54.372Z" level=info msg="Alloc=10483 TotalAlloc=17070 Sys=73297 NumGC=5 Goroutines=14"
goswamig commented 2 years ago

@psulowsk Thanks for sending the details, sorry I missed asking this.

How are you granting the pod permission for S3 access (both read and write)? I can imagine a few options, but it's not clear from the describe output and logs.

  1. Adding environment variables to the pod for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. I did not see such variables set in the pod.
  2. Attaching an S3 policy to the instance role, in which case all pods on the node will have the same permissions.
  3. Using a Kubeflow profile with an IAM role; a detailed view of how to configure such permissions is mentioned here.

Can you share which option you are using in your setup?
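
One quick way to answer this from inside a pod is to ask boto3 which credential source it resolved; a diagnostic sketch (not from the original thread):

import boto3

# Prints the credential source boto3 resolved, e.g. "env" for environment
# variables, "iam-role" for the EC2 instance role, or
# "assume-role-with-web-identity" for IRSA.
creds = boto3.Session().get_credentials()
print(creds.method if creds else "no credentials resolved")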

psulowsk commented 2 years ago

Hi @goswamig, currently I am using an IAM role that gives EC2 instances full access to all S3 buckets.

And as I mentioned at the beginning, I am able to successfully access S3 buckets when I am not using the s3-bucket-ssl-requests-only bucket policy. Access is denied after this policy is applied.

goswamig commented 2 years ago

To make sure that a pod on the EC2 instance can access your S3 bucket, before going into Kubeflow details, can you check the following?

  1. SSH into the EC2 instance (the K8s worker node in your case, I believe) and see if you can access the S3 bucket from there. If yes, can you try running the pod below?
  2. Can you try running the command below and see if you have access?

cat <<EoF> ~/job-s3.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: eks-iam-test-s3
spec:
  template:
    metadata:
      labels:
        app: eks-iam-test-s3
    spec:
      containers:
      - name: eks-iam-test
        image: amazon/aws-cli:latest
        args: ["s3", "ls"]
      restartPolicy: Never
EoF

kubectl apply -f ~/job-s3.yaml

Ideally, I would recommend using the IRSA feature, which is supported in Kubeflow. You can find more details here.
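
If IRSA is configured, a quick in-pod check (a sketch, assuming boto3 is available in the image) is to ask STS which identity the pod actually assumed:

import boto3

# Under IRSA, the Arn should be the assumed role bound to the pod's
# service account rather than the node's instance role.
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])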

psulowsk commented 2 years ago

Hi @goswamig, thank you for your reply. My comments below:

  1. I have successfully connected to my worker node via a bastion host (as the workers are in a private subnet) and accessed the S3 bucket with the applied bucket policy. (screenshot attached)

  2. I have successfully run this job. The output is my list of S3 buckets.
kubectl describe jobs/eks-iam-test-s3
Name:           eks-iam-test-s3
Namespace:      default
Selector:       controller-uid=d7e8e468-f412-4bc0-9ca0-0aefcbafe3e0
Labels:         app=eks-iam-test-s3
                controller-uid=d7e8e468-f412-4bc0-9ca0-0aefcbafe3e0
                job-name=eks-iam-test-s3
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Fri, 20 May 2022 16:43:46 +0200
Completed At:   Fri, 20 May 2022 16:43:57 +0200
Duration:       11s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  app=eks-iam-test-s3
           controller-uid=d7e8e468-f412-4bc0-9ca0-0aefcbafe3e0
           job-name=eks-iam-test-s3
  Containers:
   eks-iam-test:
    Image:      amazon/aws-cli:latest
    Port:       <none>
    Host Port:  <none>
    Args:
      s3
      ls
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  3m26s  job-controller  Created pod: eks-iam-test-s3-nbnlw
  Normal  Completed         3m15s  job-controller  Job completed

and logs:

pods=$(kubectl get pods --selector=job-name=eks-iam-test-s3 --output=jsonpath='{.items[*].metadata.name}')

kubectl logs $pods
2021-09-28 10:58:41 elasticbeanstalk-us-east-2-703405242076
2022-05-17 12:43:51 kubeflow-bucket-sparklander-priv
2022-03-31 18:26:57 test-buc-2022

Honestly, I don't understand how IRSA and Profiles can help me access my S3 bucket with an HTTPS-only policy. I suppose they could be a solution if I could not access the S3 bucket at all, but in this case it is only about HTTPS access.

I will be very grateful for any further suggestions. Thank you

rrrkharse commented 2 years ago

@psulowsk @goswamig I think the issue here is with how the MinIO client accesses S3 buckets, in that it is not using HTTPS.

In the logs below, access to the bucket is denied because the bucket requires aws:SecureTransport. The MinIO client is being used to access (and write to) the bucket, since the artifact logs are being uploaded.

time="2022-05-17T12:44:54.052Z" level=info msg="S3 Save path: /tmp/argo/outputs/artifacts/read-csv-output_test.tgz, key: artifacts/titanic-model-mxsrr/titanic-model-mxsrr-2738330753/read-csv-output_test.tgz"
time="2022-05-17T12:44:54.052Z" level=info msg="Creating minio client s3.amazonaws.com using static credentials"
time="2022-05-17T12:44:54.052Z" level=info msg="Saving from /tmp/argo/outputs/artifacts/read-csv-output_test.tgz to s3 (endpoint: s3.amazonaws.com, bucket: kubeflow-bucket, key: artifacts/titanic-model-mxsrr/titanic-model-mxsrr-2738330753/read-csv-output_test.tgz)"
time="2022-05-17T12:44:54.267Z" level=error msg="executor error: failed to put file: Access Denied"

I'll check if there is a MinIO configuration option that can be enabled to use HTTPS when accessing the bucket.
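
For reference, the MinIO Python SDK exposes the same HTTP-vs-HTTPS switch as a "secure" flag on the client constructor; a sketch with placeholder credentials:

from minio import Minio

# secure=True reaches https://s3.amazonaws.com and satisfies the
# SSL-only bucket policy; secure=False would issue plain-HTTP requests,
# which the AllowSSLRequestsOnly policy denies with 403 Access Denied.
client = Minio(
    "s3.amazonaws.com",
    access_key="ACCESS_KEY",  # placeholder
    secret_key="SECRET_KEY",  # placeholder
    secure=True,
)
print(client.bucket_exists("kubeflow-bucket"))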

rrrkharse commented 2 years ago

Looks like we are setting insecure: true here, which is why the policy is failing: https://github.com/awslabs/kubeflow-manifests/blob/main/awsconfigs/apps/pipeline/s3/config#L8

Checking why this was set and testing with insecure: false.

rrrkharse commented 2 years ago

Merged the fix into https://github.com/awslabs/kubeflow-manifests/tree/release-v1.4.1-aws-b1.0.0

Let us know if this resolves the issue. You may need to redeploy MinIO for the configuration to be applied.

goswamig commented 2 years ago

@rrrkharse thanks for finding out. Indeed, the doc (https://docs.min.io/docs/minio-client-complete-guide.html) describes the flag:

  --insecure    Disable SSL certificate verification.

Setting the insecure flag to true will skip the SSL verification.

psulowsk commented 2 years ago

Looks like it works. Thank you for your help @goswamig and @rrrkharse!