allegroai / clearml-helm-charts

Helm chart repository for the new unified way to deploy ClearML on Kubernetes. ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs

ClearML Chart. Service async_delete not deleting files #273

Closed uzmargomez closed 5 months ago

uzmargomez commented 5 months ago

Describe the bug

Hi, I updated my ClearML Chart to use the async_delete deployment to delete files from my S3 bucket. I'm running a self-hosted k8s cluster, and my S3 bucket is also self-hosted. I created the following ConfigMap to store my credentials:

apiVersion: v1
data:
  services.conf: |
    storage_credentials {
      aws {
        s3 {
          use_credentials_chain: false
          credentials: [
            {
              host: "myhost:443"
              bucket: "mybucket"
              key: "key7087806"
              secret: "secret808086"
              region: on-prem
            },
          ]
        }
      }
    }
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/instance: clearml
  name: additional-configs
  namespace: clearml

Then I added the following values:

apiserver:
  existingAdditionalConfigsConfigMap: "additional-configs"

However, when I try to delete files through the UI, I get the following log in the asyncdelete pod:

[2024-03-26 11:22:12,458] [7] [INFO] [clearml.JOB-async_urls_delete.py] Deleting s3 objects for company: adjfapdiuzvc, user: adfaerq
[2024-03-26 11:22:12,475] [7] [WARNING] [clearml.JOB-async_urls_delete.py] Failed to delete 6 files from AWS due to: Missing key or secret for AWS S3 host: myhost:443, bucket: mybucket

I believe that we are missing a volumeMounts section:

volumeMounts:
  - name: apiserver-config
    mountPath: /opt/clearml/config

in the clearml-apiserver container spec. Once I added this, my deployment was able to delete the files without issues. I would open a pull request, but I was told this should not be needed for the async_delete deployment. Maybe I'm wrong, but if it isn't needed, I don't understand why there's an apiserver-config entry in the volumes section of this deployment.
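
For illustration, here is a rough sketch of where that mount would sit in the async_delete Deployment. Only the volume name apiserver-config, the container name, and the mount path come from the description above. The surrounding layout and the ConfigMap reference are assumptions, not the chart's actual template.

```yaml
# Hypothetical excerpt of the async_delete Deployment (not the chart's real template).
spec:
  template:
    spec:
      containers:
        - name: clearml-apiserver          # container name as mentioned above
          volumeMounts:                    # the section that appeared to be missing
            - name: apiserver-config       # must match the volume name below
              mountPath: /opt/clearml/config
      volumes:
        - name: apiserver-config           # already declared in the deployment, per this report
          configMap:
            name: additional-configs       # assumption: the ConfigMap created above
```

Without the volumeMounts entry, the volume is declared but never attached to the container, which would explain why the credentials in services.conf were not picked up.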

What's your helm version?

version.BuildInfo{Version:"v3.11.3", GitCommit:"323249351482b3bbfc9f5004f65d400aa70f9ae7", GitTreeState:"clean", GoVersion:"go1.20.3"}

What's your kubectl version?

v1.29.3

What's the chart version?

7.8.0

Enter the changed values of values.yaml?

No response

valeriano-manassero commented 5 months ago

I was able to reproduce this issue, so I'm going to release a fix soon. Thank you for pointing me in the right direction.

uzmargomez commented 5 months ago

Thanks for the quick fix!