allegroai / clearml-server

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs

spec: failed to generate spec: failed to mkdir "/opt/clearml/agent": #74

Open hadyan-tvlk opened 3 years ago

hadyan-tvlk commented 3 years ago

Dear ClearML,

I'm trying to deploy ClearML on Kubernetes using Helm. However, several containers failed to deploy: agentservices, apiserver, elasticsearch, fileserver, mongo, and redis.

For instance, agentservices fails with the following error:

spec: failed to generate spec: failed to mkdir "/opt/clearml/agent": mkdir /opt/clearml: read-only file system: CreateContainerError 

clearml-agent and webserver deployed successfully.

Any idea on this one? Thank you!

jkhenning commented 3 years ago

Hi @hadyan-tvlk,

Are you using the Helm chart or the k8s templates? We will upload an update to the Helm charts, but the k8s templates (which are pretty much a duplicate of the templates inside the Helm chart) will be deprecated.

jkhenning commented 3 years ago

@hadyan-tvlk,

Which Helm chart are you using exactly? We have two options in our GitHub repository.

hadyan-tvlk commented 3 years ago

Hi @jkhenning ,

I'm following the tutorial exactly as written here: https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_kubernetes_helm.html#step-2-deploy-clearml-server-in-the-kubernetes-using-helm

So, in this case: https://allegroai.github.io/clearml-server-helm/

Am I using the correct chart?

jkhenning commented 3 years ago

Hi @hadyan-tvlk,

Yes, this is one of our charts. It still uses the previous 0.17.0 version (it will soon be updated to the new version).

From your description, it seems there's an issue with the pod not being able to write into the ephemeral volume mount, as seen here: read-only file system: CreateContainerError

I suspect this is caused by a readOnlyRootFilesystem: true in your PodSecurityPolicy, somewhere...
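
For readers unfamiliar with the setting, the fragment below is a minimal sketch of what such a policy could look like. It is not taken from the reporter's cluster; the policy name is hypothetical, and the other rules are filled in only to make the manifest complete. On clusters that still serve the `policy/v1beta1` API, existing policies can be listed with `kubectl get psp -o yaml`.

```yaml
# Illustrative only: a PodSecurityPolicy that forces a read-only root filesystem.
# The name "restricted-example" is hypothetical and does not come from this thread.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-example
spec:
  readOnlyRootFilesystem: true   # containers admitted under this policy get a read-only root filesystem
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
    - '*'
```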

hadyan-tvlk commented 3 years ago

Hi @jkhenning,

thanks for confirming.

About the read-only issue: is it possible to customize the storage location so that, instead of /opt/clearml, the data is stored in a non-root directory?

If so, would you mind guiding me? Thanks.

I ask because I think changing the root file system might be risky and is not recommended.

jkhenning commented 3 years ago

Well, it should be fairly easy to change the hostPath in the volumes section here
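
As a rough sketch of the kind of change meant here, a `hostPath` volume in a pod spec could be pointed at a different directory on the node. The volume name and the `/mnt/clearml/agent` path below are assumptions for illustration and are not copied from the chart; only `/opt/clearml/agent` appears in the error above.

```yaml
# Sketch of a pod spec fragment; the volume name and the new path are illustrative assumptions.
volumes:
  - name: agentservices-data        # hypothetical volume name
    hostPath:
      path: /mnt/clearml/agent      # hypothetical writable location replacing /opt/clearml/agent
      type: DirectoryOrCreate       # create the directory on the node if it does not exist
```

The chosen directory has to be writable on every node where the pod may be scheduled.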

hadyan-tvlk commented 3 years ago

Noted @jkhenning,

But I can't find a hostPath for Redis and mongo in the YAML files.

jkhenning commented 3 years ago

@hadyan-tvlk MongoDB, Redis and Elasticsearch don't use host paths in their pod specs; they use a persistentVolumeClaim (here, here and here). The claims and the PersistentVolumes themselves are defined here. The paths are set in the PersistentVolumes using the hostPath setting.
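
To illustrate the relationship described above, here is a minimal sketch of a PersistentVolume backed by a `hostPath`, together with a claim that binds to it. All names, sizes, and paths are assumptions for illustration and are not the chart's actual values.

```yaml
# Sketch only: names, sizes, and paths are illustrative assumptions, not the chart's values.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-mongo-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: ""              # no storage class, so the volume is bound statically
  hostPath:
    path: /mnt/clearml/data/mongo   # hypothetical node directory; this is the path to change
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-mongo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  volumeName: example-mongo-pv      # bind explicitly to the volume defined above
  resources:
    requests:
      storage: 10Gi
```

In this layout the pod refers only to the claim by name, so relocating the data means editing the `hostPath.path` of the PersistentVolume rather than the pod spec.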

hadyan-tvlk commented 3 years ago

Noted, thanks @jkhenning