I have integrated clearml agent with our k8s cluster using the k8s glue.
As part of my work, I have created the following pod template:
After some digging, I figured out that I have to install the agent from the master branch, as stated in #51.
Once I tried to run the agent, I encountered the following error:
Running kubectl encountered an error: error: error validating "/tmp/clearml_k8stmpl_8288bk09.yml": error validating data: ValidationError(Pod.spec.containers[0].name): invalid type for io.k8s.api.core.v1.Container.name: got "map", expected "string"; if you choose to ignore these errors, turn validation off with --validate=false
My assumption is that somewhere around https://github.com/allegroai/clearml-agent/blob/master/clearml_agent/glue/k8s.py#L444 something is not being merged correctly, and instead of overriding the
- name: test
it creates something like - name: { 0: "test", 1: "clearml-....." } or - name: ["test", "clearml-...."]
Regardless, following my conversation with @bmartinn, who helped me a lot via Slack, I wanted to state some important things about working with the agent and k8s, especially because it wasn't clear to me how to inject my SSH key and run a git clone on a private git repository.
clearml.conf
Make sure to set
force_git_ssh_protocol: true
pod-template.yaml
You have to consider two things:
defaultMode: 256
You have to ensure that the host of your repo won't stop you from cloning, e.g.:
Host key verification failed.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
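For illustration, a pod template fragment covering both points might look like the sketch below. It is not the template from this setup: the secret name git-ssh-key, the mount path /etc/git-ssh and the key file name id_rsa are assumptions. defaultMode: 256 is the decimal form of octal 0400, so the mounted key is readable by its owner only (ssh rejects private keys that are readable by group or others). The GIT_SSH_COMMAND value points git at that key and turns off strict host key checking.

```yaml
# Fragment only: container name, image and resources are left out.
# All names and paths here are illustrative assumptions.
spec:
  volumes:
    - name: git-ssh-key
      secret:
        secretName: git-ssh-key      # hypothetical secret holding the private key
        defaultMode: 256             # decimal for octal 0400 (owner read-only)
  containers:
    - volumeMounts:
        - name: git-ssh-key
          mountPath: /etc/git-ssh    # hypothetical path; files are named after the secret's keys
          readOnly: true
      env:
        - name: GIT_SSH_COMMAND
          # -i selects the mounted key (assumes the secret has a key called id_rsa);
          # StrictHostKeyChecking=no skips host key verification.
          value: "ssh -i /etc/git-ssh/id_rsa -o StrictHostKeyChecking=no"
```

Turning off StrictHostKeyChecking trades safety for convenience; mounting a known_hosts file that contains the git host's key is a stricter alternative.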
In order to make this work you have to make sure git knows about your SSH key and that it doesn't require strict host key checking. This can be done by overriding the GIT_SSH_COMMAND environment variable:
Dockerfile
You can use the following Dockerfile in order to create a small and simple agent:
As you can see, I'm also injecting clearml.conf through Azure KeyVault, simply because I prefer to manage my configuration that way. You can do it however you like, though.
k8s deployment
The only thing left is your deployment.yaml (or helm package). Once you have built the above docker image, you can run it and just pass the relevant arguments, for example:
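A minimal sketch of such a deployment, assuming the pod template is stored in a ConfigMap and mounted into the agent container, could look like this. The image name, the ConfigMap name clearml-pod-template, the queue name and the argument names are all placeholders; the actual arguments depend on the entrypoint of your image.

```yaml
# Sketch of a Deployment for the agent image; every name below is a placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clearml-k8s-glue
spec:
  replicas: 1
  selector:
    matchLabels:
      app: clearml-k8s-glue
  template:
    metadata:
      labels:
        app: clearml-k8s-glue
    spec:
      containers:
        - name: agent
          image: registry.example.com/clearml-k8s-glue:latest   # hypothetical image
          # Assumed argument names; check what your entrypoint actually accepts.
          args: ["--queue", "k8s-queue", "--template-yaml", "/config/pod-template.yaml"]
          volumeMounts:
            - name: pod-template
              mountPath: /config
              readOnly: true
      volumes:
        - name: pod-template
          configMap:
            name: clearml-pod-template   # hypothetical ConfigMap with a pod-template.yaml key
```

With this layout, changing the pod template only requires updating the ConfigMap and restarting the pod, not rebuilding the image.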
As I stated above, pod-template.yaml is also configuration from my perspective, so I inject it from the outside world and don't have to rebuild the image from scratch every time.
All the best,
Shaked
UPDATE:
I have added clearml.conf to the pod template volumes; without it you might end up with #55.