allegroai / clearml-agent

ClearML Agent - ML-Ops made easy. ML-Ops scheduler & orchestration solution
https://clear.ml/docs/
Apache License 2.0

ClearML console leaks credentials passed in as Env Vars. #67

Closed jax79sg closed 1 year ago

jax79sg commented 3 years ago

Hi, can I get an option for ClearML to not print anything other than the output from my own code? The reason is that ClearML is printing the username and password I passed to the container via env vars through the ClearML agent. This happens with both clearml-agent and k8s-glue.

My running code:

# For simplicity, I hardcoded the username/password here. In reality, they are only passed in at runtime.
from clearml import Task, Logger
task = Task.init(project_name='DETECTRON2',task_name='Default Model Architecture',task_type='training')
task.set_base_docker("quay.io/jax79sg/detectron2:v4 --env GIT_SSL_NO_VERIFY=true --env TRAINS_AGENT_GIT_USER=testuser --env TRAINS_AGENT_GIT_PASS=testuser" )
task.execute_remotely(queue_name="1gpu", exit_process=True)

ClearML console logging before reaching my task. The last two lines here expose the username and password.

2021-05-23 22:02:15
ClearML Task: created new task id=52a37845995f417eae3bba88e1f08284
ClearML results page: http://mlops.sytes.net:8080/projects/535c0313dbbe4921be762103fa004067/experiments/52a37845995f417eae3bba88e1f08284/output/log
2021-05-23 22:02:16
task 52a37845995f417eae3bba88e1f08284 pulled from f03539fca75f461ab3e6297186bdb045 by worker master-node:gpu0
Running Task 52a37845995f417eae3bba88e1f08284 inside docker: quay.io/jax79sg/detectron2:v4 arguments: ['--env', 'GIT_SSL_NO_VERIFY=true', '--env', 'TRAINS_AGENT_GIT_USER=testuser', '--env', 'TRAINS_AGENT_GIT_PASS=testuser']
2021-05-23 22:02:17
Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', '--env', 'GIT_SSL_NO_VERIFY=true', '--env', 'TRAINS_AGENT_GIT_USER=testuser', '--env', 'TRAINS_AGENT_GIT_PASS=testuser', '-e', 'CLEARML_WORKER_ID=master-node:gpu0', '-e', 'CLEARML_DOCKER_IMAGE=quay.io/jax79sg/detectron2:v4 --env GIT_SSL_NO_VERIFY=true --env TRAINS_AGENT_GIT_USER=testuser --env TRAINS_AGENT_GIT_PASS=testuser', '-v'
jkhenning commented 3 years ago

Hi @jax79sg,

I suggest adding a configuration option named agent.hide_docker_command_env_vars (turned on by default) that will allow hiding docker environment variables containing secrets when printing out the docker command. We'll do that by replacing their values with ********. By default, we will hide the values of a built-in list of credential-related environment variables.

Also, to support hiding additional environment variables based on user preference, we'll add an extra_keys list option. For example, to make sure the value of your custom environment variable MY_SPECIAL_PASSWORD does not show in the logs when included in the docker command, you'll be able to set extra_keys: ["MY_SPECIAL_PASSWORD"].
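
For illustration, here's a minimal sketch in plain Python of the intended behavior (this is not the actual agent code, and the built-in key list below is only an example):

# Illustrative sketch: replace the values of secret-looking env var
# assignments with ******** before printing the docker command.
# BUILTIN_SECRET_KEYS and the substring matching are assumptions for
# illustration - the real option may use a different built-in list.
BUILTIN_SECRET_KEYS = ["PASS", "SECRET", "KEY", "TOKEN"]

def mask_docker_command(args, extra_keys=()):
    """Return a copy of the docker argument list with secret values hidden."""
    keys = list(BUILTIN_SECRET_KEYS) + list(extra_keys)
    masked = []
    for arg in args:
        if "=" in arg:
            name, _, _value = arg.partition("=")
            if any(key in name for key in keys):
                arg = name + "=********"
        masked.append(arg)
    return masked

cmd = ["docker", "run", "-t",
       "--env", "GIT_SSL_NO_VERIFY=true",
       "--env", "TRAINS_AGENT_GIT_USER=testuser",
       "--env", "TRAINS_AGENT_GIT_PASS=testuser"]
print(mask_docker_command(cmd, extra_keys=["TRAINS_AGENT_GIT_USER"]))
# ['docker', 'run', '-t', '--env', 'GIT_SSL_NO_VERIFY=true',
#  '--env', 'TRAINS_AGENT_GIT_USER=********', '--env', 'TRAINS_AGENT_GIT_PASS=********']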

Does that make sense?

jax79sg commented 3 years ago

Hi @jkhenning, that sounds useful.

jax79sg commented 3 years ago

Also wondering if some form of secrets management would be supported in future milestones.

jkhenning commented 3 years ago

We're actually just talking about a generic server-side configuration store per user - was that what you were thinking of?

jax79sg commented 3 years ago

We're actually just talking about a generic server-side configuration store per user - was that what you were thinking of?

Yeah that's it.

jkhenning commented 3 years ago

Hi @jax79sg,

Just committed the env var masking option - you can try installing from the repository and see that it works for you 🙂

jax79sg commented 3 years ago

Hi, not sure if I have done this right, as the results still do not mask the passwords.

(Two screenshots attached showing the credentials still unmasked.)

This is what I did:

clearml-agent daemon --stop
pip uninstall clearml-agent

python3 setup.py bdist_wheel
pip install dist/clearml_agent-1.0.0-py3-none-any.whl  

Added the following to the agent section of clearml.conf:

hide_docker_command_env_vars {
        enabled: true
        extra_keys: ['TRAINS_AGENT_GIT_USER','MY_AWS_PASSWORD']
    }

Started the agent again:

clearml-agent --config-file clearml.conf daemon --detached --gpus 0 --order-fairness --queue 1gpu  --docker shm-size=16g

I do see clearml-agent picking up the new configuration in the startup logs:

agent.docker_apt_cache = /home/jax/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvcr.io/nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
agent.default_docker.arguments.0 = --ipc\=host
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.extra_keys.0 = 'TRAINS_AGENT_GIT_USER'
agent.hide_docker_command_env_vars.extra_keys.1 = 'MY_AWS_PASSWORD'
agent.git_user = 
agent.default_python = 3.7
agent.cuda_version = 112
agent.cudnn_version = 0

Worker "master-node:gpu0" - Listening to queues:
+----------------------------------+------+-------+
| id                               | name | tags  |
+----------------------------------+------+-------+
| f03539fca75f461ab3e6297186bdb045 | 1gpu |       |
+----------------------------------+------+-------+

Running in Docker  mode (v19.03 and above) - using default docker image: shm-size=16g ['--ipc=host']

Running CLEARML-AGENT daemon in background mode, writing stdout/stderr to /tmp/.clearml_agent_daemon_outv6okok6t.txt
jkhenning commented 3 years ago

Are you sure you built the right wheel?

Try simply installing directly from the GitHub repo:

pip install -U git+https://github.com/allegroai/clearml-agent.git
jkhenning commented 3 years ago

Hi @jax79sg,

This probably doesn't work since you're using --docker, and thus the agent installed inside the docker container is not from the latest GitHub repository... I'll let you know once we release an RC so you can test it.

jax79sg commented 3 years ago

Thanks @jkhenning , I'll wait till one is available.

jkhenning commented 3 years ago

Hi @jax79sg,

ClearML Agent 1.0.1rc1 was released yesterday - you can try and test with it :)

jax79sg commented 3 years ago

Hi, we tested with Agent 1.0.1rc3; however, we are still not seeing any masking. The results are the same as described here: https://github.com/allegroai/clearml-agent/issues/67#issuecomment-849439218

On top of that, as we are running the k8s glue, we find it's impossible to get individuals to adhere to a fixed set of env var names defined in clearml-agent.conf. A suggestion would be to allow a regex for each entry, so that, for example, one catch-all entry matches *PASSWORD* and another matches *SECRET*.
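
To make the suggestion concrete, here is a minimal sketch of the kind of catch-all matching we have in mind (illustrative only - this is not an existing agent option, and the default patterns below are just examples):

# Sketch of pattern-based masking: each extra_keys entry would be a glob
# pattern such as *PASSWORD* or *SECRET* instead of an exact name.
import fnmatch

def should_mask(name, patterns=("*PASSWORD*", "*SECRET*", "*PASS*", "*KEY*")):
    """Return True if the env var name matches any catch-all pattern."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)

print(should_mask("TRAINS_AGENT_GIT_PASS"))  # True  (matches *PASS*)
print(should_mask("AWS_SECRET_ACCESS"))      # True  (matches *SECRET*)
print(should_mask("GIT_SSL_NO_VERIFY"))      # False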

jkhenning commented 3 years ago

Hi, we tested with Agent 1.0.1rc3; however, we are still not seeing any masking

@jax79sg can you show an example of the agent log?

jax79sg commented 3 years ago

Hi, we tested with Agent 1.0.1rc3; however, we are still not seeing any masking

@jax79sg can you show an example of the agent log?

Hi, the k8s glue merely spawns a pod and then keeps showing a 5-second sleep while listening for tasks on the queue. Are you referring to this log, or to the logs of the spawned pod?

jkhenning commented 3 years ago

Hi @jax79sg, I'm referring to the log where you can see the leaked credentials...

jax79sg commented 3 years ago

Hi, the issue remains with clearml-server==1.1.1.135 - 1.1.1 - 2.1.4

I am using the K8S Glue, and clearml.conf has the following in the agent section:

hide_docker_command_env_vars {
       enabled: true
       extra_keys: ['TRAINS_AGENT_GIT_USER','TRAINS_AGENT_GIT_PASS','AWS_ACCESS_KEY','AWS_SECRET_ACCESS','']
    }

This is an extract of my code, run_remote.py:

import os
TRAINS_AGENT_GIT_USER="gituser"
TRAINS_AGENT_GIT_PASS="gitpass"
AWS_ACCESS_KEY=os.environ.get("AWS_ACCESS_KEY")
AWS_SECRET_ACCESS=os.environ.get("AWS_SECRET_ACCESS")
print("AWS_ACCESS_KEY: ", AWS_ACCESS_KEY)
print("AWS_SECRET_ACCESS: ", AWS_SECRET_ACCESS)

from clearml import Task, Logger
task = Task.init(project_name='DETECTRON2',task_name='DefaultModelArchitecture',task_type='training')
task.set_base_docker("harbor.ai/public/detectron2:v3 --env GIT_SSL_NO_VERIFY=true --env TRAINS_AGENT_GIT_USER=" +TRAINS_AGENT_GIT_USER+" --env TRAINS_AGENT_GIT_PASS="+TRAINS_AGENT_GIT_PASS + " --env AWS_ACCESS_KEY="+AWS_ACCESS_KEY + " --env AWS_SECRET_ACCESS="+AWS_SECRET_ACCESS)
task._update_requirements({})
task.execute_remotely(queue_name="2xV100-32ram", exit_process=True)

I run my code in the following manner on my client:

AWS_ACCESS_KEY=mykey AWS_SECRET_ACCESS=myaccess python run_remote.py

What happens is that on my ClearML Server I still see the secrets printed in various portions of the web UI (see the attached screenshots).

jkhenning commented 3 years ago

Hi @jax79sg,

This feature was only introduced in the 1.0.1 RC releases (i.e. it is not part of v1.0.0) - you can either use one of the RC releases, or wait for the official v1.0.1 (will be released very soon).

jkhenning commented 1 year ago

Closing this, as the fix has already been released. Please reopen if required.