allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.61k stars, 651 forks

Docker creates venv anyway #1074

Open starsky opened 1 year ago

starsky commented 1 year ago

Proposal Summary

When I run clearml-agent with --docker, it generates the following command:

['docker', 'run', '-t', '--gpus', 'all', '-l', 'clearml-worker-id=gruffi:0', '-l', 'clearml-parent-worker-id=gruffi:0', '-e', 'CLEARML_WORKER_ID=gruffi:0', '-e', 'CLEARML_DOCKER_IMAGE=mmhmm_ml', '-e', 'CLEARML_TASK_ID=8d993cdcaa3d428eaab6076202c9b84c', '-v', '/tmp/.clearml_agent.3ov8j2kz.cfg:/tmp/clearml.conf', '-e', 'CLEARML_CONFIG_FILE=/tmp/clearml.conf', '-v', '/tmp/clearml_agent.ssh.djwm8kir:/.ssh', '-v', '/home/michal/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/home/michal/.clearml/pip-cache:/root/.cache/pip', '-v', '/home/michal/.clearml/pip-download-cache:/root/.clearml/pip-download-cache', '-v', '/home/michal/.clearml/cache:/clearml_agent_cache', '-v', '/home/michal/.clearml/vcs-cache:/root/.clearml/vcs-cache', '-v', '/home/michal/.clearml/venvs-cache:/root/.clearml/venvs-cache', '--rm', 'mmhmm_ml', 'bash', '-c', 'echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; cp -Rf /.ssh -T ~/.ssh ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; [ ! -z $LOCAL_PYTHON ] || for i in {15..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && break ; done ; [ ! -z $LOCAL_PYTHON ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL python3-pip" ; [ -z "$CLEARML_APT_INSTALL" ] || (apt-get update -y ; apt-get install -y $CLEARML_APT_INSTALL) ; [ ! -z $LOCAL_PYTHON ] || export LOCAL_PYTHON=python3 ; $LOCAL_PYTHON -m pip install -U "pip<20.2 ; python_version < \'3.10\'" "pip<22.3 ; python_version >= \'3.10\'" ; $LOCAL_PYTHON -m pip install -U clearml-agent ; cp /tmp/clearml.conf ~/default_clearml.conf ; NVIDIA_VISIBLE_DEVICES=all $LOCAL_PYTHON -u -m clearml_agent execute --disable-monitoring  --id 8...']

Why does it have to be so complicated?

  1. Why does it arbitrarily run chown -R root /root/.cache/pip? I am having a lot of issues running it as a non-root user.
  2. At the end, it runs the command clearml_agent execute --disable-monitoring --id 8..., which creates the venv anyway, only inside the Docker container. In many cases, however, one already has an image with all libraries installed, so it should be sufficient to run a particular command such as python train.py.

Is there a way to simplify how clearml-agent works?

jkhenning commented 1 year ago

Hi @starsky, regarding point 2: for cases where the image already has all libraries installed, you can use the agent's CLEARML_AGENT_SKIP_PIP_VENV_INSTALL or CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL environment variables. The venv is created by default because the agent creates it with system packages inheritance, so only missing packages are installed (existing packages that match the requirements are merely inspected, not reinstalled).
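
To illustrate what "system packages inheritance" means, here is a minimal sketch using Python's standard `venv` module. It is not the agent's actual code. The helper name `create_inheriting_venv` is hypothetical, and the sketch only shows the general mechanism: a venv created this way can import packages already installed for the base interpreter, so only missing ones would need installing.

```python
import tempfile
import venv
from pathlib import Path


def create_inheriting_venv(target_dir):
    """Create a venv that can see the packages of the interpreter that made it.

    Hypothetical helper for illustration; not part of clearml-agent.
    """
    # system_site_packages=True is the stdlib switch for inheriting system packages.
    # with_pip=False keeps the sketch fast and avoids any network access.
    venv.create(target_dir, system_site_packages=True, with_pip=False)

    # pyvenv.cfg records the venv settings as "key = value" lines.
    settings = {}
    for line in (Path(target_dir) / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_inheriting_venv(Path(tmp) / "env")
        # Shows whether the new venv inherits the base interpreter's packages.
        print(cfg["include-system-site-packages"])
```

With an image that already contains every requirement, such a venv adds little, which is the case the two environment variables above are meant for.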