allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

specify a conda environment for each ClearML Pipeline/Task #620

Open Waerden001 opened 2 years ago

Waerden001 commented 2 years ago

Is it possible to specify a conda environment for each ClearML Pipeline/Task?

Basically, I use ClearML to build several Pipelines for various Projects. However, I want to avoid the ClearML overhead of installing dependencies for each and every task, since all the conda environments already exist on the remote machines (with the same name, but possibly at different paths on different machines).

I know we can modify the Python binary path in the clearml.conf file, but since we would have to do this for each and every task, it's not a good option.

Is there a way to just specify the conda env name (e.g. dev) when running a Pipeline/Task? This basically serves as a way to activate a conda environment and run a specific Pipeline/Task in that environment. In other words, we want to run the following commands on all remote machines

jkhenning commented 2 years ago

Hi @Waerden001,

There's actually a hack for this. You can configure the agent using:

agent.package_manager.conda_env_as_base_docker: true

And then, on each task, use the CONTAINER/image field (in the UI) to specify the conda path, or set it through the SDK with task.set_base_docker(docker_image="/path/to/conda")

When the agent executes the task, this setting amounts to conda activate "/path/to/conda"
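A minimal sketch of the SDK route, assuming the agent already has agent.package_manager.conda_env_as_base_docker set to true. Only Task.init and Task.set_base_docker(docker_image=...) come from the ClearML SDK; the helper name use_existing_conda_env, the project and task names, and the environment path are hypothetical placeholders.

```python
def use_existing_conda_env(task, conda_env_path):
    """Store a conda environment path in the task's container image field.

    Assumption: the agent that later runs the task is configured with
    agent.package_manager.conda_env_as_base_docker: true, so it treats
    this field as a conda environment path rather than a Docker image.
    """
    if not conda_env_path:
        raise ValueError("conda_env_path must be a non-empty path")
    # Same call as suggested in the thread; the "image" is really a conda path.
    task.set_base_docker(docker_image=conda_env_path)
    return task


def main():
    # Needs the clearml package and a configured ClearML server.
    from clearml import Task

    # Hypothetical project and task names, for illustration only.
    task = Task.init(project_name="examples", task_name="conda env demo")
    # Placeholder path from the thread; replace with a real environment path.
    use_existing_conda_env(task, "/path/to/conda")

# Call main() from a script on a machine where clearml is installed and configured.
```

Keeping the path in one helper makes it easier to apply the same environment to every task of a pipeline instead of editing clearml.conf per task.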

wxdrizzle commented 9 months ago

Hi @jkhenning, could you please elaborate on how to do that? I set up the following per your suggestion. Could you please comment on whether it is correct, or whether something is missing or redundant?

  1. image
  2. In clearml.conf,

    agent.package_manager.conda_env_as_base_docker: true
    agent.package_manager.type: conda
    agent.python_binary: "/xxxxxx/anaconda3/envs/research/bin/python"
  3. Run the agent in normal mode with clearml-agent daemon.

The task then seems to run successfully, and the console output (shown below) indicates that conda activate was run for the research environment.

1707098640594 aiserver1:0 INFO task 3cb8e8e58d324bddaf3e24c3c49ed808 pulled from 9f8c788b758741b39157f251acf5ba6f by worker aiserver1:0

1707098646405 aiserver1:0 DEBUG Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.jsoselfp.cfg):
----------------------
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = {something sensitive}
api.web_server = {something sensitive}
api.files_server = {something sensitive}
api.credentials.access_key = DKQ2XH3PQAYEHHOWRGD1
api.host =  {something sensitive}
agent.worker_id = aiserver1:0
agent.worker_name = aiserver1
agent.force_git_ssh_protocol = false
agent.python_binary = /home/xin/software/anaconda3/envs/research/bin/python
agent.package_manager.type = conda
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >= '3.10'
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.package_manager.conda_env_as_base_docker = true
agent.venvs_dir = /home/xin/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/xin/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/xin/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/xin/.clearml/pip-cache
agent.docker_apt_cache = /home/xin/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script = 
agent.disable_task_docker_override = false
agent.git_user = 
agent.default_python = 3.10
agent.cuda_version = 121
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.network.file_upload_retries = 3
sdk.aws.s3.key = 
sdk.aws.s3.region = 
sdk.aws.s3.use_credentials_chain = false
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.aws.boto3.multipart_threshold = 8388608
sdk.aws.boto3.multipart_chunksize = 8388608
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri = 
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.report_event_flush_threshold = 100
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false

Executing task id [3cb8e8e58d324bddaf3e24c3c49ed808]:
repository = git@github.com:wxdrizzle/wall_seg_kmax.git
branch = main
version_num = 02016306ebed94a4e4803b1f61b65b85229c668a
tag = 
docker_cmd = /home/xin/software/anaconda3/envs/research
entry_point = train.py
working_dir = .

Executing Conda: /home/xin/software/anaconda3/condabin/conda env remove -p /home/xin/.clearml/venvs-builds/3.10 --quiet --json
Using pre-existing Conda environment from /home/xin/software/anaconda3/envs/research

Using cached repository in "/home/xin/.clearml/vcs-cache/wall_seg_kmax.git.98c1c6d308f1158464bf87aabf85cb50/wall_seg_kmax.git"
Note: switching to '02016306ebed94a4e4803b1f61b65b85229c668a'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 0201630 msic
type: git
url: git@github.com:wxdrizzle/wall_seg_kmax.git
branch: HEAD
commit: 02016306ebed94a4e4803b1f61b65b85229c668a
root: /home/xin/.clearml/venvs-builds/3.10/task_repository/wall_seg_kmax.git

Conda environment in read-only mode, skipping pip upgrade.
Executing Conda: /home/xin/software/anaconda3/condabin/conda list --json -p /home/xin/software/anaconda3/envs/research

1707098651825 aiserver1:0 DEBUG Adding venv into cache: /home/xin/.clearml/venvs-builds/3.10
Running task id [3cb8e8e58d324bddaf3e24c3c49ed808]:
[.]$ source /home/xin/software/anaconda3/etc/profile.d/conda.sh && conda activate /home/xin/software/anaconda3/envs/research && /home/xin/software/anaconda3/envs/research/bin/python -u train.py
Summary - installed python packages:
conda:
- blas==1.0
(the full list of packages in my conda environment appeared here; I removed it manually)

Environment setup completed successfully

Starting Task Execution:
...
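
One way to double-check a run like the one above is to scan the console output for the line the agent prints when it reuses an environment. This is a rough sketch that assumes the message wording seen in this log (clearml_agent v1.7.0), which may differ in other versions; the function name find_preexisting_conda_env is hypothetical.

```python
import re

# Wording copied from the log above; other agent versions may phrase it differently.
_PREEXISTING_ENV = re.compile(
    r"^Using pre-existing Conda environment from (?P<path>.+?)\s*$",
    re.MULTILINE,
)


def find_preexisting_conda_env(console_output):
    """Return the conda env path the agent reported reusing, or None if absent."""
    match = _PREEXISTING_ENV.search(console_output)
    return match.group("path") if match else None


sample = "Using pre-existing Conda environment from /home/xin/software/anaconda3/envs/research\n"
print(find_preexisting_conda_env(sample))  # prints /home/xin/software/anaconda3/envs/research
```

If the function returns None for a task's console output, the agent most likely built a fresh environment instead of reusing the one given in the CONTAINER/image field.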