Open freddessert opened 2 years ago
Hi @freddessert,
Did you set your local agent to monitor the queue used by the pipeline?
Yes, this output came from the agent when it found a task to execute in the queue. The command I ran to start the agent is: clearml-agent daemon --queue default --foreground
Can you share the entire agent's output?
frederic@fd-ubuntu ~ $ clearml-agent daemon --queue default --foreground
Current configuration (clearml_agent v1.4.1, location: /home/frederic/clearml.conf):
----------------------
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key = AKIAWE45YOU7WABDI2PS
sdk.aws.s3.region = us-west-2
sdk.aws.s3.use_credentials_chain = false
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.google.storage.project = ml-team-203622
sdk.google.storage.credentials_json = /home/frederic/.config/gcloud/application_default_credentials.json
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://34.28.186.241:8008
api.web_server = http://34.28.186.241:8080
api.files_server = gs://fred_test_dvm_pipelines/clearml-tests
api.credentials.access_key = XG6EW9GTE3FC2F7M25XD
api.host = http://34.28.186.241:8008
agent.worker_id =
agent.worker_name = fd-ubuntu
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.venvs_dir = /home/frederic/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/frederic/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/frederic/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/frederic/.clearml/pip-cache
agent.docker_apt_cache = /home/frederic/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.default_python = 3.8
agent.cuda_version = 0
agent.cudnn_version = 0
Worker "fd-ubuntu:0" - Listening to queues:
+----------------------------------+---------+-------+
| id | name | tags |
+----------------------------------+---------+-------+
| 110da5f3a01f4c25bd528f224df5773e | default | |
+----------------------------------+---------+-------+
No tasks in queue 110da5f3a01f4c25bd528f224df5773e
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 110da5f3a01f4c25bd528f224df5773e
No tasks in Queues, sleeping for 5.0 seconds
No tasks in queue 110da5f3a01f4c25bd528f224df5773e
No tasks in Queues, sleeping for 5.0 seconds
task 55fd33f561cc4b08877362b87a995304 pulled from 110da5f3a01f4c25bd528f224df5773e by worker fd-ubuntu:0
Running task '55fd33f561cc4b08877362b87a995304'
Storing stdout and stderr log to '/tmp/.clearml_agent_out.u7bmi16f.txt', '/tmp/.clearml_agent_out.u7bmi16f.txt'
Current configuration (clearml_agent v1.4.1, location: /tmp/.clearml_agent.dgjtsgjf.cfg):
----------------------
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key = AKIAWE45YOU7WABDI2PS
sdk.aws.s3.region = us-west-2
sdk.aws.s3.use_credentials_chain = false
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.google.storage.project = ml-team-203622
sdk.google.storage.credentials_json = /home/frederic/.config/gcloud/application_default_credentials.json
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://34.28.186.241:8008
api.web_server = http://34.28.186.241:8080
api.files_server = gs://fred_test_dvm_pipelines/clearml-tests
api.credentials.access_key = XG6EW9GTE3FC2F7M25XD
api.host = http://34.28.186.241:8008
agent.worker_id = fd-ubuntu:0
agent.worker_name = fd-ubuntu
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.venvs_dir = /home/frederic/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/frederic/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/frederic/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/frederic/.clearml/pip-cache
agent.docker_apt_cache = /home/frederic/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.default_python = 3.8
agent.cuda_version = 0
agent.cudnn_version = 0
Executing task id [55fd33f561cc4b08877362b87a995304]:
repository =
branch =
version_num =
tag =
docker_cmd =
entry_point = test_pipeline.py
working_dir = .
::: Python virtual environment cache is disabled. To accelerate spin-up time set `agent.venvs_cache.path=~/.clearml/venvs-cache` :::
created virtual environment CPython3.8.10.final.0-64 in 135ms
creator CPython3Posix(dest=/home/frederic/.clearml/venvs-builds/3.8, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/frederic/.local/share/virtualenv)
added seed packages: pip==22.2.2, setuptools==65.4.1, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
Looking in indexes: https://pypi.org/simple, https://****@packagecloud.io/Kindred/internal/pypi/simple/
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 22.2.2
Uninstalling pip-22.2.2:
Successfully uninstalled pip-22.2.2
Successfully installed pip-20.1.1
Looking in indexes: https://pypi.org/simple, https://****@packagecloud.io/Kindred/internal/pypi/simple/
Collecting Cython
Using cached Cython-0.29.32-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.32
Looking in indexes: https://pypi.org/simple, https://****@packagecloud.io/Kindred/internal/pypi/simple/
Collecting numpy==1.22.4
Using cached numpy-1.22.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.9 MB)
Installing collected packages: numpy
Successfully installed numpy-1.22.4
Looking in indexes: https://pypi.org/simple, https://****@packagecloud.io/Kindred/internal/pypi/simple/
Collecting boto3==1.23.0
Using cached boto3-1.23.0-py3-none-any.whl (132 kB)
Collecting google_cloud_storage==2.5.0
Using cached google_cloud_storage-2.5.0-py2.py3-none-any.whl (106 kB)
Requirement already satisfied: numpy==1.22.4 in /home/frederic/.clearml/venvs-builds/3.8/lib/python3.8/site-packages (from -r /tmp/cached-reqs066kzq1i.txt (line 3)) (1.22.4)
Collecting jmespath<2.0.0,>=0.7.1
Using cached jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting s3transfer<0.6.0,>=0.5.0
Using cached s3transfer-0.5.2-py3-none-any.whl (79 kB)
Collecting botocore<1.27.0,>=1.26.0
Using cached botocore-1.26.10-py3-none-any.whl (8.8 MB)
Collecting requests<3.0.0dev,>=2.18.0
Using cached requests-2.28.1-py3-none-any.whl (62 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0
Using cached google_cloud_core-2.3.2-py2.py3-none-any.whl (29 kB)
Collecting google-auth<3.0dev,>=1.25.0
Using cached google_auth-2.14.1-py2.py3-none-any.whl (175 kB)
Collecting google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5
Using cached google_api_core-2.10.2-py3-none-any.whl (115 kB)
Collecting google-resumable-media>=2.3.2
Using cached google_resumable_media-2.4.0-py2.py3-none-any.whl (77 kB)
Collecting urllib3<1.27,>=1.25.4
Using cached urllib3-1.26.12-py2.py3-none-any.whl (140 kB)
Collecting python-dateutil<3.0.0,>=2.1
Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting charset-normalizer<3,>=2
Using cached charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Collecting certifi>=2017.4.17
Using cached certifi-2022.9.24-py3-none-any.whl (161 kB)
Collecting idna<4,>=2.5
Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting rsa<5,>=3.1.4; python_version >= "3.6"
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Collecting cachetools<6.0,>=2.0.0
Using cached cachetools-5.2.0-py3-none-any.whl (9.3 kB)
Collecting six>=1.9.0
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting pyasn1-modules>=0.2.1
Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5
Using cached protobuf-4.21.9-cp37-abi3-manylinux2014_x86_64.whl (408 kB)
Collecting googleapis-common-protos<2.0dev,>=1.56.2
Using cached googleapis_common_protos-1.56.4-py2.py3-none-any.whl (211 kB)
Collecting google-crc32c<2.0dev,>=1.0
Using cached google_crc32c-1.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (32 kB)
Collecting pyasn1>=0.1.3
Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Installing collected packages: jmespath, urllib3, six, python-dateutil, botocore, s3transfer, boto3, charset-normalizer, certifi, idna, requests, protobuf, googleapis-common-protos, pyasn1, rsa, cachetools, pyasn1-modules, google-auth, google-api-core, google-cloud-core, google-crc32c, google-resumable-media, google-cloud-storage
Successfully installed boto3-1.23.0 botocore-1.26.10 cachetools-5.2.0 certifi-2022.9.24 charset-normalizer-2.1.1 google-api-core-2.10.2 google-auth-2.14.1 google-cloud-core-2.3.2 google-cloud-storage-2.5.0 google-crc32c-1.5.0 google-resumable-media-2.4.0 googleapis-common-protos-1.56.4 idna-3.4 jmespath-1.0.1 protobuf-4.21.9 pyasn1-0.4.8 pyasn1-modules-0.2.8 python-dateutil-2.8.2 requests-2.28.1 rsa-4.9 s3transfer-0.5.2 six-1.16.0 urllib3-1.26.12
Adding venv into cache: /home/frederic/.clearml/venvs-builds/3.8
Running task id [55fd33f561cc4b08877362b87a995304]:
[.]$ /home/frederic/.clearml/venvs-builds/3.8/bin/python -u /home/frederic/.clearml/venvs-builds/3.8/code/test_pipeline.py
Summary - installed python packages:
pip:
- boto3==1.23.0
- botocore==1.26.10
- cachetools==5.2.0
- certifi==2022.9.24
- charset-normalizer==2.1.1
- Cython==0.29.32
- google-api-core==2.10.2
- google-auth==2.14.1
- google-cloud-core==2.3.2
- google-cloud-storage==2.5.0
- google-crc32c==1.5.0
- google-resumable-media==2.4.0
- googleapis-common-protos==1.56.4
- idna==3.4
- jmespath==1.0.1
- numpy==1.22.4
- protobuf==4.21.9
- pyasn1==0.4.8
- pyasn1-modules==0.2.8
- python-dateutil==2.8.2
- requests==2.28.1
- rsa==4.9
- s3transfer==0.5.2
- six==1.16.0
- urllib3==1.26.12
Environment setup completed successfully
Starting Task Execution:
Traceback (most recent call last):
File "/home/frederic/.clearml/venvs-builds/3.8/code/test_pipeline.py", line 1, in <module>
from clearml import PipelineDecorator
ModuleNotFoundError: No module named 'clearml'
Leaving process id 399699
DONE: Running task '55fd33f561cc4b08877362b87a995304', exit status 1
Process failed, exit code 1
No tasks in queue 110da5f3a01f4c25bd528f224df5773e
No tasks in Queues, sleeping for 5.0 seconds
OK, and how did you create this task the agent is running?
So I ran the task locally first, then I went to the ClearML Web UI, opened my pipeline, clicked "New Run", and started the run from there.
However, is it possible that it fails to run properly on the agent because I ran the task locally the first time?
It should have set up the clearml dependency - I'm trying to figure out if that was somehow removed accidentally when you started the new run
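For anyone hitting the same symptom: a quick check is whether clearml shows up in the "Summary - installed python packages" list the agent prints. Below is a minimal sketch of that check in Python. It is not part of ClearML; the helper name installed_packages and the SAMPLE_LOG excerpt (a few lines copied from the log above) are only for illustration.

```python
import re


def installed_packages(log_text):
    """Return {lowercased name: version} parsed from '- name==version' log lines.

    Hypothetical helper for illustration; not a ClearML API.
    """
    packages = {}
    for line in log_text.splitlines():
        match = re.match(r"^\s*-\s*([A-Za-z0-9_.\-]+)==(\S+)\s*$", line)
        if match:
            packages[match.group(1).lower()] = match.group(2)
    return packages


# A few lines copied from the agent's package summary in this thread.
SAMPLE_LOG = """\
Summary - installed python packages:
pip:
- boto3==1.23.0
- Cython==0.29.32
- google-cloud-storage==2.5.0
- numpy==1.22.4
Environment setup completed successfully
"""

found = installed_packages(SAMPLE_LOG)
# clearml is absent from the summary, which matches the ModuleNotFoundError.
print("clearml installed:", "clearml" in found)
```

If clearml is missing from that list, the task's recorded requirements did not include it, which is what the traceback above points to.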
Hi @freddessert,
I just tried that myself again, and I can't seem to reproduce the issue - can you please try using the latest clearml package (1.7.3rc1)?
@jkhenning Upgrading to 1.7.3rc1 fixed my issue! It is possible that it's a bug in 1.7.2 (which was the version I was running on)
Indeed it is - we're releasing a new version soon 🙂
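To confirm which clearml version an environment actually has (1.7.2 was the affected one in this thread, and 1.7.3rc1 resolved it), something like the following should work. It is a sketch that relies only on the standard library's importlib.metadata (Python 3.8+); package_version is a hypothetical helper name, not a ClearML API.

```python
from importlib import metadata


def package_version(name):
    """Return the installed version string for a package, or None if it is absent.

    Hypothetical helper for illustration; not a ClearML API.
    """
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


version = package_version("clearml")
if version is None:
    print("clearml is not installed in this environment")
else:
    print("clearml version:", version)
```

Run it with the same interpreter that launches the pipeline, since a different virtual environment can report a different version.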
Hello, I was just playing with the pipeline example that can be found here, and when I tried to rerun the pipeline on my locally running ClearML-Agent by launching it from the Web UI, I ran into:
Thank you!