allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

Worker Usage is not displayed #969

Open TTK95 opened 1 year ago

TTK95 commented 1 year ago

Describe the bug

[Screenshot] No usage is shown.

To reproduce

Just installed clearml-agent.

Expected behaviour

It should show some stats, right?

Environment

TTK95 commented 1 year ago

[Screenshot]

jkhenning commented 1 year ago

Hi @TTK95, how are you running the agent? Can you attach the agent's console log?

TTK95 commented 1 year ago

Hey @jkhenning

clearml-agent daemon --queue default 
Current configuration (clearml_agent v1.5.2, location: /home/tte/clearml.conf):
----------------------
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://localhost:8008
api.web_server = http://localhost:8080
api.files_server = http://localhost:8081
api.credentials.access_key = XXXXXXXXXXXXXX
api.host = http://localhost:8008
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key = 
sdk.aws.s3.region = 
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri = http://localhost:8081
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
agent.worker_id = 
agent.worker_name = pc-carl-8002
agent.force_git_ssh_protocol = false
agent.python_binary = 
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >\= '3.10'
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = /home/tte/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/tte/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/tte/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/tte/.clearml/pip-cache
agent.docker_apt_cache = /home/tte/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script = 
agent.disable_task_docker_override = false
agent.git_user = XXX
agent.default_python = 3.7
agent.cuda_version = 117
agent.cudnn_version = 0

Worker "pc-carl-8002:0" - Listening to queues:
+----------------------------------+---------+-------+
| id                               | name    | tags  |
+----------------------------------+---------+-------+
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | default |       |
+----------------------------------+---------+-------+

Running CLEARML-AGENT daemon in background mode, writing stdout/stderr to /tmp/.clearml_agent_daemon_outmudg_3su.txt

jkhenning commented 1 year ago

Running CLEARML-AGENT daemon in background mode, writing stdout/stderr to /tmp/.clearml_agent_daemon_outmudg_3su.txt

Use --foreground to see the agent's output (and share, if possible)
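
If it helps with sharing the output, here is a minimal sketch (not an official ClearML utility) that starts the same daemon command from this thread with `--foreground` added and keeps a copy of everything it prints. The helper names `build_agent_command` and `run_and_capture` and the log file name are assumptions made up for this example; only `clearml-agent daemon --queue default` and `--foreground` come from the thread.

```python
import subprocess
from pathlib import Path


def build_agent_command(queue: str = "default") -> list[str]:
    """Return the daemon command used in this thread, with --foreground added."""
    return ["clearml-agent", "daemon", "--queue", queue, "--foreground"]


def run_and_capture(command: list[str], log_path: Path) -> int:
    """Run a command, echo its output to the console, and save a copy to log_path."""
    with log_path.open("w", encoding="utf-8") as log_file:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
        )
        assert process.stdout is not None
        for line in process.stdout:
            print(line, end="")   # still visible in the terminal
            log_file.write(line)  # copy that can be attached to the issue
        return process.wait()


# Example call (hypothetical log file name; blocks until the agent exits):
# run_and_capture(build_agent_command("default"), Path("agent_console.log"))
```

The example call is left commented out because the daemon runs until it is stopped; the saved file should be checked for credentials before it is posted.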

TTK95 commented 1 year ago

@jkhenning sure, but it looks the same.

1681396720826 pc-carl-8002 info ClearML Task: overwriting (reusing) task id=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ClearML results page: http://pc-carl-8002:8080/projects/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/experiments/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/output/log
1681396721713 pc-carl-8002 info 2023-04-13 16:38:41,712 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
2023-04-13 16:38:41,915 - clearml.Task - INFO - Finished repository detection and package analysis
1681396737899 pc-carl-8002:0 INFO task xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx pulled from xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx by worker pc-carl-8002:0

1681396743164 pc-carl-8002:0 DEBUG Current configuration (clearml_agent v1.5.2, location: /tmp/.clearml_agent._i2jcjap.cfg):
----------------------
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://pc-carl-8002:8008
api.web_server = http://pc-carl-8002:8080
api.files_server = http://pc-carl-8002:8081
api.credentials.access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
api.host = http://pc-carl-8002:8008
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key = 
sdk.aws.s3.region = 
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri = http://pc-carl-8002:8081
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
agent.worker_id = pc-carl-8002:0
agent.worker_name = pc-carl-8002
agent.force_git_ssh_protocol = false
agent.python_binary = 
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >\= '3.10'
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = /home/tte/.clearml/venvs-builds
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/tte/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/tte/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/tte/.clearml/pip-cache
agent.docker_apt_cache = /home/tte/.clearml/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script = 
agent.disable_task_docker_override = false
agent.git_user = XXXXXXXX
agent.default_python = 3.7
agent.cuda_version = 117
agent.cudnn_version = 0

Executing task id [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx]:
repository = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
branch = main
version_num = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
tag = 
docker_cmd = 
entry_point = train.py
working_dir = .

created virtual environment CPython3.10.6.final.0-64 in 118ms
  creator CPython3Posix(dest=/home/tte/.clearml/venvs-builds/3.10, clear=False, no_vcs_ignore=False, global=True)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/tte/.local/share/virtualenv)
    added seed packages: pip==23.0.1, setuptools==67.6.1, wheel==0.40.0
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

Using cached repository in "/home/tte/.clearml/vcs-cache/ISEAnet.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ISEAnet"
From xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
   c6a6685..4bc5731  main       -> origin/main
Note: switching to 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 4bc5731 work on run.py
type: git
url: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
branch: HEAD
commit: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
root: /home/tte/.clearml/venvs-builds/3.10/task_repository/ISEAnet

Ignoring pip: markers 'python_version < "3.10"' don't match your environment
Collecting pip<22.3
  Using cached pip-22.2.2-py3-none-any.whl (2.0 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-22.2.2

1681396748072 pc-carl-8002:0 DEBUG Collecting Cython
  Using cached Cython-0.29.34-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.34
Collecting numpy==1.24.2
  Using cached numpy-1.24.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.21.5
    Not uninstalling numpy at /usr/lib/python3/dist-packages, outside environment /home/tte/.clearml/venvs-builds/3.10
    Can't uninstall 'numpy'. No files were found to uninstall.
Successfully installed numpy-1.24.2
Torch CUDA 117 index page found
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu117/
Requirement already satisfied: Pillow==9.0.1 in /usr/lib/python3/dist-packages (from -r /tmp/cached-reqsorsg7929.txt (line 1)) (9.0.1)
Collecting PyYAML==6.0
  Using cached PyYAML-6.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (682 kB)

1681396753127 pc-carl-8002:0 DEBUG Collecting hydra_core==1.3.2
  Using cached hydra_core-1.3.2-py3-none-any.whl (154 kB)
Collecting kornia==0.6.11
  Using cached kornia-0.6.11-py2.py3-none-any.whl (628 kB)
Collecting lightning==2.0.0
  Using cached lightning-2.0.0-py3-none-any.whl (1.8 MB)
Collecting loguru==0.6.0
  Using cached loguru-0.6.0-py3-none-any.whl (58 kB)
Collecting matplotlib==3.7.1
  Using cached matplotlib-3.7.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.6 MB)
Requirement already satisfied: numpy==1.24.2 in /home/tte/.clearml/venvs-builds/3.10/lib/python3.10/site-packages (from -r /tmp/cached-reqsorsg7929.txt (line 9)) (1.24.2)
Collecting omegaconf==2.3.0
  Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Collecting opencv_python==4.7.0.72
  Using cached opencv_python-4.7.0.72-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (61.8 MB)
Collecting tensorboard==2.12.1
  Using cached tensorboard-2.12.1-py3-none-any.whl (5.6 MB)
Collecting torch==2.0.0.*

1681396783270 pc-carl-8002:0 DEBUG   Using cached https://download.pytorch.org/whl/cu117/torch-2.0.0%2Bcu117-cp310-cp310-linux_x86_64.whl (1843.9 MB)

1681396788334 pc-carl-8002:0 DEBUG Collecting torchinfo==1.7.2
  Using cached torchinfo-1.7.2-py3-none-any.whl (22 kB)
Collecting torchmetrics==0.11.4
  Using cached torchmetrics-0.11.4-py3-none-any.whl (519 kB)
Collecting torchvision==0.15.0.*
  Using cached https://download.pytorch.org/whl/cu117/torchvision-0.15.0%2Bcu117-cp310-cp310-linux_x86_64.whl (6.1 MB)

1681396793419 pc-carl-8002:0 DEBUG Collecting tqdm==4.65.0
  Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting clearml==1.10.1
  Using cached clearml-1.10.1-py2.py3-none-any.whl (1.1 MB)
Requirement already satisfied: packaging in /usr/lib/python3/dist-packages (from hydra_core==1.3.2->-r /tmp/cached-reqsorsg7929.txt (line 4)) (21.3)
Collecting antlr4-python3-runtime==4.9.*
  Using cached antlr4_python3_runtime-4.9.3-py3-none-any.whl
Collecting deepdiff<8.0,>=5.7.0
  Using cached deepdiff-6.3.0-py3-none-any.whl (69 kB)
Requirement already satisfied: requests<4.0 in /usr/lib/python3/dist-packages (from lightning==2.0.0->-r /tmp/cached-reqsorsg7929.txt (line 6)) (2.25.1)
Collecting uvicorn<2.0
  Using cached uvicorn-0.21.1-py3-none-any.whl (57 kB)
Collecting websockets<12.0
  Using cached websockets-11.0.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (129 kB)
Collecting starlette<2.0
  Using cached starlette-0.26.1-py3-none-any.whl (66 kB)
Collecting starsessions<2.0,>=1.2.1
  Using cached starsessions-1.3.0-py3-none-any.whl (10 kB)
Collecting traitlets<7.0,>=5.3.0
  Using cached traitlets-5.9.0-py3-none-any.whl (117 kB)
Collecting lightning-utilities<2.0,>=0.7.0
  Using cached lightning_utilities-0.8.0-py3-none-any.whl (20 kB)
Collecting websocket-client<3.0
  Using cached websocket_client-1.5.1-py3-none-any.whl (55 kB)

1681396798481 pc-carl-8002:0 DEBUG Collecting dateutils<2.0
  Using cached dateutils-0.6.12-py2.py3-none-any.whl (5.7 kB)
Requirement already satisfied: Jinja2<5.0 in /usr/lib/python3/dist-packages (from lightning==2.0.0->-r /tmp/cached-reqsorsg7929.txt (line 6)) (3.0.3)
Collecting inquirer<5.0,>=2.10.0
  Using cached inquirer-3.1.3-py3-none-any.whl (18 kB)
Requirement already satisfied: urllib3<3.0 in /usr/lib/python3/dist-packages (from lightning==2.0.0->-r /tmp/cached-reqsorsg7929.txt (line 6)) (1.26.5)
Collecting fastapi<0.89.0
  Using cached fastapi-0.88.0-py3-none-any.whl (55 kB)
Collecting pytorch-lightning
  Using cached pytorch_lightning-2.0.1.post0-py3-none-any.whl (718 kB)
Requirement already satisfied: psutil<7.0 in /usr/lib/python3/dist-packages (from lightning==2.0.0->-r /tmp/cached-reqsorsg7929.txt (line 6)) (5.9.0)
Collecting arrow<3.0,>=1.2.0
  Using cached arrow-1.2.3-py3-none-any.whl (66 kB)
Collecting croniter<1.4.0,>=1.3.0
  Using cached croniter-1.3.14-py2.py3-none-any.whl (18 kB)
Collecting fsspec[http]<2025.0,>2021.06.0
  Using cached fsspec-2023.4.0-py3-none-any.whl (153 kB)
Requirement already satisfied: click<10.0 in /usr/lib/python3/dist-packages (from lightning==2.0.0->-r /tmp/cached-reqsorsg7929.txt (line 6)) (8.0.3)
Collecting rich<15.0,>=12.3.0
  Using cached rich-13.3.4-py3-none-any.whl (238 kB)
Collecting typing-extensions<6.0,>=4.0.0
  Using cached typing_extensions-4.5.0-py3-none-any.whl (27 kB)
Collecting pydantic<3.0
  Using cached pydantic-1.10.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting lightning-cloud>=0.5.31
  Using cached lightning_cloud-0.5.33-py3-none-any.whl (553 kB)
Requirement already satisfied: beautifulsoup4<6.0,>=4.8.0 in /usr/lib/python3/dist-packages (from lightning==2.0.0->-r /tmp/cached-reqsorsg7929.txt (line 6)) (4.10.0)
Collecting contourpy>=1.0.1
  Using cached contourpy-1.0.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (300 kB)
Requirement already satisfied: python-dateutil>=2.7 in /usr/lib/python3/dist-packages (from matplotlib==3.7.1->-r /tmp/cached-reqsorsg7929.txt (line 8)) (2.8.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/lib/python3/dist-packages (from matplotlib==3.7.1->-r /tmp/cached-reqsorsg7929.txt (line 8)) (1.3.2)
Requirement already satisfied: fonttools>=4.22.0 in /usr/lib/python3/dist-packages (from matplotlib==3.7.1->-r /tmp/cached-reqsorsg7929.txt (line 8)) (4.29.1)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/lib/python3/dist-packages (from matplotlib==3.7.1->-r /tmp/cached-reqsorsg7929.txt (line 8)) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /usr/lib/python3/dist-packages (from matplotlib==3.7.1->-r /tmp/cached-reqsorsg7929.txt (line 8)) (0.11.0)

1681396803544 pc-carl-8002:0 DEBUG Collecting protobuf>=3.19.6
  Using cached protobuf-4.22.3-cp37-abi3-manylinux2014_x86_64.whl (302 kB)
Collecting tensorboard-data-server<0.8.0,>=0.7.0
  Using cached tensorboard_data_server-0.7.0-py3-none-manylinux2014_x86_64.whl (6.6 MB)
Collecting werkzeug>=1.0.1
  Using cached Werkzeug-2.2.3-py3-none-any.whl (233 kB)
Requirement already satisfied: setuptools>=41.0.0 in /home/tte/.clearml/venvs-builds/3.10/lib/python3.10/site-packages (from tensorboard==2.12.1->-r /tmp/cached-reqsorsg7929.txt (line 13)) (67.6.1)
Requirement already satisfied: wheel>=0.26 in /home/tte/.clearml/venvs-builds/3.10/lib/python3.10/site-packages (from tensorboard==2.12.1->-r /tmp/cached-reqsorsg7929.txt (line 13)) (0.40.0)
Collecting google-auth<3,>=1.6.3
  Using cached google_auth-2.17.3-py2.py3-none-any.whl (178 kB)
Collecting tensorboard-plugin-wit>=1.6.0
  Using cached tensorboard_plugin_wit-1.8.1-py3-none-any.whl (781 kB)
Collecting grpcio>=1.48.2
  Using cached grpcio-1.53.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.0 MB)
Collecting google-auth-oauthlib<1.1,>=0.5
  Using cached google_auth_oauthlib-1.0.0-py2.py3-none-any.whl (18 kB)
Requirement already satisfied: markdown>=2.6.8 in /usr/lib/python3/dist-packages (from tensorboard==2.12.1->-r /tmp/cached-reqsorsg7929.txt (line 13)) (3.3.6)
Collecting absl-py>=0.4
  Using cached absl_py-1.4.0-py3-none-any.whl (126 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0.*->-r /tmp/cached-reqsorsg7929.txt (line 14)) (3.10.7)
Collecting networkx
  Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
Collecting triton==2.0.0

1681396808599 pc-carl-8002:0 DEBUG   Using cached https://download.pytorch.org/whl/triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
Requirement already satisfied: sympy in /usr/lib/python3/dist-packages (from torch==2.0.0.*->-r /tmp/cached-reqsorsg7929.txt (line 14)) (1.9)
Requirement already satisfied: jsonschema>=2.6.0 in /usr/lib/python3/dist-packages (from clearml==1.10.1->-r /tmp/cached-reqsorsg7929.txt (line 19)) (3.2.0)
Collecting pathlib2>=2.3.0
  Using cached pathlib2-2.3.7.post1-py2.py3-none-any.whl (18 kB)
Collecting pyjwt<2.5.0,>=2.4.0
  Using cached PyJWT-2.4.0-py3-none-any.whl (18 kB)
Requirement already satisfied: six>=1.13.0 in /usr/lib/python3/dist-packages (from clearml==1.10.1->-r /tmp/cached-reqsorsg7929.txt (line 19)) (1.16.0)
Collecting furl>=2.0.0
  Using cached furl-2.1.3-py2.py3-none-any.whl (20 kB)
Requirement already satisfied: attrs>=18.0 in /usr/lib/python3/dist-packages (from clearml==1.10.1->-r /tmp/cached-reqsorsg7929.txt (line 19)) (21.2.0)
Collecting lit
  Using cached lit-16.0.1-py3-none-any.whl
Collecting cmake
  Using cached cmake-3.26.3-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (24.0 MB)
Requirement already satisfied: pytz in /usr/lib/python3/dist-packages (from dateutils<2.0->lightning==2.0.0->-r /tmp/cached-reqsorsg7929.txt (line 6)) (2022.1)
Collecting ordered-set<4.2.0,>=4.0.2
  Using cached ordered_set-4.1.0-py3-none-any.whl (7.6 kB)
Collecting starlette<2.0
  Using cached starlette-0.22.0-py3-none-any.whl (64 kB)
Collecting anyio<5,>=3.4.0
  Using cached anyio-3.6.2-py3-none-any.whl (80 kB)

1681396813650 pc-carl-8002:0 DEBUG Collecting aiohttp!=4.0.0a0,!=4.0.0a1
  Using cached aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting orderedmultidict>=1.0.1
  Using cached orderedmultidict-1.0.1-py2.py3-none-any.whl (11 kB)
Collecting cachetools<6.0,>=2.0.0
  Using cached cachetools-5.3.0-py3-none-any.whl (9.3 kB)
Collecting rsa<5,>=3.1.4
  Using cached rsa-4.9-py3-none-any.whl (34 kB)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/lib/python3/dist-packages (from google-auth<3,>=1.6.3->tensorboard==2.12.1->-r /tmp/cached-reqsorsg7929.txt (line 13)) (0.2.1)
Collecting requests-oauthlib>=0.7.0
  Using cached requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Collecting blessed>=1.19.0
  Using cached blessed-1.20.0-py2.py3-none-any.whl (58 kB)
Collecting readchar>=3.0.6
  Using cached readchar-4.0.5-py3-none-any.whl (8.5 kB)
Collecting python-editor>=1.0.4
  Using cached python_editor-1.0.4-py3-none-any.whl (4.9 kB)
Collecting python-multipart
  Using cached python_multipart-0.0.6-py3-none-any.whl (45 kB)

1681396818702 pc-carl-8002:0 DEBUG Collecting markdown-it-py<3.0.0,>=2.2.0
  Using cached markdown_it_py-2.2.0-py3-none-any.whl (84 kB)
Collecting pygments<3.0.0,>=2.13.0
  Using cached Pygments-2.15.0-py3-none-any.whl (1.1 MB)
Collecting itsdangerous<3.0.0,>=2.0.1
  Using cached itsdangerous-2.1.2-py3-none-any.whl (15 kB)
Collecting h11>=0.8
  Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Collecting MarkupSafe>=2.1.1
  Using cached https://download.pytorch.org/whl/MarkupSafe-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting yarl<2.0,>=1.0
  Using cached yarl-1.8.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (264 kB)
Collecting async-timeout<5.0,>=4.0.0a3
  Using cached async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting aiosignal>=1.1.2
  Using cached aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting multidict<7.0,>=4.5
  Using cached multidict-6.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)

1681396823776 pc-carl-8002:0 DEBUG Collecting charset-normalizer<4.0,>=2.0
  Using cached charset_normalizer-3.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (199 kB)
Collecting frozenlist>=1.1.1
  Using cached frozenlist-1.3.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (149 kB)
Requirement already satisfied: idna>=2.8 in /usr/lib/python3/dist-packages (from anyio<5,>=3.4.0->starlette<2.0->lightning==2.0.0->-r /tmp/cached-reqsorsg7929.txt (line 6)) (3.3)
Collecting sniffio>=1.1
  Using cached sniffio-1.3.0-py3-none-any.whl (10 kB)
Collecting wcwidth>=0.1.4
  Using cached wcwidth-0.2.6-py2.py3-none-any.whl (29 kB)
Collecting mdurl~=0.1
  Using cached mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/lib/python3/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard==2.12.1->-r /tmp/cached-reqsorsg7929.txt (line 13)) (3.2.0)
Requirement already satisfied: pyasn1>=0.1.3 in /usr/lib/python3/dist-packages (from rsa<5,>=3.1.4->google-auth<3,>=1.6.3->tensorboard==2.12.1->-r /tmp/cached-reqsorsg7929.txt (line 13)) (0.4.8)
Installing collected packages: wcwidth, tensorboard-plugin-wit, python-editor, lit, cmake, antlr4-python3-runtime, websockets, websocket-client, typing-extensions, traitlets, tqdm, torchinfo, tensorboard-data-server, sniffio, rsa, requests-oauthlib, readchar, PyYAML, python-multipart, pyjwt, pygments, protobuf, pathlib2, orderedmultidict, ordered-set, opencv_python, networkx, multidict, mdurl, MarkupSafe, loguru, itsdangerous, h11, grpcio, fsspec, frozenlist, dateutils, croniter, contourpy, charset-normalizer, cachetools, blessed, async-timeout, arrow, absl-py, yarl, werkzeug, uvicorn, pydantic, omegaconf, matplotlib, markdown-it-py, lightning-utilities, inquirer, google-auth, furl, deepdiff, anyio, aiosignal, starlette, rich, hydra_core, google-auth-oauthlib, clearml, aiohttp, tensorboard, starsessions, fastapi, lightning-cloud, triton, torch, torchmetrics, pytorch-lightning, torchvision, lightning, kornia
  Attempting uninstall: PyYAML
    Found existing installation: PyYAML 5.4.1
    Not uninstalling pyyaml at /usr/lib/python3/dist-packages, outside environment /home/tte/.clearml/venvs-builds/3.10
    Can't uninstall 'PyYAML'. No files were found to uninstall.
  Attempting uninstall: pyjwt
    Found existing installation: PyJWT 2.3.0
    Not uninstalling pyjwt at /usr/lib/python3/dist-packages, outside environment /home/tte/.clearml/venvs-builds/3.10
    Can't uninstall 'PyJWT'. No files were found to uninstall.
  Attempting uninstall: pygments
    Found existing installation: Pygments 2.11.2
    Not uninstalling pygments at /usr/lib/python3/dist-packages, outside environment /home/tte/.clearml/venvs-builds/3.10
    Can't uninstall 'Pygments'. No files were found to uninstall.
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.12.4
    Not uninstalling protobuf at /usr/lib/python3/dist-packages, outside environment /home/tte/.clearml/venvs-builds/3.10
    Can't uninstall 'protobuf'. No files were found to uninstall.

1681396828810 pc-carl-8002:0 DEBUG   Attempting uninstall: MarkupSafe
    Found existing installation: MarkupSafe 2.0.1
    Not uninstalling markupsafe at /usr/lib/python3/dist-packages, outside environment /home/tte/.clearml/venvs-builds/3.10
    Can't uninstall 'MarkupSafe'. No files were found to uninstall.
  Attempting uninstall: matplotlib
    Found existing installation: matplotlib 3.5.1
    Not uninstalling matplotlib at /usr/lib/python3/dist-packages, outside environment /home/tte/.clearml/venvs-builds/3.10
    Can't uninstall 'matplotlib'. No files were found to uninstall.

1681396858982 pc-carl-8002:0 DEBUG Successfully installed MarkupSafe-2.1.2 PyYAML-6.0 absl-py-1.4.0 aiohttp-3.8.4 aiosignal-1.3.1 antlr4-python3-runtime-4.9.3 anyio-3.6.2 arrow-1.2.3 async-timeout-4.0.2 blessed-1.20.0 cachetools-5.3.0 charset-normalizer-3.1.0 clearml-1.10.1 cmake-3.26.3 contourpy-1.0.7 croniter-1.3.14 dateutils-0.6.12 deepdiff-6.3.0 fastapi-0.88.0 frozenlist-1.3.3 fsspec-2023.4.0 furl-2.1.3 google-auth-2.17.3 google-auth-oauthlib-1.0.0 grpcio-1.53.0 h11-0.14.0 hydra_core-1.3.2 inquirer-3.1.3 itsdangerous-2.1.2 kornia-0.6.11 lightning-2.0.0 lightning-cloud-0.5.33 lightning-utilities-0.8.0 lit-16.0.1 loguru-0.6.0 markdown-it-py-2.2.0 matplotlib-3.7.1 mdurl-0.1.2 multidict-6.0.4 networkx-3.1 omegaconf-2.3.0 opencv_python-4.7.0.72 ordered-set-4.1.0 orderedmultidict-1.0.1 pathlib2-2.3.7.post1 protobuf-4.22.3 pydantic-1.10.7 pygments-2.15.0 pyjwt-2.4.0 python-editor-1.0.4 python-multipart-0.0.6 pytorch-lightning-2.0.1.post0 readchar-4.0.5 requests-oauthlib-1.3.1 rich-13.3.4 rsa-4.9 sniffio-1.3.0 starlette-0.22.0 starsessions-1.3.0 tensorboard-2.12.1 tensorboard-data-server-0.7.0 tensorboard-plugin-wit-1.8.1 torch-2.0.0+cu117 torchinfo-1.7.2 torchmetrics-0.11.4 torchvision-0.15.0+cu117 tqdm-4.65.0 traitlets-5.9.0 triton-2.0.0 typing-extensions-4.5.0 uvicorn-0.21.1 wcwidth-0.2.6 websocket-client-1.5.1 websockets-11.0.1 werkzeug-2.2.3 yarl-1.8.2
Replacing original pip vcs 'git+https://github.com/optuna/optuna.git@0a8fa708e160524a57afef4e3a834288d9eee00f#egg=optuna' with 'git+https://ISEA-TTE:xxxxxx@github.com/optuna/optuna.git@0a8fa708e160524a57afef4e3a834288d9eee00f#egg=optuna'
Collecting optuna
  Using cached optuna-3.2.0.dev0-py3-none-any.whl
Collecting sqlalchemy>=1.3.0
  Using cached SQLAlchemy-2.0.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB)
Requirement already satisfied: PyYAML in /home/tte/.clearml/venvs-builds/3.10/lib/python3.10/site-packages (from optuna) (6.0)
Collecting colorlog
  Using cached colorlog-6.7.0-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: numpy in /home/tte/.clearml/venvs-builds/3.10/lib/python3.10/site-packages (from optuna) (1.24.2)
Collecting cmaes>=0.9.1
  Using cached cmaes-0.9.1-py3-none-any.whl (21 kB)
Requirement already satisfied: tqdm in /home/tte/.clearml/venvs-builds/3.10/lib/python3.10/site-packages (from optuna) (4.65.0)
Collecting alembic>=1.5.0
  Using cached alembic-1.10.3-py3-none-any.whl (212 kB)
Requirement already satisfied: packaging>=20.0 in /usr/lib/python3/dist-packages (from optuna) (21.3)
Collecting Mako
  Using cached Mako-1.2.4-py3-none-any.whl (78 kB)
Requirement already satisfied: typing-extensions>=4 in /home/tte/.clearml/venvs-builds/3.10/lib/python3.10/site-packages (from alembic>=1.5.0->optuna) (4.5.0)
Collecting greenlet!=0.4.17
  Using cached greenlet-2.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (613 kB)
Requirement already satisfied: MarkupSafe>=0.9.2 in /home/tte/.clearml/venvs-builds/3.10/lib/python3.10/site-packages (from Mako->alembic>=1.5.0->optuna) (2.1.2)
Installing collected packages: Mako, greenlet, colorlog, cmaes, sqlalchemy, alembic, optuna

1681396864284 pc-carl-8002:0 DEBUG Successfully installed Mako-1.2.4 alembic-1.10.3 cmaes-0.9.1 colorlog-6.7.0 greenlet-2.0.2 optuna-3.2.0.dev0 sqlalchemy-2.0.9
Adding venv into cache: /home/tte/.clearml/venvs-builds/3.10
Running task id [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx]:
[.]$ /home/tte/.clearml/venvs-builds/3.10/bin/python -u train.py
Summary - installed python packages:
pip:
- absl-py==1.4.0
- aiohttp==3.8.4
- aiosignal==1.3.1
- alembic==1.10.3
- antlr4-python3-runtime==4.9.3
- anyio==3.6.2
- appdirs==1.4.4
- apturl==0.5.2
- arrow==1.2.3
- async-timeout==4.0.2
- attrs==21.2.0
- Automat==20.2.0
- Babel==2.8.0
- bcrypt==3.2.0
- beautifulsoup4==4.10.0
- beniget==0.4.1
- blessed==1.20.0
- blinker==1.4
- Brlapi==0.8.3
- Brotli==1.0.9
- cachetools==5.3.0
- certifi==2020.6.20
- chardet==4.0.0
- charset-normalizer==3.1.0
- clearml==1.10.1
- click==8.0.3
- cloud-init==23.1.1
- cmaes==0.9.1
- cmake==3.26.3
- colorama==0.4.4
- colorlog==6.7.0
- command-not-found==0.3
- configobj==5.0.6
- constantly==15.1.0
- contourpy==1.0.7
- croniter==1.3.14
- cryptography==3.4.8
- cupshelpers==1.0
- cycler==0.11.0
- Cython==0.29.34
- dateutils==0.6.12
- dbus-python==1.2.18
- decorator==4.4.2
- deepdiff==6.3.0
- defer==1.0.6
- distlib==0.3.6
- distro==1.7.0
- distro-info===1.1build1
- dnspython==2.1.0
- fastapi==0.88.0
- filelock==3.10.7
- fonttools==4.29.1
- frozenlist==1.3.3
- fs==2.4.12
- fsspec==2023.4.0
- furl==2.1.3
- gast==0.5.2
- google-auth==2.17.3
- google-auth-oauthlib==1.0.0
- gpg===1.16.0-unknown
- greenlet==2.0.2
- grpcio==1.53.0
- h11==0.14.0
- html5lib==1.1
- httplib2==0.20.2
- hydra-core==1.3.2
- hyperlink==21.0.0
- idna==3.3
- imgviz==1.4.1
- importlib-metadata==4.6.4
- incremental==21.3.0
- inquirer==3.1.3
- itsdangerous==2.1.2
- jeepney==0.7.1
- Jinja2==3.0.3
- jsonpatch==1.32
- jsonpointer==2.0
- jsonschema==3.2.0
- keyring==23.5.0
- kiwisolver==1.3.2
- kornia==0.6.11
- labelme==4.6.0
- language-selector==0.1
- launchpadlib==1.10.16
- lazr.restfulclient==0.14.4
- lazr.uri==1.0.6
- lightning==2.0.0
- lightning-cloud==0.5.33
- lightning-utilities==0.8.0
- lit==16.0.1
- loguru==0.6.0
- louis==3.20.0
- lxml==4.8.0
- lz4==3.1.3+dfsg
- macaroonbakery==1.3.1
- Mako==1.2.4
- Markdown==3.3.6
- markdown-it-py==2.2.0
- MarkupSafe==2.1.2
- matplotlib==3.7.1
- mdurl==0.1.2
- mock==4.0.3
- more-itertools==8.10.0
- mpmath==0.0.0
- multidict==6.0.4
- netifaces==0.11.0
- networkx==3.1
- numpy==1.24.2
- oauthlib==3.2.0
- olefile==0.46
- omegaconf==2.3.0
- opencv-python==4.7.0.72
- optuna @ git+https://github.com/optuna/optuna.git@0a8fa708e160524a57afef4e3a834288d9eee00f
- ordered-set==4.1.0
- orderedmultidict==1.0.1
- packaging==21.3
- pathlib2==2.3.7.post1
- pbr==5.8.0
- pexpect==4.8.0
- Pillow==9.0.1
- platformdirs==3.2.0
- ply==3.11
- protobuf==4.22.3
- psutil==5.9.0
- ptyprocess==0.7.0
- pyasn1==0.4.8
- pyasn1-modules==0.2.1
- pycairo==1.20.1
- pycups==2.0.1
- pydantic==1.10.7
- Pygments==2.15.0
- PyGObject==3.42.1
- PyHamcrest==2.0.2
- PyJWT==2.4.0
- pymacaroons==0.13.0
- PyNaCl==1.5.0
- pyOpenSSL==21.0.0
- pyparsing==2.4.7
- PyQt5==5.15.6
- PyQt5-sip==12.9.1
- PyQtWebEngine==5.15.5
- pyRFC3339==1.1
- pyrsistent==0.18.1
- pyserial==3.5
- python-apt==2.4.0+ubuntu1
- python-dateutil==2.8.1
- python-debian===0.1.43ubuntu1
- python-editor==1.0.4
- python-magic==0.4.24
- python-multipart==0.0.6
- pythran==0.10.0
- pytorch-lightning==2.0.1.post0
- pytz==2022.1
- pyxdg==0.27
- PyYAML==6.0
- QtPy==2.0.0
- readchar==4.0.5
- reportlab==3.6.8
- requests==2.25.1
- requests-oauthlib==1.3.1
- requests-toolbelt==0.9.1
- rich==13.3.4
- rsa==4.9
- scipy==1.8.0
- screen-resolution-extra==0.0.0
- SecretStorage==3.3.1
- service-identity==18.1.0
- six==1.16.0
- sniffio==1.3.0
- sos==4.4
- soupsieve==2.3.1
- SQLAlchemy==2.0.9
- ssh-import-id==5.11
- starlette==0.22.0
- starsessions==1.3.0
- sympy==1.9
- systemd-python==234
- tensorboard==2.12.1
- tensorboard-data-server==0.7.0
- tensorboard-plugin-wit==1.8.1
- termcolor==1.1.0
- terminator==2.1.1
- torch==2.0.0+cu117
- torchinfo==1.7.2
- torchmetrics==0.11.4
- torchvision==0.15.0+cu117
- tqdm==4.65.0
- traitlets==5.9.0
- triton==2.0.0
- Twisted==22.1.0
- typing_extensions==4.5.0
- ubuntu-advantage-tools==8001
- ubuntu-drivers-common==0.0.0
- ufoLib2==0.13.1
- ufw==0.36.1
- unattended-upgrades==0.1
- unicodedata2==14.0.0
- urllib3==1.26.5
- uvicorn==0.21.1
- virtualenv==20.21.0
- wadllib==1.3.6
- wcwidth==0.2.6
- webencodings==0.5.1
- websocket-client==1.5.1
- websockets==11.0.1
- Werkzeug==2.2.3
- xdg==5
- xkit==0.0.0
- yarl==1.8.2
- zipp==1.0.0
- zope.interface==5.4.0

Environment setup completed successfully

Starting Task Execution:

TTK95 commented 1 year ago

I mean, the queues are working fine: [screenshot]

jkhenning commented 1 year ago

The last log you shared looks like you're running the task locally, not using an agent?

ClearML Task: overwriting (reusing) task id=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

TTK95 commented 1 year ago

@jkhenning Yeah, it doesn't matter if I reuse the task ID; either way it should display my worker load, right? :D It all works fine, it just doesn't display the worker load. Inside the task, the machine stats are tracked.

jkhenning commented 1 year ago

The workers graphs rely on agents running your tasks and reporting their status. If you run your tasks locally (not using agents), these won't show.

TTK95 commented 1 year ago

@jkhenning I know. The task is shown in the worker's queue and it runs on the worker; the worker usage just is not displayed.

jkhenning commented 1 year ago

Can you verify that psutil is installed in the python environment where the agent is installed?
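
One way to run that check is a short script executed with the same Python interpreter that runs `clearml-agent`. This is only a sketch: the helper name `check_psutil` is made up for illustration, and it is not the check the agent itself performs. It only confirms that `psutil` can be imported and can read basic CPU and memory figures.

```python
import importlib.util
import sys


def check_psutil():
    """Report whether psutil is importable by this interpreter and can read basic stats."""
    info = {"python": sys.executable, "installed": False}
    # find_spec returns None when the package cannot be found by this interpreter
    if importlib.util.find_spec("psutil") is None:
        return info
    import psutil

    info["installed"] = True
    info["version"] = psutil.__version__
    # Sample CPU usage over a short interval; the first non-blocking call would be meaningless
    info["cpu_percent"] = psutil.cpu_percent(interval=0.1)
    info["memory_percent"] = psutil.virtual_memory().percent
    return info


if __name__ == "__main__":
    for key, value in check_psutil().items():
        print(f"{key}: {value}")
```

The printed `python` path shows which interpreter was inspected, which helps when several environments (for example a system Python and a conda install) exist on the same machine.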

TTK95 commented 1 year ago

@jkhenning

pip list
Package                            Version
---------------------------------- ---------
alabaster                          0.7.11
anaconda-client                    1.7.2
anaconda-navigator                 1.9.2
anaconda-project                   0.8.2
appdirs                            1.4.3
asn1crypto                         0.24.0
astroid                            2.0.4
astropy                            3.0.4
atomicwrites                       1.2.1
attrs                              18.2.0
Automat                            0.7.0
Babel                              2.6.0
backcall                           0.1.0
backports.shutil-get-terminal-size 1.0.0
beautifulsoup4                     4.6.3
bitarray                           0.8.3
bkcharts                           0.2
blaze                              0.11.3
bleach                             2.1.4
bokeh                              0.13.0
boto                               2.49.0
Bottleneck                         1.2.1
certifi                            2018.8.24
cffi                               1.11.5
chardet                            3.0.4
charset-normalizer                 3.1.0
clearml                            1.10.1
clearml-agent                      1.5.2
click                              6.7
cloudpickle                        0.5.5
clyent                             1.2.2
colorama                           0.3.9
conda                              4.5.11
conda-build                        3.15.1
constantly                         15.1.0
contextlib2                        0.5.5
cryptography                       2.3.1
cycler                             0.10.0
Cython                             0.28.5
cytoolz                            0.9.0.1
dask                               0.19.1
datashape                          0.5.4
decorator                          4.3.0
defusedxml                         0.5.0
distlib                            0.3.6
distributed                        1.23.1
docutils                           0.14
entrypoints                        0.2.3
et-xmlfile                         1.0.1
fastcache                          1.0.2
filelock                           3.10.7
Flask                              1.0.2
Flask-Cors                         3.0.6
furl                               2.1.3
gevent                             1.3.6
glob2                              0.6
gmpy2                              2.0.8
greenlet                           0.4.15
h5py                               2.8.0
heapdict                           1.0.0
html5lib                           1.0.1
hyperlink                          18.0.0
idna                               2.7
imageio                            2.4.1
imagesize                          1.1.0
importlib-metadata                 6.1.0
importlib-resources                5.12.0
incremental                        17.5.0
ipykernel                          4.9.0
ipython                            6.5.0
ipython_genutils                   0.2.0
ipywidgets                         7.4.1
isort                              4.3.4
itsdangerous                       0.24
jdcal                              1.4
jedi                               0.12.1
jeepney                            0.3.1
Jinja2                             2.10
jsonschema                         4.17.3
jupyter                            1.0.0
jupyter-client                     5.2.3
jupyter-console                    5.2.0
jupyter-core                       4.4.0
jupyterlab                         0.34.9
jupyterlab-launcher                0.13.1
keyring                            13.2.1
kiwisolver                         1.0.1
lazy-object-proxy                  1.3.1
llvmlite                           0.24.0
locket                             0.2.0
lxml                               4.2.5
MarkupSafe                         1.0
matplotlib                         2.2.3
mccabe                             0.6.1
mistune                            0.8.3
mkl-fft                            1.0.4
mkl-random                         1.0.1
more-itertools                     4.3.0
mpmath                             1.0.0
msgpack                            0.5.6
multipledispatch                   0.6.0
navigator-updater                  0.2.1
nbconvert                          5.4.0
nbformat                           4.4.0
networkx                           2.1
nltk                               3.3
nose                               1.3.7
notebook                           5.6.0
numba                              0.39.0
numexpr                            2.6.8
numpy                              1.15.1
numpydoc                           0.8.0
odo                                0.5.1
olefile                            0.46
openpyxl                           2.5.6
orderedmultidict                   1.0.1
packaging                          17.1
pandas                             0.23.4
pandocfilters                      1.4.2
parso                              0.3.1
partd                              0.3.8
path.py                            11.1.0
pathlib2                           2.3.2
patsy                              0.5.0
pep8                               1.7.1
pexpect                            4.6.0
pickleshare                        0.7.4
Pillow                             5.2.0
pip                                23.0.1
pkginfo                            1.4.2
pkgutil_resolve_name               1.3.10
platformdirs                       3.2.0
pluggy                             0.7.1
ply                                3.11
prometheus-client                  0.3.1
prompt-toolkit                     1.0.15
psutil                             5.4.7
ptyprocess                         0.6.0
py                                 1.6.0
pyasn1                             0.4.4
pyasn1-modules                     0.2.2
pycodestyle                        2.4.0
pycosat                            0.6.3
pycparser                          2.18
pycrypto                           2.6.1
pycurl                             7.43.0.2
pyflakes                           2.0.0
Pygments                           2.2.0
PyJWT                              2.4.0
pylint                             2.1.1
pyodbc                             4.0.24
pyOpenSSL                          18.0.0
pyparsing                          2.2.0
pyrsistent                         0.19.3
PySocks                            1.6.8
pytest                             3.8.0
pytest-arraydiff                   0.2
pytest-astropy                     0.4.0
pytest-doctestplus                 0.1.3
pytest-openfiles                   0.3.0
pytest-remotedata                  0.3.0
python-dateutil                    2.7.3
pytz                               2018.5
PyWavelets                         1.0.0
PyYAML                             3.13
pyzmq                              17.1.2
QtAwesome                          0.4.4
qtconsole                          4.4.1
QtPy                               1.5.0
requests                           2.28.2
rope                               0.11.0
ruamel_yaml                        0.15.46
scikit-image                       0.14.0
scikit-learn                       0.19.2
scipy                              1.1.0
seaborn                            0.9.0
SecretStorage                      3.1.0
Send2Trash                         1.5.0
service-identity                   17.0.0
setuptools                         40.2.0
simplegeneric                      0.8.1
singledispatch                     3.4.0.3
six                                1.16.0
snowballstemmer                    1.2.1
sortedcollections                  1.0.1
sortedcontainers                   2.0.5
Sphinx                             1.7.9
sphinxcontrib-websupport           1.1.0
spyder                             3.3.1
spyder-kernels                     0.2.6
SQLAlchemy                         1.2.11
statsmodels                        0.9.0
sympy                              1.2
tables                             3.4.4
tblib                              1.3.2
terminado                          0.8.1
testpath                           0.3.1
toolz                              0.9.0
tornado                            5.1
tqdm                               4.26.0
traitlets                          4.3.2
Twisted                            18.7.0
typing_extensions                  4.5.0
unicodecsv                         0.14.1
urllib3                            1.23
virtualenv                         20.21.0
wcwidth                            0.1.7
webencodings                       0.5.1
Werkzeug                           0.14.1
wheel                              0.31.1
widgetsnbextension                 3.4.1
wrapt                              1.10.11
xlrd                               1.1.0
XlsxWriter                         1.1.0
xlwt                               1.3.0
zict                               0.1.3
zipp                               3.15.0
zope.interface                     4.5.0

jkhenning commented 1 year ago

That's really strange... Can you try passing -d to the agent to try and get more info?

jkhenning commented 1 year ago

@TTK95 Can you please try installing clearml-agent v1.5.3rc2? I've added some more protection in that area, so perhaps we'll see some more information.

TTK95 commented 1 year ago

[Attachment: clearml_agent_daemon_outwprddsim.txt]

@jkhenning Still no stats...

jkhenning commented 1 year ago

Can you also attach the agent's log? If you run the agent with --foreground, the log is printed to the console where you started the agent.

TTK95 commented 1 year ago

clearml-agent daemon --queue default --foreground -d
Current configuration (clearml_agent v1.5.3rc2, location: /home/tte/clearml.conf):
----------------------
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://pc-carl-8002:8008
api.web_server = http://pc-carl-8002:8080
api.files_server = http://pc-carl-8002:8081
api.credentials.access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
api.host = http://pc-carl-8002:8008
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key = 
sdk.aws.s3.region = 
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri = http://pc-carl-8002:8081
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
agent.worker_id = 
agent.worker_name = pc-carl-8002
agent.force_git_ssh_protocol = false
agent.python_binary = 
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = <20.2 ; python_version < '3.10'
agent.package_manager.pip_version.1 = <22.3 ; python_version >\= '3.10'
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.priority_optional_packages.0 = pygobject
agent.package_manager.torch_nightly = false
agent.package_manager.poetry_files_from_repo_working_dir = false
agent.venvs_dir = /home/tte/.clearml/venvs-builds.1
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.venvs_cache.path = ~/.clearml/venvs-cache
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/tte/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/tte/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/tte/.clearml/pip-cache
agent.docker_apt_cache = /home/tte/.clearml/apt-cache.1
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.abort_callback_max_timeout = 1800
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = ~/.ssh
agent.docker_internal_mounts.ssh_ro_folder = /.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = ~/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script = 
agent.disable_task_docker_override = false
agent.git_user = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
agent.default_python = 3.7
agent.cuda_version = 117
agent.cudnn_version = 0

Worker "pc-carl-8002:1" - Listening to queues:
+----------------------------------+---------+-------+
| id                               | name    | tags  |
+----------------------------------+---------+-------+
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | default |       |
+----------------------------------+---------+-------+

No tasks in queue xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
No tasks in Queues, sleeping for 5.0 seconds

@jkhenning

            .-/+oossssoo+/-.               tte@pc-carl-8002 
        `:+ssssssssssssssssss+:`           ---------------- 
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 22.04.2 LTS x86_64 
    .ossssssssssssssssssdMMMNysssso.       Host: Precision 5820 Tower 
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 5.15.0-70-generic 
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 13 days, 1 hour, 43 mins 
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 2038 (dpkg), 13 (snap) 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: bash 5.1.16 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   DE: GNOME 42.5 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   WM: Mutter 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   WM Theme: Adwaita 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Theme: Yaru-dark [GTK2/3] 
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/    Icons: Yaru [GTK2/3] 
  +sssssssssdmydMMMMMMMMddddyssssssss+     Terminal: x-terminal-emul 
   /ssssssssssshdmNNNNmyNMMMMhssssss/      CPU: Intel Xeon W-2295 (36) @ 4.600G 
    .ossssssssssssssssssdMMMNysssso.       GPU: NVIDIA RTX A6000 
      -+sssssssssssssssssyyyssss+-         Memory: 29978MiB / 257430MiB 
        `:+ssssssssssssssssss+:`
            .-/+oossssoo+/-.