allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

Progress report fails with exception TypeError #1036

Status: Open. Maksim-Vatkin opened this issue 1 year ago.

Maksim-Vatkin commented 1 year ago

Describe the bug

It looks like the HyperParameterOptimizer's report thread (_report_daemon) raises an exception in _report_completed_status while calculating the number of completed tasks. After that, progress reporting stops updating.

    Exception in thread Thread-10 (_report_daemon):
    Traceback (most recent call last):
      File "/home/mvatkin/projects/ai-doc-analyst/ai_doc_analyst/lib/python3.10/site-packages/clearml/automation/optimization.py", line 1878, in _report_completed_status
        values = [float(v) for v in col[1:]]
      File "/home/mvatkin/projects/ai-doc-analyst/ai_doc_analyst/lib/python3.10/site-packages/clearml/automation/optimization.py", line 1878, in <listcomp>
        values = [float(v) for v in col[1:]]
    TypeError: float() argument must be a string or a real number, not 'list'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/mvatkin/projects/ai-doc-analyst/ai_doc_analyst/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
        self.run()
      File "/home/mvatkin/projects/ai-doc-analyst/ai_doc_analyst/lib/python3.10/threading.py", line 946, in run
        self._target(*self._args, **self._kwargs)
      File "/home/mvatkin/projects/ai-doc-analyst/ai_doc_analyst/lib/python3.10/site-packages/clearml/automation/optimization.py", line 1766, in _report_daemon
        self._report_completed_status(completed_jobs, cur_completed_jobs, task_logger, title)
      File "/home/mvatkin/projects/ai-doc-analyst/ai_doc_analyst/lib/python3.10/site-packages/clearml/automation/optimization.py", line 1883, in _report_completed_status
        unique_ticks = list(set(ticks))
    TypeError: unhashable type: 'list'
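The two errors can be replayed outside ClearML. The sketch below assumes (the traceback does not confirm this) that one cell of the column arrives as a list instead of a number. The column contents, reproduce_errors and to_scalar are all hypothetical; the two statements only mirror the lines 1878 and 1883 quoted in the traceback, and the except-branch structure is inferred from the "During handling of the above exception" chaining. This is not ClearML's code or an official fix.

```python
from typing import Any, List


def to_scalar(value: Any) -> float:
    """Hypothetical helper: unwrap single-element lists/tuples, then convert to float."""
    # Unwrap nested single-element containers such as [0.84] or [[0.84]]
    while isinstance(value, (list, tuple)) and len(value) == 1:
        value = value[0]
    # Empty or multi-element containers cannot be reduced to one number
    if isinstance(value, (list, tuple)):
        raise ValueError(f"cannot reduce {value!r} to a single number")
    return float(value)


def reproduce_errors(col: List[Any]) -> List[str]:
    """Replay the two failing statements from the traceback on a sample column."""
    messages = []
    try:
        [float(v) for v in col[1:]]  # same expression as line 1878
    except TypeError as exc:
        messages.append(str(exc))
        # Assumed fallback: use the raw cells as ticks, as line 1883 suggests
        ticks = col[1:]
        try:
            list(set(ticks))  # same expression as line 1883
        except TypeError as exc2:
            messages.append(str(exc2))
    return messages


# Hypothetical column: a header cell followed by values, one of which is a list
sample_col = ["train_auc", 0.81, [0.84], 0.88]
print(reproduce_errors(sample_col))
print([to_scalar(v) for v in sample_col[1:]])  # prints [0.81, 0.84, 0.88]
```

Unwrapping single-element lists is only one possible workaround; this report does not establish whether the list values originate in how the base task reports its metrics or in how the optimizer collects them.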

To reproduce

I have the following setup:

  1. ClearML server on machine #1
  2. 4 agents on machine #2
  3. The following optimizer:

    from clearml.automation import GridSearch, HyperParameterOptimizer

    # template_task_id and hyper_parameters are defined earlier (not shown in the report)
    an_optimizer = HyperParameterOptimizer(
        # This is the experiment we want to optimize
        base_task_id=template_task_id,
        hyper_parameters=hyper_parameters,
        objective_metric_title='Summary',
        objective_metric_series='train_auc',
        objective_metric_sign='max',
        max_number_of_concurrent_tasks=4,
        optimizer_class=GridSearch,
        execution_queue='default',
        pool_period_min=0.1,
        # Store optimization arguments and configuration in the Task
        auto_connect_task=True,
        save_top_k_tasks_only=5,
        always_create_task=True,
    )

Expected behaviour

No exception is raised, and progress reporting keeps updating.

Environment

jkhenning commented 1 year ago

Hi @Maksim-Vatkin , what is the clearml SDK version you're using?

Maksim-Vatkin commented 1 year ago
  • 1.10.0
