allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.43k stars 643 forks

task.connect() behavior when running on remote #1222

Closed ilouzl closed 3 months ago

ilouzl commented 4 months ago

Describe the bug

When a task is running remotely, a connected object cannot be updated.

To reproduce

  1. Use the example from https://github.com/allegroai/clearml/blob/master/examples/reporting/hyper_parameters.py.
  2. Modify it to execute remotely as follows:
    
    # ClearML - example code for logging into "CONFIGURATION":
    # - ArgumentParser parameter logging
    # - user properties logging
    # - logging of hyperparameters via dictionary
    # - logging of hyperparameters via TaskParameters
    # - logging of configuration objects via TaskParameters
    #
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import sys
    from argparse import ArgumentParser
    from enum import Enum

    from clearml import Task
    from clearml.task_parameters import TaskParameters, param, percent_param

    # Connecting ClearML with the current process,
    # from here on everything is logged automatically
    Task.add_requirements('clearml')
    task = Task.init(project_name='FirstTrial', task_name='first_trial')
    task.execute_remotely(queue_name='default')

    # -----------------------------------------------
    # Report user properties
    # -----------------------------------------------
    task.set_user_properties(custom1='great', custom2=True)
    task.set_user_properties(custom3=1, custom4=2.0)

    # -----------------------------------------------
    # Report hyperparameters via dictionary
    # -----------------------------------------------
    class StringEnumClass(Enum):
        A = 'a'
        B = 'b'

    class IntEnumClass(Enum):
        C = 1
        D = 2

    parameters = {
        'list': [1, 2, 3],
        'dict': {'a': 1, 'b': 2},
        'tuple': (1, 2, 3),
        'int': 3,
        'float': 2.2,
        'string': 'my string',
        'IntEnumParam': StringEnumClass.A,
        'StringEnumParam': IntEnumClass.C
    }
    parameters = task.connect(parameters)

    # adding new parameter after connect (will be logged as well)
    parameters['new_param'] = 'this is new'

    # changing the value of a parameter (new value will be stored instead of previous one)
    parameters['float'] = '9.9'
    print(parameters)



Expected behaviour

The value of `parameters['float']` should change to '9.9', but it actually stays 2.2.

Environment

* Server type - app.clear.ml
* ClearML SDK Version - clearml==1.14.4
* Python Version - 3.9.16
* Dockerized worker

Related Discussion

https://clearml.slack.com/archives/CTK20V944/p1709474983437769

ainoam commented 3 months ago

@ilouzl This is actually the intended behaviour. ClearML is designed to both log all configuration, and instrument your code such that you can later override your initial configuration when remotely executing. To that end, parameter values set in the server take precedence when executed by a ClearML agent (unless explicitly cancelled using Task.connect()'s ignore_remote_overrides parameter).

Consider any connected parameters as representing the initial configuration. As such, you cannot modify them at runtime, since that would amount to changing your logged initial conditions after the fact, which would be inconsistent.

For a use case where you want to apply another level of override programmatically, use an explicit variable that gets its value either from the connected configuration or from whatever logic you implement.
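A minimal sketch of that explicit-variable pattern, for illustration only. It uses a plain dict as a stand-in for the object returned by `task.connect(parameters)`, so it runs without a ClearML server. The helper name `effective_value` and its `runtime_override` argument are hypothetical and not part of the ClearML API.

```python
from typing import Any, Mapping, Optional


def effective_value(
    connected: Mapping[str, Any],
    key: str,
    runtime_override: Optional[Any] = None,
) -> Any:
    """Return the runtime override if one is given, else the connected value.

    The connected mapping itself is only read, never modified, so it keeps
    representing the logged initial configuration.
    """
    if runtime_override is not None:
        return runtime_override
    return connected[key]


# Stand-in for: parameters = task.connect(parameters)
# (assumption: on a remote run these values would come from the server)
parameters = {"float": 2.2, "string": "my string"}

# Explicit variable carrying the value the rest of the code should use
float_value = effective_value(parameters, "float", runtime_override=9.9)

print(float_value)          # value used by the code at runtime
print(parameters["float"])  # connected value, left untouched
```

The rest of the script would then read `float_value` rather than `parameters['float']`, so the override logic lives in the code while the connected dictionary stays as it was logged.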

Does this make sense?

ilouzl commented 3 months ago

Yes. Thanks for the detailed clarification.