allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

hydra error on remote: ... appears more than once in the final defaults list #1095

Status: Open (opened by mctigger 1 year ago)

mctigger commented 1 year ago

Describe the bug

My run fails when executed remotely with the error: ... appears more than once in the final defaults list

To reproduce

Run my script with command-line arguments that override existing arguments, e.g. python my_script.py +execution=remote execution.queue="A100"
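The overrides above use Hydra's command-line override grammar, in which the prefix selects the operation: key=value changes an existing value, +key=value appends a key or config group that is not yet present, ++key=value force-adds (appends or overrides), and ~key deletes. The following is a minimal plain-Python sketch of that prefix split; parse_override and ParsedOverride are hypothetical names, and Hydra's real parser also handles sweeps, quoting and package syntax, which this ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedOverride:
    kind: str             # "change", "append", "force_add" or "delete"
    key: str              # dotted key or config group name
    value: Optional[str]  # None when no "=value" part is given (e.g. "~key")


def parse_override(line: str) -> ParsedOverride:
    """Hypothetical, simplified split of a Hydra override string by prefix."""
    # Check "++" before "+" so a force-add is not mistaken for an append.
    if line.startswith("++"):
        kind, rest = "force_add", line[2:]
    elif line.startswith("+"):
        kind, rest = "append", line[1:]
    elif line.startswith("~"):
        kind, rest = "delete", line[1:]
    else:
        kind, rest = "change", line
    key, sep, value = rest.partition("=")
    return ParsedOverride(kind, key, value if sep else None)


if __name__ == "__main__":
    # The shell strips the quotes, so the script receives execution.queue=A100.
    for arg in ["+execution=remote", "execution.queue=A100"]:
        print(parse_override(arg))
```

With this split, the command line above contains one append (+execution=remote) and one change of an existing value (execution.queue=A100).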

Expected behaviour

What is the expected behaviour? What should've happened but didn't?

Environment

Python 3.10
clearml==1.12.2
hydra-core==1.3.2
WebApp: 1.10.0-357 • Server: 1.10.0-357 • API: 2.24

eugen-ajechiloae-clearml commented 1 year ago

Hi @mctigger! Can you please share a small script that would help us better understand the issue?

mctigger commented 1 year ago

Here is an updated and isolated version:

from dataclasses import dataclass

import hydra
from clearml import Task
from clearml.config import running_remotely
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf

@dataclass
class SubConfig:
    key: str = "value"

@dataclass
class Config:
    # No default: stays missing until a "sub" group option is selected, e.g. +sub=subconfig
    sub: SubConfig

# Register the primary config and the "sub" config group in Hydra's ConfigStore
cs = ConfigStore.instance()
cs.store(name="config", node=Config)
cs.store(group="sub", name="subconfig", node=SubConfig)

@hydra.main(version_base=None, config_name="config")
def main(config: Config):
    config = OmegaConf.to_object(config)
    print(config)

    task: Task = Task.init(project_name="examples", task_name="clearml-hydra-test")

    # Only when running locally: create a clone of this task for remote execution;
    # exit_process=False keeps the local process running afterwards
    if not running_remotely():
        task.execute_remotely("", clone=True, exit_process=False)

if __name__ == "__main__":
    main()

python3 scripts/test_clearml_hydra.py +sub=subconfig

leads to

force-add of config groups is not supported: '++sub=subconfig'

when executed on a clearml-agent with clearml==1.13.1. There is no error with clearml==1.11.1.
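The error message shows the override reaching Hydra as '++sub=subconfig', although the command line above used '+sub=subconfig', and sub is a config group (registered with cs.store(group="sub", ...)). As a minimal sketch of the rule that message describes, the plain-Python check below accepts + on a group and ++ on a plain value, but rejects ++ on a group. The names validate_override and OverrideError are hypothetical; this models the failure for reasoning purposes only, is not Hydra's or ClearML's code, and does not show where the extra + is introduced.

```python
class OverrideError(Exception):
    """Hypothetical stand-in for the composition error reported by Hydra."""


def validate_override(line: str, groups: set[str]) -> str:
    """Return the override's key, rejecting force-add ("++") on a config group.

    Simplified model of the rule behind the error in this issue: "++" on a
    plain value is accepted, "++" on a registered config group is rejected.
    """
    force_add = line.startswith("++")
    # Strip the prefix characters, then keep the part before "=" as the key.
    key = line.lstrip("+~").partition("=")[0]
    if force_add and key in groups:
        raise OverrideError(f"force-add of config groups is not supported: '{line}'")
    return key


if __name__ == "__main__":
    groups = {"sub"}  # mirrors cs.store(group="sub", ...) in the script above
    print(validate_override("+sub=subconfig", groups))   # sub
    print(validate_override("++sub.key=other", groups))  # sub.key
    try:
        validate_override("++sub=subconfig", groups)
    except OverrideError as err:
        print(err)  # force-add of config groups is not supported: '++sub=subconfig'
```

Under this model, the locally passed +sub=subconfig is valid, while the same override arriving with a doubled + prefix on the agent fails with exactly the message reported above.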