allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

Experiment name on clearml task with pytorch lightning CLI #1164

Closed CourchesneA closed 10 months ago

CourchesneA commented 11 months ago

I am using PyTorch Lightning as shown in the documentation:

from clearml import Task

def main():
    experiment_name = "my_experiment"
    # Task.init is called before the CLI parses the args and config
    clearml_task = Task.init(
        project_name=experiment_name, task_name="my_task"
    )

    cli = MyLightningCLI(
        MyModel,
        MyDataModule,
    )

This works fine, but I would like to avoid hardcoding the experiment name. I would like to specify it either as a command-line argument or in the config file. The issue is that PyTorch Lightning handles the parsing of the command-line args and the config file, and this happens when MyLightningCLI is instantiated. If I add my own argparse, it will interfere with the jsonargparse parser that PyTorch Lightning already sets up.

I have tried defining the task later (either after the CLI is instantiated, or in the CLI's __init__ method), but the task does not seem to be registered properly and is missing important information (e.g. hyperparameters) if Task.init is not called before PL parses the args and configs. I tried registering those hyperparameters manually afterward using task.connect(cli.trainer.hparams), but it didn't work, and I am not sure it is the correct solution.

How could I read the experiment name dynamically in this case?

eugen-ajechiloae-clearml commented 11 months ago

Hi @CourchesneA! Looks like this happens because we don't bind jsonargparse at import clearml time. We will fix this ASAP. In the meantime, you could change the task's project, if that suits your use case: https://clear.ml/docs/latest/docs/references/sdk/task#set_project

CourchesneA commented 11 months ago

Thanks for the suggestion! I am just now testing this and it does not seem to solve the issue:

    clearml_task = Task.init(
        project_name="placeholder_project", task_name="mytask"
    ) 
    cli = MyCLI(MyModel, MyDataModule)
    clearml_task.set_project(project_name="MyProject")

In the dashboard, that experiment is still shown under "placeholder_project".

CourchesneA commented 11 months ago

Hi @eugen-ajechiloae-clearml, I am trying to reproduce experiments remotely, but it looks like any configuration that I change in the UI for the cloned tasks is not picked up by PyTorch (e.g. I change the number of epochs from 10 to 50 and the ClearML Task correctly shows 50, but the task still only runs for 10 epochs). Do you think this is also related to the jsonargparse bindings?

CourchesneA commented 10 months ago

Is there any update or ETA on this fix?

ainoam commented 10 months ago

@CourchesneA This should be fixed in the next release, which should become available in the next few days.

CourchesneA commented 10 months ago

That is awesome news, thanks a lot!

pollfly commented 10 months ago

Hey @CourchesneA! Just letting you know that this issue has been resolved in the recently released v1.14.0.

CourchesneA commented 10 months ago

Thanks! I was able to get it to work now by setting the task name and project after CLI instantiation:

    clearml_task = Task.init(
        project_name="placeholder_project", task_name="mytask"
    ) 
    cli = MyCLI(MyModel, MyDataModule)
    clearml_task.set_project(project_name="MyProject")
    clearml_task.set_name("MyTask")
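
For anyone landing here later, below is a minimal sketch of how the project and task names could come from the Lightning CLI config instead of being hardcoded. It is not an official recipe. It assumes clearml v1.14.0 or later and the Lightning 2.x import path (lightning.pytorch.cli). The --project_name and --task_name arguments, the resolve_names and rename_task helpers, and the my_project module are all hypothetical names made up for illustration.

```python
def resolve_names(config, default_project="placeholder_project", default_task="mytask"):
    """Return (project_name, task_name) read from a parsed config object.

    Missing or empty values fall back to the defaults.
    """
    project = getattr(config, "project_name", None) or default_project
    task = getattr(config, "task_name", None) or default_task
    return project, task


def rename_task(task, project_name, task_name):
    """Move an already-initialised ClearML task to its final project and name."""
    task.set_project(project_name=project_name)
    task.set_name(task_name)


def main():
    # Imports are local so the two helpers above stay usable without these packages.
    from clearml import Task
    from lightning.pytorch.cli import LightningCLI  # assumption: Lightning 2.x

    from my_project import MyModel, MyDataModule  # hypothetical module

    class MyCLI(LightningCLI):
        def add_arguments_to_parser(self, parser):
            # Hypothetical extra arguments; they can also be set in the config file.
            parser.add_argument("--project_name", type=str, default="placeholder_project")
            parser.add_argument("--task_name", type=str, default="mytask")

    # Task.init still runs first, with placeholder names, so ClearML can hook
    # into the argument parsing that happens when the CLI is instantiated.
    task = Task.init(project_name="placeholder_project", task_name="mytask")

    # run=False: only parse and instantiate, so the names can be applied
    # before training starts.
    cli = MyCLI(MyModel, MyDataModule, run=False)

    project_name, task_name = resolve_names(cli.config)
    rename_task(task, project_name, task_name)

    cli.trainer.fit(cli.model, datamodule=cli.datamodule)
```

With something like this, the names could be passed as --project_name MyProject --task_name MyTask on the command line. If the CLI is used with subcommands (the default run=True), the parsed values would sit under the subcommand's namespace in cli.config rather than at the top level, so the lookup would need adjusting.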

The related issue (https://github.com/allegroai/clearml/issues/1164#issuecomment-1852717344) is now also fixed.