Open MightyGoldenJA opened 1 year ago
Thanks for the suggestion, @MightyGoldenJA.
As you note, ClearML should pick up on the underlying logging, e.g. any snapshots your training saves or metrics logged to TensorBoard.
How did you log your scalars when training the Darts TFT model?
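For reference, a minimal sketch of the auto-capture flow described above, assuming TensorBoard as the reporting backend; the project and task names are illustrative. Once a task is initialized, scalars written through a SummaryWriter should appear under the task's Scalars tab without any explicit ClearML reporting calls.

from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# Initializing the task is what enables ClearML's automatic framework binding.
task = Task.init(project_name="darts-tft", task_name="tensorboard-autolog-check")

writer = SummaryWriter()
for step in range(10):
    # Scalars reported to TensorBoard are expected to be picked up automatically.
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()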
@ainoam I did not manage to pick up scalars; I just adjusted the log frequency to be able to follow progress on the console.
Not sure I follow, @MightyGoldenJA - are we speaking only about the console log here? Do you mean that unless you modify the logging frequency, console outputs appear but are not captured by ClearML?
@ainoam I meant that I managed to get the console log, but I did not manage to capture scalar metrics like loss, val_loss, etc.
@MightyGoldenJA How did you log your scalars? Did you report to TensorBoard?
@MightyGoldenJA Where did you report your metrics to? TensorBoard? A local file?
@ainoam As described in the linked Slack thread, I used a PyTorch Lightning trainer with the default PL logger. Since my trainings using PyTorch Lightning on other projects always had their scalars and metrics properly captured by ClearML, I found it surprising that this wasn't the case for Darts, even though the TFT model is PyTorch-based and is trained using a PL Trainer.
Hey @MightyGoldenJA, can you please let us know which pytorch-lightning and PyTorch versions you have installed for the failing example?
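For context, here is a minimal sketch (assumed, not the Darts internals) of the kind of PyTorch Lightning setup described above: a LightningModule logging through self.log() and a Trainer left with its default logger, which ClearML normally auto-captures once a task is initialized. The module, data, and names are illustrative only.

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset
from clearml import Task

class TinyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        # Scalars logged here are what ClearML is expected to capture automatically.
        self.log("loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

task = Task.init(project_name="darts-tft", task_name="pl-default-logger")
data = DataLoader(TensorDataset(torch.randn(64, 4), torch.randn(64, 1)), batch_size=8)
trainer = pl.Trainer(max_epochs=1)  # default logger, as in the report
trainer.fit(TinyModule(), data)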
Hey @AlexandruBurlacu, the tested versions are torch==2.0.1 and pytorch-lightning==2.0.2.
Hey @MightyGoldenJA, we had some issues with pytorch-lightning>=2.0.0, but we fixed them in clearml==1.11.1rc2. Can you please install it and see whether it fixes your problem?
@AlexandruBurlacu With clearml==1.11.1rc2, not only are the scalars still not captured, but the PL trainer logs are no longer captured either (we had to roll back to get the functional log capture back).
I can't pass the ClearML logger in the logger param of my PL trainer without triggering a concurrency exception. I guess I will have to do my own PR on the Darts or ClearML side if I want this to work before the end of the year...
Okay, by manually defining a custom PL logger and passing it to the trainer I managed to log scalars, but this is not normal behavior: ClearML is supposed to auto-connect to PyTorch. Hence I'll let you (@AlexandruBurlacu) close this issue if you do not think this is a problem.
import clearml
from pytorch_lightning.loggers import Logger
from pytorch_lightning.utilities import rank_zero_only

class ClearMLLogger(Logger):
    # Minimal PL logger that forwards hyperparameters and metrics to the current ClearML task.
    @property
    def name(self):
        return 'ClearMLLogger'

    @property
    def version(self):
        return '0.0.1'

    @rank_zero_only
    def log_hyperparams(self, params):
        # Attach the run's hyperparameters to the current ClearML task.
        task = clearml.Task.current_task()
        task.connect(params, name='Hyperparameters')

    @rank_zero_only
    def log_metrics(self, metrics, step):
        # Report each metric as a separate scalar series on the current ClearML task.
        task = clearml.Task.current_task()
        for name, metric in metrics.items():
            task.get_logger().report_scalar(title=name, series=name, value=metric, iteration=step)
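For completeness, a possible way to wire the custom logger above into a Darts model, sketched on the assumption that the model's pl_trainer_kwargs dict is forwarded to the underlying PL Trainer. The series construction and model arguments are illustrative only, and a ClearML task has to be initialized first because the logger relies on Task.current_task().

import clearml
import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import TFTModel

# A ClearML task must exist before training, since ClearMLLogger uses Task.current_task().
clearml.Task.init(project_name="darts-tft", task_name="custom-pl-logger")  # illustrative names

series = TimeSeries.from_series(pd.Series(np.random.randn(200).cumsum()))
model = TFTModel(
    input_chunk_length=24,
    output_chunk_length=12,
    n_epochs=2,
    add_relative_index=True,  # avoids having to provide explicit future covariates
    pl_trainer_kwargs={"logger": ClearMLLogger()},  # the custom logger defined above
)
model.fit(series)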
Proposal Summary
Add a dedicated integration for the Darts time-series forecasting library.
Motivation
For some reason, even though Darts is built on top of libraries supported by ClearML, the monitor fails to correctly capture the scalars (at least with Temporal Fusion Transformers).
Related Discussion
https://clearml.slack.com/archives/CTK20V944/p1680610946630759