allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.69k stars 654 forks

Only grouped metrics are displayed in the WebUI #1317

Open alexx-km opened 2 months ago

alexx-km commented 2 months ago

Describe the bug

Normally, all metrics are displayed in a grouped plot and each individual metric is displayed in an individual plot. For old runs in my self-hosted setup, the individual plots are no longer displayed. The data must be available because the grouped plot seems to be correct.

Bug: image

To reproduce

I'm not quite sure how to reproduce this bug. I had to change the permissions on some files in the database because they were originally set up for the wrong user. This may be related to the problem.

Expected behaviour

Metrics should be displayed in the grouped plot and also in individual plots: image

Environment

ainoam commented 2 months ago

@alexx-km Please let us understand a little bit better what's going on:

alexx-km commented 2 months ago

Hi @ainoam, thanks for your reply. Sorry, I completely forgot to include that information. I only have this bug for "old" tasks; all recently created tasks have both grouped and individual plots. Regarding your second question, I get a grouped plot with all metrics and an individual plot for each metric, e.g. titled train/loss or train/map.d0.loss_bbox_f.

ainoam commented 2 months ago

@alexx-km How are these scalars collected? The ClearML UI is designed to show either grouped plots OR individual plots (see "Group By"). If you're seeing both, it seems like they are reported twice: once with title="Title", series="Series", and once with title="Title/Series", series="Series".
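To illustrate the double-reporting case, here is a minimal sketch in plain Python that does not touch ClearML at all. It assumes that grouping by metric produces one plot per title; the helper names and the sample (title, series) pairs are hypothetical and only mimic the naming pattern described above.

```python
from collections import defaultdict

# Hypothetical (title, series) pairs, mimicking a logger that reports
# every scalar twice: once under the shared title and once under a
# "Title/Series" title of its own.
reports = [
    ("train", "loss"),
    ("train", "map.d0.loss_bbox_f"),
    ("train/loss", "loss"),
    ("train/map.d0.loss_bbox_f", "map.d0.loss_bbox_f"),
]


def plots_grouped_by_metric(pairs):
    """Assumed 'group by metric' view: one plot per title, holding its series."""
    plots = defaultdict(list)
    for title, series in pairs:
        if series not in plots[title]:
            plots[title].append(series)
    return dict(plots)


def plots_ungrouped(pairs):
    """Assumed 'group by none' view: one plot per distinct (title, series) pair."""
    return sorted(set(pairs))


grouped = plots_grouped_by_metric(reports)
for title, series in grouped.items():
    print(title, "->", series)
```

Under these assumptions the grouped view contains one multi-series "train" plot plus one single-series plot per "train/..." title, which would look like "grouped and individual plots at the same time". If only the first two pairs were reported, the grouped view would contain the "train" plot alone.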

alexx-km commented 2 months ago

Hey @ainoam, sorry for the late reply.

We use the TensorboardLoggerHook to log all the metrics of our training. But you are right, it seems that the scalars are reported twice. When I select group by "metric", I get the grouped plot plus a plot for each individual metric. When I select group by "none", I get the plots for each individual metric. However, there still seems to be a difference between old and new experiments, as this behavior only holds for new experiments. With the old experiments, I have the problem described above: grouping by metric shows only the grouped plots and not the individual plots.

Could that be due to an issue with the database, and is there any way to find out what the difference between those experiments could be?
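One way to look for such a difference might be to compare which (title, series) pairs each experiment actually stores. The sketch below is an assumption-laden illustration: it expects the scalars as a nested dict of the form {title: {series: data}}, and the helper names and sample data are hypothetical.

```python
def scalar_layout(scalars):
    """Collapse a {title: {series: data}} mapping into a set of (title, series) pairs."""
    return {
        (title, series)
        for title, by_series in scalars.items()
        for series in by_series
    }


def layout_diff(old_scalars, new_scalars):
    """Return the (title, series) pairs that exist in only one of the two tasks."""
    old = scalar_layout(old_scalars)
    new = scalar_layout(new_scalars)
    return {"only_in_old": sorted(old - new), "only_in_new": sorted(new - old)}


# Hypothetical data: the old task has only the grouped title, while the
# new task additionally has a "train/loss" title of its own.
old_task = {"train": {"loss": {"x": [0, 1], "y": [1.0, 0.8]}}}
new_task = {
    "train": {"loss": {"x": [0, 1], "y": [1.0, 0.8]}},
    "train/loss": {"loss": {"x": [0, 1], "y": [1.0, 0.8]}},
}

diff = layout_diff(old_task, new_task)
print(diff)
```

With the ClearML SDK, the input dicts could presumably be fetched via Task.get_task(task_id=...).get_reported_scalars(); that the result has exactly this nested shape is an assumption worth checking against the SDK documentation. If the old tasks lack the "Title/Series" titles entirely, the difference would lie in how the scalars were reported at the time rather than in the database.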