Open levan92 opened 2 years ago
Hi @levan92, is this feature request similar to this one: #473? Or is this an extension, a new way to retrieve data from the system?
BTW: When you get a Task entity (see Task.query_tasks), you can retrieve the Task's metrics with get_reported_scalars and get_last_scalar_metrics.
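For anyone finding this later, a minimal sketch of that flow, assuming the clearml SDK is installed and configured; the project name "MyProject" and task name "my_experiment" are placeholders:

```python
# Minimal sketch: fetch matching tasks and read back their scalars.
# "MyProject" / "my_experiment" are hypothetical placeholders.
from clearml import Task

task_ids = Task.query_tasks(project_name="MyProject", task_name="my_experiment")

for task_id in task_ids:
    task = Task.get_task(task_id=task_id)
    # Full scalar history, roughly {title: {series: {"x": [...], "y": [...]}}}
    scalars = task.get_reported_scalars()
    # Last reported value per metric, roughly {title: {series: {"last": ...}}}
    last_metrics = task.get_last_scalar_metrics()
    print(task.name, last_metrics)
```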
Thanks for your reply @bmartinn!
query_tasks still requires additional code to process the results; I'm actually looking at something more intuitive/user-friendly. Some ways I can imagine that would really help:
- When submitting remote execution jobs, we can indicate a parameter to repeat the same task N times, such that we submit once but N repeated experiments appear on the web UI (at different seeds) (see the sketch after this list).
- "Grouping" on the web UI, so that the repeated experiments do not clutter the table and we can also see an average of the result metrics (validation accuracy, for example); the average statistic could be mean/median/mode.
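Until such a parameter exists, something close to the first idea can be scripted with the current SDK. A rough sketch, where the template task ID, the "General/seed" hyperparameter name, and the "default" queue are all assumptions:

```python
# Rough sketch of "submit once, run N times" via clone + enqueue.
# The task ID, hyperparameter name and queue name are hypothetical placeholders.
from clearml import Task

TEMPLATE_TASK_ID = "<template-task-id>"
N_REPEATS = 3

template = Task.get_task(task_id=TEMPLATE_TASK_ID)
for seed in range(N_REPEATS):
    clone = Task.clone(source_task=template, name=f"{template.name} [seed {seed}]")
    clone.set_parameter("General/seed", seed)  # the training script must read this
    Task.enqueue(clone, queue_name="default")
```

Each clone still shows up as its own experiment in the UI, so the grouping part of the request would remain manual.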
Let me see if I understand, so this is basically grouping?! e.g. collapse all experiments into a single line (criteria unknown), then when we need the details, expand them, with the actual values presented in the table (i.e. scalars) as the average over all the lines in the collapsed experiments?
Assuming I understand you correctly, the first challenge is defining which experiments are grouped together. My feeling is that any automagic rule will end up breaking, and users will want full control over what goes where. This leads to the idea of grouping based on joint "tag/s". Now, if we are already using tags, why don't we just use the already existing "sub-folder" feature and create a folder for each group, wdyt?
Regarding the scalars summary (i.e. averaging the metric values), this is a great idea to add to the project overview, no?
(Basically what I'm trying to say is that nesting (a.k.a. collapse/expand) inside tables is always very tricky to get working correctly in terms of UI/UX, whereas sub-folders are a more straightforward solution.)
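For reference, the sub-folder idea maps to ClearML's nested projects (a "/" in the project name), and tags can mark the group as well. A minimal sketch with made-up project, task, and tag names:

```python
# Minimal sketch of "one sub-folder per group", called at the top of each
# training script. Project, task and tag names are made-up examples.
from clearml import Task

task = Task.init(
    project_name="MyProject/lr-0.01",  # the sub-project acts as the group folder
    task_name="train run",
    tags=["group:lr-0.01"],            # optional: also tag the group explicitly
)
```

The group can later be pulled back, e.g. with `Task.query_tasks(tags=["group:lr-0.01"])`, for offline averaging.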
Not sure if this is the OP intent, but grouping experiments into collapsible rows (without combining metrics or any of their data, just a UI tweak!) is quite common. I think this can probably be achieved in ClearML too - just group by the parent_task?
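For context, the "children of a parent" lookup is already possible from the SDK side, assuming the backend filter accepts a parent field (the task_filter dict is passed through to tasks.get_all). A minimal sketch with a placeholder ID:

```python
# Minimal sketch: collect all tasks whose "parent" points at one task, i.e.
# the set a collapsible row could represent. PARENT_TASK_ID is a placeholder.
from clearml import Task

PARENT_TASK_ID = "<parent-task-id>"

child_ids = Task.query_tasks(task_filter={"parent": PARENT_TASK_ID})
children = [Task.get_task(task_id=tid) for tid in child_ids]
print(f"found {len(children)} child task(s) of {PARENT_TASK_ID}")
```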
but grouping experiments into collapsible rows (without combining metrics or any of their data, just a UI tweak!) is quite common. I think this can probably be achieved in ClearML too - just group by the parent_task?
Hmm, for it to work properly, we need a good strategy on "parent task":
wdyt?
That's a good question. I think those make for sensible defaults (maybe let the user change the "default parent task" in WebUI?). Then what about cloning a parent task? Is it allowed? Does it clone all child tasks?
Hey @idantene
maybe let the user change the "default parent task" in WebUI?
What do you mean by "default parent task" ?
Then what about cloning a parent task? Is it allowed? Does it clone all child tasks?
Allowed, and by design it will not clone the child Tasks. Is there a reason to do that?
What do you mean by "default parent task" ?
In reference to (2): "Cloning a Task with a parent -> Newly created Task parent is the 'original' Task's parent (i.e. sibling task)?", a user may want to change the "original" parent to something else.
Allowed, and by design it will not clone the child Tasks. Is there a reason to do that?
It could -- it really depends on how the notion of a "parent" task is used. As it is right now, it can either be defined as:
- A parent task is task A, such that task B is a clone of Task A with some changes (allows you to go back through changes)
- A parent is any task that is used to group other tasks in it, and is not necessarily an original or cloned task.
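For what it's worth, the second definition can already be approximated today, since Task.clone accepts an explicit parent. A small sketch with placeholder IDs:

```python
# Small sketch: set an arbitrary "grouping" parent when cloning.
# Both task IDs are hypothetical placeholders.
from clearml import Task

source = Task.get_task(task_id="<source-task-id>")
group_parent = Task.get_task(task_id="<group-parent-task-id>")

# Default: per the SDK docs, the clone's parent is the source task itself.
lineage_clone = Task.clone(source_task=source)

# Explicit grouping: make the clone a child of some other "group" task.
grouped_clone = Task.clone(source_task=source, parent=group_parent.id)
```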
Thanks @idantene , I think I now better understand the use case.
showing just 1 set of averaged statistics instead?
This is reflective of @levan92's original request.
Basically I can think of two paths we can take:
wdyt?
I'd also love this feature. It would be great if aggregation was available in the web UI. Ideally it'd be possible to show the mean value while also indicating some other statistic (standard deviation, confidence interval) as an area.
AIM does a good job at this, see this clip as an example. You can first select which criterion to group by (experiment name, some hyperparameter) and then aggregate the scalars based on that criterion.
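In the meantime, the same mean-plus-band view can be produced offline from the reported scalars. A hedged sketch below, which assumes all runs report a metric titled "val" with series "accuracy" on identical iteration grids; the task IDs, metric names, and the alignment assumption are all placeholders:

```python
# Sketch: aggregate one scalar series across several runs and plot mean ± std.
# Task IDs, metric title/series, and the identical-x assumption are placeholders.
import numpy as np
import matplotlib.pyplot as plt
from clearml import Task

RUN_IDS = ["<task-id-1>", "<task-id-2>", "<task-id-3>"]
TITLE, SERIES = "val", "accuracy"

curves, x = [], None
for tid in RUN_IDS:
    series = Task.get_task(task_id=tid).get_reported_scalars()[TITLE][SERIES]
    x = series["x"]                      # iteration axis (assumed identical)
    curves.append(series["y"])

y = np.asarray(curves, dtype=float)      # shape: (runs, iterations)
mean, std = y.mean(axis=0), y.std(axis=0)

plt.plot(x, mean, label=f"{TITLE}/{SERIES} mean of {len(RUN_IDS)} runs")
plt.fill_between(x, mean - std, mean + std, alpha=0.3, label="±1 std")
plt.xlabel("iteration")
plt.legend()
plt.show()
```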
Hi @timokau,
Thanks for pointing this out and sharing the clip! This is indeed on our radar and we're evaluating different approaches on how to implement this. I'll come back here once I have a more concrete solution, or if we need more feedback on our thoughts!
I would also like to have this feature implemented. Averaging over seeds is a very basic feature for ML research.
I also want to express my wish to see this feature implemented. It is crucial for the workflow of many ML researchers.
@mrodiduger Seeing as the discussion so far has diverged in multiple directions :), which of those are you considering when endorsing "this" feature?
This feature would be highly beneficial for projects with 100 or even 1000+ experiments. Grouping by hyperparameters makes it possible to quickly see the statistical effects of those parameters on the loss or evaluation scalars.
I would also express a wish for this feature to be implemented.
@Capsar @jledragon Thanks for bumping this feature request.
Do note that the UI lets you download the experiment table along with any desired custom column in CSV format. Can this help in the meantime?
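For completeness, a small sketch of what that manual step could look like; the file name and the "tags" / "val_accuracy" columns are made up, since the real headers depend on the custom columns chosen before exporting:

```python
# Sketch: group the exported experiment table and summarise one metric column.
# File and column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("experiments.csv")
summary = df.groupby("tags")["val_accuracy"].agg(["mean", "median", "std", "count"])
print(summary)
```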
I would also like to have this feature implemented.
@ainoam As far as I understand, we want the feature to be natively supported in the web UI (WandB has this feature, by the way). Manually downloading the results and writing another script to group and plot is always a solution, but it is not convenient.
Thanks for joining the conversation @IcarusWizard. The manual processing option is merely there for visibility, to allow progress until additional capabilities are introduced 🙂
It has been a few years since this issue was opened, but I would also like to request that this feature be added. This is standard practice in ML research, and it is disappointing that such a critical feature is not natively supported by ClearML; I spent most of my day trying to make this happen in the web UI, only to arrive here.
Most ML research reports plots with confidence intervals over several runs.
I appreciate all the work everyone has put into ClearML, but from this perspective, ClearML may not be in line with ML researchers' expectations.
Hi, often in ML experiments, we will run several (3-5) runs of the same hyperparameter configuration to cover the randomness in the training, and subsequently report findings based on averaged outcomes (whether it is a mean or median).
May I know if there are any current features of ClearML that allow for grouping of such similar runs & showing just 1 set of averaged statistics instead? This will be useful in comparing against other groups of experiment runs (e.g., a group of 3 runs vs another group of 3 runs).
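Until this lands in the UI, a rough workaround is to average the last reported value of a metric per group from the SDK. A sketch with hypothetical task IDs and metric names:

```python
# Rough sketch: compare two groups of runs by the mean of one last-reported
# metric. Task IDs and the metric title/series are hypothetical placeholders.
from statistics import mean
from clearml import Task

GROUPS = {
    "baseline": ["<task-id-a1>", "<task-id-a2>", "<task-id-a3>"],
    "new-idea": ["<task-id-b1>", "<task-id-b2>", "<task-id-b3>"],
}
TITLE, SERIES = "val", "accuracy"

for name, ids in GROUPS.items():
    values = [
        Task.get_task(task_id=tid).get_last_scalar_metrics()[TITLE][SERIES]["last"]
        for tid in ids
    ]
    print(f"{name}: mean {TITLE}/{SERIES} = {mean(values):.4f} over {len(values)} runs")
```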