Have the option to 'merge' a chosen selection of runs (e.g. obtained manually or by a filter). For merged runs:
A new hash will be created for the merged row.
Numerical metrics will be transformed to mean ± standard deviation.
Metrics with >1 unique value for which aggregation is meaningless (e.g. a date) will be set to 'None'.
Metrics with a single unique value will remain constant.
An option is included to 'unmerge' the runs (i.e. expand and restore them to their previous state).
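The merge rules above can be sketched in pandas. This is purely illustrative (the column names and the hash scheme are assumptions, not Aim's actual API or internals):

```python
import hashlib

import pandas as pd


def merge_runs(df: pd.DataFrame) -> pd.Series:
    """Collapse a selection of runs (rows) into one merged row,
    following the rules above. Illustrative sketch only."""
    merged = {}
    # A new hash for the merged row, derived from the member run hashes
    # (hypothetical scheme; any stable digest would do).
    merged["hash"] = hashlib.sha1(
        "".join(sorted(df["hash"])).encode()
    ).hexdigest()[:8]
    for col in df.columns:
        if col == "hash":
            continue
        values = df[col]
        if values.nunique() == 1:
            # A single unique value remains constant.
            merged[col] = values.iloc[0]
        elif pd.api.types.is_numeric_dtype(values):
            # Numerical metrics become mean ± standard deviation.
            merged[col] = f"{values.mean():.4f} ± {values.std():.4f}"
        else:
            # >1 unique value with no meaningful aggregation (e.g. date).
            merged[col] = None
    return pd.Series(merged)
```

A selection of two repeated runs would then collapse to a single row whose numeric metrics read like `2.0000 ± 1.4142`, while a differing date column becomes `None`.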
Motivation
In most AI/ML research papers, result metrics are reported as the mean and standard deviation over repeated runs with identical configuration to account for randomness in training.
Currently, I either have to:
1) Incorporate 'repetitions' into my experiment code -> this is clumsy and frustrating. It seems sensible for my code/model to focus on a single pipeline for training and evaluation -> my experiment tracker should be responsible for aggregating the results of those experiments (almost by definition).
2) Export the experiment runs to a .csv and manually aggregate them to compute means ± stds.
Admittedly, neither of these takes especially long - but both are frustrating, and this feels like it could be incorporated into the Aim GUI relatively easily, giving a nice quality-of-life improvement.
For example, in the above, I'd like to collapse the selected runs and have the mae_benchmark and mse_benchmark metrics aggregated, with the mean and s.d. reported.
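Workaround (2) can be scripted with pandas, which also shows how little is involved. A minimal sketch, assuming an exported CSV with one row per run and a hypothetical `experiment` column identifying the repeated configuration:

```python
import io

import pandas as pd

# Stand-in for a .csv exported from the tracker
# (column names and values are made up for illustration).
csv = io.StringIO(
    "experiment,mae_benchmark,mse_benchmark\n"
    "exp_a,0.10,0.02\n"
    "exp_a,0.12,0.03\n"
    "exp_a,0.14,0.04\n"
)
runs = pd.read_csv(csv)

# Aggregate repeated runs of the same configuration to mean and std.
summary = runs.groupby("experiment").agg(["mean", "std"])
print(summary)
```

This is exactly the aggregation the proposed merge would perform in the GUI, minus the export/import round trip.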