Have the option to 'merge' a chosen selection of runs (e.g. obtained manually or by a filter). For merged runs:
A new hash will be created for the merged row.
Numerical metrics will be transformed to mean ± standard deviation.
Metrics with >1 unique value for which aggregation is meaningless (e.g. a date) will be set to 'None'.
Metrics with a single unique value will remain constant.
An option is included to 'unmerge' the runs (i.e. expand and restore them to their previous state).
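The merge rules above can be sketched in pandas. This is purely illustrative (the column names and the hash scheme are assumptions, not Aim's actual API or internals):

```python
import hashlib

import pandas as pd


def merge_runs(df: pd.DataFrame) -> pd.Series:
    """Collapse a selection of runs (rows) into one merged row,
    following the rules above. Illustrative sketch only."""
    merged = {}
    # A new hash for the merged row, derived from the member run hashes
    # (hypothetical scheme; any stable digest would do).
    merged["hash"] = hashlib.sha1(
        "".join(sorted(df["hash"])).encode()
    ).hexdigest()[:8]
    for col in df.columns:
        if col == "hash":
            continue
        values = df[col]
        if values.nunique() == 1:
            # A single unique value remains constant.
            merged[col] = values.iloc[0]
        elif pd.api.types.is_numeric_dtype(values):
            # Numerical metrics become mean ± standard deviation.
            merged[col] = f"{values.mean():.4f} ± {values.std():.4f}"
        else:
            # >1 unique value with no meaningful aggregation (e.g. date).
            merged[col] = None
    return pd.Series(merged)
```

A selection of two repeated runs would then collapse to a single row whose numeric metrics read like `2.0000 ± 1.4142`, while a differing date column becomes `None`.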
Motivation
In most AI/ML research papers, result metrics are reported as the mean and standard deviation over repeated runs with identical configuration to account for randomness in training.
Currently, I either have to:
1) Incorporate 'repetitions' into my experiment code -> this is clumsy and frustrating. It seems sensible for my code/model to focus on a single pipeline for training and evaluation -> my experiment tracker should be responsible for aggregating the results of those experiments (almost by definition).
2) Export the experiment runs to a .csv and manually aggregate them to compute means ± stds.
Admittedly, neither of these takes especially long - but both are frustrating, and this feels like it could be incorporated into the Aim GUI relatively easily, giving a nice quality-of-life improvement.
For example, in the above, I'd like to collapse the selected runs and have the mae_benchmark and mse_benchmark metrics aggregated, with the mean and s.d. reported.
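Workaround (2) can be scripted with pandas, which also shows how little is involved. A minimal sketch, assuming an exported CSV with one row per run and a hypothetical `experiment` column identifying the repeated configuration:

```python
import io

import pandas as pd

# Stand-in for a .csv exported from the tracker
# (column names and values are made up for illustration).
csv = io.StringIO(
    "experiment,mae_benchmark,mse_benchmark\n"
    "exp_a,0.10,0.02\n"
    "exp_a,0.12,0.03\n"
    "exp_a,0.14,0.04\n"
)
runs = pd.read_csv(csv)

# Aggregate repeated runs of the same configuration to mean and std.
summary = runs.groupby("experiment").agg(["mean", "std"])
print(summary)
```

This is exactly the aggregation the proposed merge would perform in the GUI, minus the export/import round trip.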