Closed iancze closed 1 year ago
Possibly useful for visualization (in addition to tensorboard): https://napari.org/stable/index.html
On a related but possibly separate note, @jeffjennings also mentioned that it might be interesting to ensure that cross-validation blocks always have roughly the same 1D weighted baseline distribution.
I think one aspect of the current cross-val workflow that could be improved is the train/test set division in KFoldCrossValidatorGridded, moving from standard k-fold to stratified k-fold. It would address that:
A stratified k-fold approach would ensure the training sets have almost exactly the same number of points, including the same number in each of several baseline bins. This also ensures the train:test set size ratio is constant and ~exactly a chosen value.
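To make the stratified idea concrete, here is a rough sketch in plain NumPy. It is not MPoL code: the function name `stratified_kfold_indices`, its parameters, and the choice of quantile-based baseline bins are all assumptions for illustration, and it works on a flat array of baseline lengths rather than on gridded cells.

```python
import numpy as np


def stratified_kfold_indices(baselines, k=5, n_bins=4, seed=0):
    """Return k arrays of test indices, stratified by baseline-length bin.

    Hypothetical helper for illustration only; not part of MPoL.
    """
    rng = np.random.default_rng(seed)
    baselines = np.asarray(baselines, dtype=float)

    # Quantile edges, so each bin holds roughly the same number of points.
    edges = np.quantile(baselines, np.linspace(0.0, 1.0, n_bins + 1))
    # Passing only the interior edges gives bin labels 0 .. n_bins - 1.
    bin_ids = np.digitize(baselines, edges[1:-1])

    folds = [[] for _ in range(k)]
    for b in range(n_bins):
        # Shuffle the members of this bin, then cut them into k chunks
        # whose sizes differ by at most one.
        members = rng.permutation(np.flatnonzero(bin_ids == b))
        for i, chunk in enumerate(np.array_split(members, k)):
            # Rotate the starting fold per bin so the slightly larger
            # chunks do not always land in the same fold.
            folds[(i + b) % k].extend(chunk.tolist())

    return [np.sort(np.array(f, dtype=int)) for f in folds]
```

Each fold then serves as the test set once, with the remaining folds as the training set, so the train:test ratio stays at about (k - 1):1 and every baseline bin contributes nearly the same number of points to each fold.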
We should implement an easy way to use uniform partitioning for CV, similar to how we implemented Dartboard.
See below for the forced (not generalized at all) implementation we used for testing in 2021
KFoldCrossValidatorGridded will need to be generalized or changed, as right now it requires Dartboard and does not allow for other options. We could either rename this to communicate that it's dartboard specific, or we could make a generalized KFoldCrossValidatorGridded which can handle multiple types of partitioning.
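One possible shape for the generalized option is to pass the partitioning scheme in as a callable, so the k-fold logic no longer cares how cells are grouped. The sketch below is only that, a sketch: `Partitioner`, `random_cell_partition`, `contiguous_partition` and `kfold_splits` are made-up names, and the code works on integer cell indices rather than on MPoL's actual gridded dataset objects.

```python
from typing import Callable, Iterator, List, Tuple

import numpy as np

# Hypothetical interface: a partitioner maps (n_cells, k) to k index arrays.
Partitioner = Callable[[int, int], List[np.ndarray]]


def random_cell_partition(n_cells: int, k: int, seed: int = 0) -> List[np.ndarray]:
    """Assign cells to k chunks uniformly at random (illustrative only)."""
    rng = np.random.default_rng(seed)
    return [np.sort(c) for c in np.array_split(rng.permutation(n_cells), k)]


def contiguous_partition(n_cells: int, k: int) -> List[np.ndarray]:
    """Assign cells to k contiguous chunks, as a second example strategy."""
    return list(np.array_split(np.arange(n_cells), k))


def kfold_splits(
    n_cells: int, k: int, partitioner: Partitioner
) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (train, test) cell indices for each of the k folds."""
    chunks = partitioner(n_cells, k)
    for i in range(k):
        test = chunks[i]
        # Training set is every chunk except the held-out one.
        train = np.concatenate([c for j, c in enumerate(chunks) if j != i])
        yield np.sort(train), test
```

With this layout a Dartboard-style scheme would just be another callable with the same signature, and swapping strategies is a one-argument change, e.g. `kfold_splits(n_cells, 5, random_cell_partition)`.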
Closing this issue for now, since the main action items (renaming and RandomCell gridding) were implemented by #132. There are still larger discussions to be had about cross validation strategies (e.g., #93) and accuracy (most importantly), but once we progress those discussions a bit further we can open targeted issues for the codebase.
Cross validation is useful for determining optimal (or at least good enough) parameter settings for regularization.
Currently, though, most of the functionality for doing this exists outside of the MPoL package itself. This is partially by design and mirrors the way some PyTorch projects are set up with respect to functionality / optimizers. However, the current K-fold CV workflow is somewhat clunky and there are likely areas for improvement.
Describe the solution you'd like