MPoL-dev / MPoL

A flexible Python platform for Regularized Maximum Likelihood imaging
https://mpol-dev.github.io/MPoL/
MIT License

Cross validation workflow #99

Closed: iancze closed this issue 1 year ago

iancze commented 2 years ago

Cross validation is useful for determining optimal (or at least good enough) parameter settings for regularization.

Currently, though, most of the functionality for doing this lives outside the MPoL package itself. This is partly by design, and it mirrors how some PyTorch projects keep core functionality separate from optimizers. However, the current K-fold CV workflow is somewhat clunky, and there are likely areas for improvement.
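For readers less familiar with the K-fold workflow, here is a generic sketch of the loop that cross-validation runs to pick a regularization strength. It is not MPoL code: it swaps in a toy ridge-regression problem, and the helpers `kfold_indices`, `ridge_fit`, and `cv_score` are hypothetical, chosen so that the fold loop and the scan over strengths stand out without an imaging model.

```python
import numpy as np


def kfold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)


def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def cv_score(X, y, lam, k=5, seed=0):
    """Mean held-out squared error of a ridge fit, averaged over k folds."""
    # a fixed seed gives identical folds for every candidate lam
    folds = kfold_indices(len(y), k, np.random.default_rng(seed))
    errors = []
    for i, test in enumerate(folds):
        # train on every fold except the held-out one
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[test] - X[test] @ w) ** 2))
    return float(np.mean(errors))


# Toy data standing in for visibilities; scan the regularization strength.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
w_true = rng.normal(size=20)
y = X @ w_true + rng.normal(scale=0.5, size=200)

scores = {lam: cv_score(X, y, lam) for lam in [1e-3, 1e-1, 1e1, 1e3]}
best_lam = min(scores, key=scores.get)
```

Reusing the same folds for every candidate strength means the scores differ only because of the regularization, not because of a different train/test draw.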


iancze commented 2 years ago

Possibly useful for visualization (in addition to TensorBoard): https://napari.org/stable/index.html

iancze commented 1 year ago

On a related but possibly separate note, @jeffjennings also mentioned that it might be worth ensuring that cross-validation blocks always have roughly the same 1D weighted baseline distribution.
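One way to check this property is to compare each fold's weighted baseline histogram with that of the full dataset. This is a minimal sketch, assuming per-visibility baseline lengths `q`, weights, and integer fold labels are already available as NumPy arrays. The function names and the "maximum per-bin deviation" metric are illustrative choices, not MPoL API.

```python
import numpy as np


def weighted_baseline_hist(q, weights, edges):
    """Weighted histogram of baseline lengths q, normalized to sum to 1."""
    hist, _ = np.histogram(q, bins=edges, weights=weights)
    return hist / hist.sum()


def max_fold_deviation(q, weights, fold_ids, edges):
    """Largest absolute per-bin difference between any fold's normalized
    weighted baseline histogram and the histogram of the full dataset."""
    ref = weighted_baseline_hist(q, weights, edges)
    worst = 0.0
    for f in np.unique(fold_ids):
        sel = fold_ids == f
        h = weighted_baseline_hist(q[sel], weights[sel], edges)
        worst = max(worst, float(np.max(np.abs(h - ref))))
    return worst


# Synthetic example: baselines in kilolambda, per-visibility weights.
rng = np.random.default_rng(0)
q = rng.uniform(0, 1000, size=5000)
weights = rng.uniform(0.5, 1.5, size=5000)
edges = np.linspace(0, 1000, 11)

random_folds = rng.integers(0, 5, size=5000)  # shuffled assignment
contiguous_folds = np.argsort(np.argsort(q)) * 5 // 5000  # folds = q quantiles

dev_random = max_fold_deviation(q, weights, random_folds, edges)
dev_contiguous = max_fold_deviation(q, weights, contiguous_folds, edges)
```

A shuffled assignment keeps every fold's distribution close to the global one, while folds built from contiguous baseline ranges deviate strongly, which is the failure mode this check is meant to flag.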

jeffjennings commented 1 year ago

I think one aspect of the current cross-val workflow that could be improved is the train/test set division in KFoldCrossValidatorGridded, by moving from standard k-fold to stratified k-fold.

A stratified k-fold approach would ensure that the training sets have almost exactly the same number of points, including the same number in each of several baseline bins. This also keeps the train:test set size ratio constant and very close to a chosen value.
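A minimal stratified k-fold sketch follows. It assumes each visibility (or gridded cell) has a baseline length `q` and that quantile bins in `q` are an acceptable stratification. `stratified_kfold` is a hypothetical helper, not the KFoldCrossValidatorGridded implementation. Points are shuffled within each baseline bin and dealt round-robin into folds, so every fold receives nearly the same number of points from each bin.

```python
import numpy as np


def stratified_kfold(q, k, n_bins, seed=0):
    """Return a fold label 0..k-1 for each point so that every fold gets
    (almost) the same number of points from each baseline bin.

    Bins are quantiles of q, so each holds roughly len(q) / n_bins points.
    """
    rng = np.random.default_rng(seed)
    edges = np.quantile(q, np.linspace(0, 1, n_bins + 1))
    # use interior edges only, so every point lands in bin 0..n_bins-1
    bin_ids = np.digitize(q, edges[1:-1])
    fold_ids = np.empty(len(q), dtype=int)
    offset = 0
    for b in range(n_bins):
        members = rng.permutation(np.flatnonzero(bin_ids == b))
        # deal members round-robin; carry the offset across bins so the
        # leftover points of each bin do not always land in fold 0
        fold_ids[members] = (np.arange(len(members)) + offset) % k
        offset += len(members)
    return fold_ids


rng = np.random.default_rng(1)
q = rng.uniform(0, 500, size=1003)
fold_ids = stratified_kfold(q, k=5, n_bins=8)
test = fold_ids == 0  # hold out fold 0
train = ~test
```

Because fold sizes differ by at most one point overall and within each bin, each test set holds about 1/k of the data and the train:test ratio stays close to (k-1):1.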

briannazawadzki commented 1 year ago

We should implement an easy way to use uniform partitioning for CV, similar to how we implemented Dartboard.
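Uniform (random-cell) partitioning can be sketched as assigning each occupied uv cell to a fold at random, independent of its position. This contrasts with a Dartboard-style split into radial and azimuthal cells. The sketch assumes the data are already gridded and that a boolean occupancy `mask` is available. `random_cell_folds` and `train_test_masks` are hypothetical helpers, not MPoL functions.

```python
import numpy as np


def random_cell_folds(mask, k, seed=0):
    """Randomly assign each occupied grid cell to one of k folds.

    mask : 2D boolean array, True where a gridded uv cell contains data.
    Returns an int array shaped like mask: fold label 0..k-1 for occupied
    cells and -1 for empty cells.
    """
    rng = np.random.default_rng(seed)
    labels = np.full(mask.size, -1, dtype=int)
    # shuffle the flat indices of occupied cells, then deal them out evenly
    shuffled = rng.permutation(np.flatnonzero(mask))
    labels[shuffled] = np.arange(len(shuffled)) % k
    return labels.reshape(mask.shape)


def train_test_masks(labels, fold):
    """Boolean train and test masks for one fold."""
    test = labels == fold
    train = (labels >= 0) & ~test
    return train, test


rng = np.random.default_rng(3)
mask = rng.random((32, 32)) < 0.3  # synthetic occupancy pattern
labels = random_cell_folds(mask, k=4)
train, test = train_test_masks(labels, fold=2)
```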

briannazawadzki commented 1 year ago

See below for the hard-coded (not at all generalized) implementation we used for testing in 2021:

Messy random cell cross validation

briannazawadzki commented 1 year ago

KFoldCrossValidatorGridded will need to be generalized or changed, since it currently requires Dartboard and does not allow other partitioning options. We could either rename it to make clear that it is Dartboard-specific, or write a generalized KFoldCrossValidatorGridded that can handle multiple types of partitioning.
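The second option, a generalized splitter, could look roughly like this. It is a sketch under assumptions: cells are given as flat arrays of `u`, `v` coordinates, and a partitioning scheme is any callable that returns an integer group label per cell. `KFoldGridded`, `dartboard_groups`, and `random_groups` are hypothetical names, not the API that was eventually merged.

```python
import numpy as np


def dartboard_groups(u, v, q_edges, phi_edges):
    """Group label per cell from polar (q, phi) bins, dartboard-style."""
    q = np.hypot(u, v)
    # modulo pi maps (u, v) and (-u, -v) to the same azimuth
    phi = np.arctan2(v, u) % np.pi
    qi = np.digitize(q, q_edges[1:-1])
    pi = np.digitize(phi, phi_edges[1:-1])
    return qi * (len(phi_edges) - 1) + pi


def random_groups(u, v):
    """Each cell is its own group (uniform random-cell partitioning)."""
    return np.arange(len(u))


class KFoldGridded:
    """Generalized k-fold splitter with a pluggable partitioning scheme."""

    def __init__(self, partition, k=5, seed=0):
        self.partition = partition
        self.k = k
        self.seed = seed

    def split(self, u, v):
        """Yield (train_idx, test_idx) index arrays for each of the k folds."""
        groups = self.partition(u, v)
        uniq, inverse = np.unique(groups, return_inverse=True)
        rng = np.random.default_rng(self.seed)
        # deal whole groups (not individual cells) into folds
        group_fold = rng.permutation(len(uniq)) % self.k
        fold_ids = group_fold[inverse]
        for f in range(self.k):
            test = fold_ids == f
            yield np.flatnonzero(~test), np.flatnonzero(test)


rng = np.random.default_rng(5)
u = rng.uniform(-100, 100, 400)
v = rng.uniform(-100, 100, 400)
q_edges = np.linspace(0, 150, 7)
phi_edges = np.linspace(0, np.pi, 9)

board = KFoldGridded(lambda u, v: dartboard_groups(u, v, q_edges, phi_edges), k=4)
dartboard_splits = list(board.split(u, v))
random_splits = list(KFoldGridded(random_groups, k=4).split(u, v))
```

Assigning whole groups to folds keeps every cell of a dartboard segment on the same side of each split, while the random scheme reduces to one group per cell; either way the fold loop itself does not need to know which partitioning was used.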

iancze commented 1 year ago

Closing this issue for now, since the main action items (renaming and RandomCell gridding) were implemented in #132. There are still larger discussions to be had about cross-validation strategies (e.g., #93) and, most importantly, accuracy, but once those discussions progress a bit further we can open targeted issues for the codebase.