MPoL-dev / MPoL

A flexible Python platform for Regularized Maximum Likelihood imaging
https://mpol-dev.github.io/MPoL/
MIT License

Distinguishing optimal images #215

Closed · jeffjennings closed this 8 months ago

jeffjennings commented 9 months ago

Is your feature request related to a problem or opportunity? Please describe.
Even with reasonably large dartboard cells, the loss profiles in a cross-val loop remain pretty similar (see e.g. #214). I'm not sure whether many regularizer/strength combinations fitting the observed data very similarly indicates that the model just needs more data to distinguish between optimizations with different regularizer strengths, or even different regularizers. Maybe the model's predictive accuracy on unobserved points scales strongly with the number of points?

Describe the solution you'd like
I'm not sure how much of an altered modeling approach this would require. Is there a better way than CV to distinguish between models?

Describe alternatives you've considered
Within the CV framework, I could test how much of a difference it makes to:

iancze commented 9 months ago

One useful comparison point would be doing just a single K-fold, i.e., a normal train/validation split. Do we get meaningfully better constraints on the best hyperparameters by doing K = 5, 10, etc.? Enough has changed since we first brought in K-fold that it would be good to (re)establish baselines on how well or badly we are doing.

For example, on a single loss plot, could we chart the train and validate behavior for a few different regularizer strengths? It would be nice to show whether we can easily (or not) distinguish good from bad.
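Something like this minimal matplotlib sketch would do it; the loss curves here are synthetic stand-ins just so it runs (the real inputs would be the per-epoch train/validation losses recorded in the CV loop, and all names are placeholders):

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in loss curves for illustration; in practice these would be
# the per-epoch train/validation losses recorded during the CV loop.
rng = np.random.default_rng(0)
epochs = np.arange(500)
lambdas = [1e-5, 1e-4, 1e-3]

fig, ax = plt.subplots()
for i, lam in enumerate(lambdas):
    train = np.exp(-epochs / 100) + 0.1 * i + 0.01 * rng.standard_normal(epochs.size)
    val = train + 0.05 + 0.02 * rng.standard_normal(epochs.size)
    ax.plot(epochs, train, label=f"train, $\\lambda$ = {lam:g}")
    ax.plot(epochs, val, ls="--", label=f"validate, $\\lambda$ = {lam:g}")

ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()
fig.savefig("train_val_losses.png", dpi=300)
```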

jeffjennings commented 9 months ago

I don't know if all these will render, but here's a comparison of 1, 5, or 10 k-folds for sparsity, with lambda in [1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0].
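For reference, the grid has this shape; `run_crossval` is a hypothetical stand-in for the actual pipeline (here it returns a random score just so the sketch runs):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def run_crossval(k_folds, lam):
    """Hypothetical stand-in for the real pipeline: train on each fold's
    training set with sparsity strength `lam` and return the mean validation
    loss over the k folds. Here it just returns a random score."""
    return rng.random()

lambdas = [1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0]
scores = {(k, lam): run_crossval(k, lam)
          for k, lam in itertools.product((1, 5, 10), lambdas)}

# Lower CV score = better predictive accuracy on the held-out visibilities.
best_k, best_lam = min(scores, key=scores.get)
print(f"best: k = {best_k}, lambda = {best_lam:g}")
```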

[Figures: cross-validation diagnostics for IM Lup (dartboard splits), one plot per combination of k-folds in {1, 5, 10} and sparsity lambda in {1e0, 0.1, 0.01, 0.001, 1e-4, 1e-5, 1e-8}.]

jeffjennings commented 9 months ago

Best model of this set, identified by accuracy of the visibility fit (10 k-folds, sparsity lambda = 1e-4): [figure: projected visibilities]

and the image of residual visibilities: [figure: image comparison]

jeffjennings commented 9 months ago

I ran a train/test loop with the .asdf dataset of IM Lup with no averaging, and I'm getting a very similar result to the one from the time- and frequency-averaged dataset I've been using. It doesn't seem like the averaging is the source of the poor fit at short baselines.

Fit with the .asdf dataset: [figure: train diagnostics, k-fold 0, epoch 1987]

Fit with the averaged dataset: [figure: train diagnostics, k-fold 0, epoch 1961]

iancze commented 9 months ago

Interesting. What do the residuals look like in units of their sigma, for the real and imaginary parts?
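i.e., something like this numpy sketch, with synthetic stand-ins for `data`, `model`, and `weight` (w = 1/sigma^2); the variable names are placeholders:

```python
import numpy as np

# data, model: complex visibility arrays; weight: thermal weights, w = 1/sigma^2.
# Synthetic stand-ins so the sketch runs; replace with the real arrays.
rng = np.random.default_rng(0)
sigma = 0.1 * np.ones(1000)
weight = 1.0 / sigma**2
data = rng.normal(scale=sigma) + 1j * rng.normal(scale=sigma)
model = np.zeros_like(data)

# Residuals normalized by sigma: for a good fit with correct weights, the
# real and imaginary parts should each be distributed as ~ N(0, 1).
norm_resid = (data - model) * np.sqrt(weight)
print("real: mean = {:+.3f}, std = {:.3f}".format(norm_resid.real.mean(), norm_resid.real.std()))
print("imag: mean = {:+.3f}, std = {:.3f}".format(norm_resid.imag.mean(), norm_resid.imag.std()))
```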

jeffjennings commented 9 months ago

I'm not sure; the NuFFT also crashes. But since the images are effectively identical, the model and residual visibilities should be too. E.g., the images' total flux is the same to 2 parts in 1000, so they're both underfitting to the same degree at short baselines. If you want to run more tests with the full dataset, it's probably more practical to run on your cluster. I can send a script to reproduce the pipeline in its current state if needed.
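For the record, the flux check is just this; a sketch with hypothetical stand-in image arrays in Jy/arcsec^2 on a shared grid, so the total flux is the pixel sum times the pixel area:

```python
import numpy as np

# Hypothetical stand-in images in Jy / arcsec^2; cell_size in arcsec.
cell_size = 0.005
img_asdf = np.ones((512, 512))          # stand-in for the .asdf-dataset image
img_avg = 1.002 * np.ones((512, 512))   # stand-in for the averaged-dataset image

flux_asdf = img_asdf.sum() * cell_size**2
flux_avg = img_avg.sum() * cell_size**2
rel_diff = abs(flux_asdf - flux_avg) / flux_asdf
print(f"relative flux difference: {rel_diff:.1e}")  # ~2e-3 = 2 parts in 1000
```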

iancze commented 9 months ago

Ah, right, memory is also an issue with the NuFFT. Predicting batches of visibilities at a time is probably the way to solve it. I'll need to go down this route for the SGD work anyway, so hopefully that will illuminate the individual visibility residuals.
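A minimal sketch of the batching idea, assuming a NuFFT-like callable `nufft(cube, uu, vv)` that maps an image cube to model visibilities at the given (u, v) points; the callable and its signature are placeholders, not necessarily the current MPoL API:

```python
import torch

def predict_in_batches(nufft, cube, uu, vv, batch_size=100_000):
    """Predict model visibilities in chunks so the NuFFT never has to hold
    the full set of loose visibilities in memory at once."""
    chunks = []
    with torch.no_grad():  # inference only; skip autograd buffers
        for i in range(0, len(uu), batch_size):
            chunks.append(nufft(cube, uu[i : i + batch_size], vv[i : i + batch_size]))
    return torch.cat(chunks)
```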

jeffjennings commented 9 months ago

Yeah, I made an issue this morning to that end, #224. Let me know how it goes!

jeffjennings commented 8 months ago

Closing as out of date and out of scope for the v0.3 redesign.