Closed · jeffjennings closed this issue 8 months ago
One useful comparison point would be doing just a single K-fold, i.e., a normal "train" and "validation" set. Do we get meaningfully better constraints on the best hyperparameters by doing K=5, 10, etc.? Enough has changed since we first brought in K-fold that it would be good to (re)establish baselines for how well we're doing.
For example, on a single loss plot, could we chart the train and validate behavior for a few different regularizer strengths? This would be nice to show that we can easily (or not) distinguish between good and bad.
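As a sketch of the comparison being proposed (not the MPoL pipeline itself): run the same regularizer-strength grid through cross-validation at a few values of K and see whether larger K actually tightens the validation-loss estimates. The toy problem below uses ridge regression as a stand-in for the regularized imaging fit; `lambdas` mirrors the grid from the comment below, and all function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(scale=0.5, size=n)

# Regularizer strengths, matching the grid discussed in this thread
lambdas = [1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0]

def val_loss(train_idx, val_idx, lam):
    # Ridge solution on the training fold, MSE on the held-out fold
    Xt, yt = X[train_idx], y[train_idx]
    w = np.linalg.solve(Xt.T @ Xt + lam * np.eye(p), Xt.T @ yt)
    r = X[val_idx] @ w - y[val_idx]
    return float(np.mean(r**2))

def cv_scores(K):
    # Mean and scatter of validation loss across K folds, per lambda
    idx = rng.permutation(n)
    folds = np.array_split(idx, K)
    scores = {}
    for lam in lambdas:
        losses = []
        for i in range(K):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            losses.append(val_loss(train, folds[i], lam))
        scores[lam] = (np.mean(losses), np.std(losses))
    return scores

# K=2 is roughly the "single train/validation split" baseline; larger K
# averages over more folds, shrinking the scatter in the loss estimate.
for K in (2, 5, 10):
    s = cv_scores(K)
    best = min(s, key=lambda lam: s[lam][0])
    print(f"K={K:2d}  best lambda={best:.0e}  loss={s[best][0]:.3f} +/- {s[best][1]:.3f}")
```

If the per-fold scatter at K=2 is already small compared to the loss differences between lambdas, a single split may be enough to pick the regularizer strength; if not, the extra folds are earning their compute.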
I don't know if all these will render, but here's a comparison of K = 1, 5, or 10 folds for the sparsity regularizer with lambda in [1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e0].
Best model of this set, identified by accuracy of visibility fit:
and image of residual visibilities:
I ran a train/test loop with the .asdf dataset of IM Lup with no averaging, and am getting a very similar result to that with the time- and frequency-averaged dataset I've been using. It doesn't seem like this is the source of the poor fit at short baselines.
Fit with .asdf dataset:
Fit with averaged dataset:
Interesting, what do the residuals look like in terms of their sigma, for real and imaginary?
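The check being asked for here is essentially the normalized residuals: (data - model) divided by the per-visibility sigma, inspected separately for the real and imaginary parts. A minimal sketch, where `data_vis`, `model_vis`, and `weight` are placeholder arrays standing in for the observed visibilities, the model predictions, and the visibility weights (1/sigma^2):

```python
import numpy as np

rng = np.random.default_rng(1)
nvis = 1000
weight = rng.uniform(0.5, 2.0, size=nvis)      # stand-in visibility weights
sigma = 1.0 / np.sqrt(weight)

# Fake model and data: data = model + Gaussian noise at the stated sigma
model_vis = rng.normal(size=nvis) + 1j * rng.normal(size=nvis)
data_vis = model_vis + sigma * (rng.normal(size=nvis) + 1j * rng.normal(size=nvis))

# Dimensionless residuals; if the model fits to within the noise,
# both real and imaginary parts should be ~ N(0, 1)
resid = (data_vis - model_vis) / sigma
print(np.std(resid.real), np.std(resid.imag))
```

A standard deviation well above 1 (or structure vs. baseline) in either component would localize where the underfit at short baselines shows up in the data.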
I'm not sure, the NuFFT also crashes. But since the images are effectively identical, the model and residual visibilities should be too. E.g., the images' total flux agrees to 2 parts in 1000, so they're both underfitting to the same degree at short baselines. If you want to run more tests with the full dataset, it's probably more practical to run on your cluster. I can send a script to reproduce the pipeline in its current state if needed.
Ah, right, memory is also an issue with the NuFFT. Working to predict batches of visibilities at a time is probably the way to solve it. I'll need to travel this route with the SGD work so hopefully that will illuminate the individual visibility residuals.
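The batching idea above can be sketched as: evaluate the visibility prediction on chunks of (u, v) points so the intermediate arrays scale with the batch size rather than the full number of visibilities. The example below uses a brute-force DFT as a stand-in for the NuFFT; `predict_batched` and `batch_size` are illustrative names, not MPoL API.

```python
import numpy as np

def predict_batched(image, x, y, u, v, batch_size=256):
    """Direct (slow) Fourier sum, evaluated batch_size visibilities at a time."""
    out = np.empty(len(u), dtype=complex)
    flat_img = image.ravel()
    xg, yg = np.meshgrid(x, y)
    xf, yf = xg.ravel(), yg.ravel()
    for start in range(0, len(u), batch_size):
        sl = slice(start, start + batch_size)
        # Phase matrix is (batch, npix) instead of (nvis, npix),
        # which bounds peak memory regardless of the dataset size
        phase = np.exp(-2j * np.pi * (np.outer(u[sl], xf) + np.outer(v[sl], yf)))
        out[sl] = phase @ flat_img
    return out

rng = np.random.default_rng(2)
img = rng.normal(size=(8, 8))
x = np.linspace(-1, 1, 8)
y = np.linspace(-1, 1, 8)
u = rng.uniform(-1, 1, size=37)
v = rng.uniform(-1, 1, size=37)

# Batched and single-shot evaluations agree; only memory use differs
a = predict_batched(img, x, y, u, v, batch_size=7)
b = predict_batched(img, x, y, u, v, batch_size=len(u))
```

The same chunking pattern should carry over to the SGD work, since mini-batches of visibilities are exactly what that loop needs anyway.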
Yeah I made an issue this morning to that end, #224. Let me know how it goes!
Closing as out of date and out of scope for v0.3 redesign
Is your feature request related to a problem or opportunity? Please describe. Even with reasonably large dartboard cells, the loss profiles in a cross-val loop remain pretty similar (see e.g. #214). I'm not sure whether many regularizer/strength combinations fitting the observed data very similarly indicates the model just needs more data to distinguish between optimizations using different regularizer strengths, or even different regularizers. Maybe the model's predictive accuracy on unobserved points scales strongly with the number of data points?
Describe the solution you'd like Not sure how much of an altered modeling approach this would require. Is there a better way than CV to distinguish between models?
Describe alternatives you've considered Within the CV framework, I could test how much of a difference it makes to: