MPoL-dev / MPoL

A flexible Python platform for Regularized Maximum Likelihood imaging
https://mpol-dev.github.io/MPoL/
MIT License

Redesign UVDataset with Pytorch idioms in mind #162

Closed iancze closed 6 months ago

iancze commented 1 year ago

As the first part of the general effort to redesign the visibility datasets (#126), we should redesign/update the UVDataset class. Currently, this class is used nowhere in the codebase, so it shouldn't be difficult to experiment with new ideas (we could also delete this object, if we decide that's the right course).

The idea is that UVDataset (or some renamed version of it) will be for interacting with what we are calling the "loose" visibilities, i.e., the ungridded visibilities obtained raw from some measurement set. For example, a typical ALMA measurement set might contain 300,000+ individual visibility measurements. Because dealing with so many visibility points is computationally expensive, most users will want to interact with a GriddedDataset (whose redesign is discussed in #163). A GriddedDataset requires some special indexing to match up with the FourierCube output, so redesigning the UVDataset is probably the more straightforward of the two issues, even if it's not the first object most people will use. And, once we figure out some of the larger redesign issues, we should have a better idea of how to redesign GriddedDataset.

There are several instances where the user would want to interact with the loose, ungridded visibilities, and where a UVDataset would be helpful. This is now possible thanks to the NuFFT (#78) in the codebase.

The goal of this issue, generally, is to align our dataset objects with as many of the Pytorch idioms as possible, described here, here, and here.

The idea is that the user will instantiate the dataset with numpy arrays for u, v, weight, and data (much the same way DataAverager is instantiated).
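
To make the proposal concrete, here is a minimal sketch of what a map-style dataset over loose visibilities could look like. It is not the MPoL implementation: the class name `LooseVisibilityDataset`, the argument names `uu` and `vv`, the 1D shape check, and the choice of double precision are all assumptions made for illustration. The only PyTorch pieces relied on are `torch.utils.data.Dataset` and `torch.utils.data.DataLoader`.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class LooseVisibilityDataset(Dataset):
    """Hypothetical map-style dataset over loose (ungridded) visibilities."""

    def __init__(self, uu, vv, weight, data):
        # Accept anything array-like, as the issue proposes numpy inputs.
        uu = np.asarray(uu, dtype=np.float64)
        vv = np.asarray(vv, dtype=np.float64)
        weight = np.asarray(weight, dtype=np.float64)
        data = np.asarray(data, dtype=np.complex128)

        # Assumption for this sketch: all inputs are 1D and the same length.
        if uu.ndim != 1 or not (uu.shape == vv.shape == weight.shape == data.shape):
            raise ValueError("uu, vv, weight, and data must be 1D arrays of equal length")

        # torch.tensor copies, so the dataset does not alias the caller's arrays.
        self.uu = torch.tensor(uu)
        self.vv = torch.tensor(vv)
        self.weight = torch.tensor(weight)
        self.data = torch.tensor(data)

    def __len__(self):
        # One item per visibility measurement.
        return self.uu.shape[0]

    def __getitem__(self, idx):
        # Return a single visibility sample as a tuple of 0-d tensors.
        return self.uu[idx], self.vv[idx], self.weight[idx], self.data[idx]


# Usage with synthetic values (not real measurements).
rng = np.random.default_rng(0)
n = 1000
dataset = LooseVisibilityDataset(
    uu=rng.normal(size=n),
    vv=rng.normal(size=n),
    weight=rng.uniform(0.5, 1.5, size=n),
    data=rng.normal(size=n) + 1j * rng.normal(size=n),
)

# DataLoader handles shuffling and batching via the default collate function.
loader = DataLoader(dataset, batch_size=256, shuffle=True)
uu_b, vv_b, weight_b, data_b = next(iter(loader))
```

Implementing only `__len__` and `__getitem__` keeps the object compatible with the stock `DataLoader`, so mini-batching over visibilities comes for free. Whether per-sample indexing is fast enough for 300,000+ visibilities is an open question this sketch does not address.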

Things we should think about:

We may also want a UVDataset to contain a routine that converts it to a GriddedDataset (by passing through the gridding.DataAverager).

iancze commented 10 months ago

We've made considerable design progress on this through the adoption of the SGD paradigm. While addressing this issue, we should also add