In the first part of the general effort to redesign the visibility datasets (#126), we should redesign/update the `UVDataset` class. Currently, this class is used nowhere in the codebase, so it shouldn't be much trouble to experiment with new ideas (we could also delete this object, if we decide that's the right course).
The idea is that `UVDataset` (or some renamed version of it) will be for interacting with what we are calling the "loose" visibilities, i.e., the raw, ungridded visibilities obtained from some measurement set. For example, a typical ALMA measurement set might contain 300,000+ individual visibility measurements. Because dealing with so many visibility points is computationally expensive, most users will want to interact with a `GriddedDataset` (whose redesign is discussed in #163). A `GriddedDataset` requires some special indexing to match up with the `FourierCube` output, so redesigning the `UVDataset` is probably the more straightforward of the two issues, even if it's not the first object most people will use. And once we figure out some of the larger redesign issues, we should have a better idea of how to redesign `GriddedDataset`.
There are several instances where the user would want to interact with the loose, ungridded visibilities, and where a `UVDataset` would be helpful. This is now possible thanks to the NuFFT (#78) in the codebase.
The goal of this issue, generally, is to align our dataset objects with as many of the PyTorch idioms as possible, described here, here, and here.
The idea is that the user will instantiate the dataset with numpy arrays for u, v, weight, and data (much the same way `DataAverager` is instantiated).
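For concreteness, here is a minimal sketch of what that instantiation could look like as a map-style PyTorch dataset. The argument names mirror `DataAverager` and are assumptions, not a settled API:

```python
import torch
from torch.utils.data import Dataset


class UVDataset(Dataset):
    """Sketch of a map-style dataset holding 'loose' visibilities.

    Argument names (uu, vv, weight, data) are placeholders mirroring
    the DataAverager call signature; the real API may differ.
    """

    def __init__(self, uu, vv, weight, data):
        # convert incoming numpy arrays to tensors up front
        self.uu = torch.as_tensor(uu, dtype=torch.float64)
        self.vv = torch.as_tensor(vv, dtype=torch.float64)
        self.weight = torch.as_tensor(weight, dtype=torch.float64)
        self.data = torch.as_tensor(data, dtype=torch.complex128)

    def __len__(self):
        return len(self.uu)

    def __getitem__(self, idx):
        return self.uu[idx], self.vv[idx], self.weight[idx], self.data[idx]
```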
Things we should think about:

- Is there any pre-processing that should be done on the arrays? Error checking?
- How does the `device` location interact with creation/moving/slicing?
- Should this be an `IterableDataset` or any of the other types of datasets provided by PyTorch?
- How should the dataset use batch dimensions to parallelize when we have multiple channels to the data, as in a spectral cube?
- How should the dataset use sub-batches, in the limit that we have many, many visibilities and we'd like to use something like stochastic gradient descent? (See the sketch after this list.)
- Other considerations?
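As an illustration of the sub-batching and device questions, a map-style dataset like the sketch above composes directly with PyTorch's `DataLoader`. Whether this is the right mechanism (versus an `IterableDataset`, or batching over channel dimensions) is exactly what's up for discussion; the data below are fake and stand in for a real measurement set:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

# fake data standing in for ~300,000 loose visibilities (illustration only)
rng = np.random.default_rng(0)
N = 300_000
uv_dataset = UVDataset(  # the sketch class above
    uu=rng.normal(size=N),
    vv=rng.normal(size=N),
    weight=rng.uniform(0.1, 1.0, size=N),
    data=rng.normal(size=N) + 1j * rng.normal(size=N),
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# mini-batches of loose visibilities for something like SGD; each batch
# could be pushed through the NuFFT to form a stochastic loss estimate
loader = DataLoader(uv_dataset, batch_size=10_000, shuffle=True)

for uu, vv, weight, data in loader:
    # tensors are created on the CPU; move each batch to the device here,
    # or alternatively give UVDataset a .to(device) method like nn.Module
    uu, vv = uu.to(device), vv.to(device)
    weight, data = weight.to(device), data.to(device)
```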
We may also want a `UVDataset` to contain a routine that converts it to a `GriddedDataset` (by passing it through `gridding.DataAverager`).
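One possible shape for that routine, written against the `UVDataset` sketch above. The `DataAverager` constructor arguments and the `to_pytorch_dataset()` call are assumptions about the averager's API, not a settled interface:

```python
from mpol import gridding  # module referenced in this issue


def to_gridded_dataset(self, coords):
    """Hypothetical convenience method on UVDataset: grid the loose
    visibilities by passing them through gridding.DataAverager."""
    averager = gridding.DataAverager(
        coords=coords,
        uu=self.uu.numpy(),
        vv=self.vv.numpy(),
        weight=self.weight.numpy(),
        data_re=self.data.real.numpy(),
        data_im=self.data.imag.numpy(),
    )
    return averager.to_pytorch_dataset()
```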