Closed fjebaker closed 1 year ago
@phajy what do you think?
This sounds like a very sensible approach. The different datasets that I can think of immediately include multi-wavelength spectra (e.g., radio, optical flux densities), binned X-ray spectra, time series, power spectra, time lags versus frequency, and time lags versus energy. I believe these all fit naturally into this framework with the flexibility for other datasets we haven't thought of yet.
Oh, another area we might want to consider for the (more distant) future is fitting datasets with spatial information, e.g., images, or spectral cubes (a spectrum in each pixel).
I've been thinking a bit more about this from the practical standpoint of trying to fit a model to multiple datasets. If this is an XSPEC model, it is computed as integrated counts in discrete energy bins. We might want to fit this to multiple spectral datasets, some of which will be X-ray datasets already in the expected format, but others could be, e.g., radio or optical flux densities that are not naturally in these units or integrated over bin widths. Options might be to 1) import these as a `BinnedDataset`, or 2) import these as a `Dataset` and evaluate the model as a `Dataset` with the appropriate unit conversions. Perhaps we could discuss this. But overall I think the changes originally proposed make sense.
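As a rough illustration of option 2, here is a minimal Julia sketch of evaluating a bin-integrated model at point-like data. It assumes the model can be called on a pair of bin edges; `integrated_powerlaw` and `density_at` are hypothetical names, not part of SpectralFitting or XSPEC, and unit conversions are left out entirely.

```julia
# Hypothetical stand-in for an XSPEC-style model: returns the integral of
# norm * E^-2 over each bin [edges[i], edges[i+1]].
integrated_powerlaw(edges; norm = 1.0) =
    [norm * (1 / lo - 1 / hi) for (lo, hi) in zip(edges[1:end-1], edges[2:end])]

# Approximate the density at each point `x` by integrating the model over a
# narrow bin centred on `x` and dividing by the bin width.
function density_at(model, xs; frac = 1e-3)
    map(xs) do x
        lo, hi = x * (1 - frac / 2), x * (1 + frac / 2)
        only(model([lo, hi])) / (hi - lo)
    end
end

xs = [1.0, 2.0, 4.0]
densities = density_at(integrated_powerlaw, xs)  # close to xs .^ -2
```

The narrow-bin trick is only one way to do this; a dataset type could equally carry its own bin widths and let the model integrate over those.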
P.S. I also think the distinction between `AbstractData` and `AbstractDataset` makes sense.
P.P.S. This might also help fitting when simultaneously applying the same model to `Dataset` and `BinnedDataset`.
These are good points, but I think they still fit in the proposed changes. The `Dataset` (maybe better to call it something that reinforces the fact that it is essentially just a bijective mapping between two arrays: given some x, what is the y, etc.) and `BinnedDataset` are used and interacted with by the user, so that irrespective of what the underlying data is, the API is homogeneous.
The X-ray / optical / radio data is instead read in as a `Spectrum` or a `BinnedSpectrum` or some other structure, which provides the translations needed to fit inside a `BinnedDataset` or `Dataset`, so that models can always interact with data in the format they need.
Essentially spectra store the raw data as it is with some minimal accessor methods, and datasets give them "meaning" through their richer API, and optional combination with e.g. responses, and "know" if the data needs to be integrated or binned or whatever to work with a given model.
You can then fit multiple `AbstractDataset`s, each with completely different underlying data, but the models receive exactly what they expect thanks to the translation that the dataset API provides.
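To make the split between raw data and dataset containers concrete, here is a minimal Julia sketch. The type names come from this thread, but every field name and the `model_domain` function are assumptions made for illustration, not the actual SpectralFitting definitions.

```julia
abstract type AbstractData end
abstract type AbstractDataset end

# Raw storage only: one value per channel (field names are hypothetical).
struct Spectrum <: AbstractData
    channels::Vector{Int}
    values::Vector{Float64}
end

# Many-to-one view: each value belongs to a bin [bins_low[i], bins_high[i]].
struct BinnedDataset{D<:AbstractData} <: AbstractDataset
    data::D
    bins_low::Vector{Float64}
    bins_high::Vector{Float64}
end

# One-to-one view: each value belongs to a single x.
struct Dataset{D<:AbstractData} <: AbstractDataset
    data::D
    x::Vector{Float64}
end

# Lossy conversion: the bin widths are discarded, so it cannot be inverted.
Dataset(b::BinnedDataset) = Dataset(b.data, (b.bins_low .+ b.bins_high) ./ 2)

# Hypothetical accessor for what a model would be evaluated on.
model_domain(d::Dataset) = d.x
# Assumes contiguous bins, so the edges are the low edges plus the last high edge.
model_domain(b::BinnedDataset) = vcat(b.bins_low, b.bins_high[end])

spec = Spectrum([1, 2, 3], [10.0, 12.0, 9.0])
binned = BinnedDataset(spec, [0.5, 1.0, 2.0], [1.0, 2.0, 4.0])
points = Dataset(binned)  # points.x holds the bin midpoints
```

Both containers wrap the same `Spectrum`, so a fitting routine written against `AbstractDataset` would not need to know how the data was read in.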
Just a note about an unusual use case. How flexible do we want to be about the bins in the datasets? E.g., let's say we have a dataset that has two (different) radio flux densities at the same frequency. Do we want to force the user to create two separate datasets, or could SpectralFitting handle this seeming inconsistency without any problems? The data points might also not be sorted or contiguous. It is computationally equivalent to two separate datasets but might be easier for the user to have one dataset. Not an important issue, but we can discuss / think about it.
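For point-like data, the equivalence can be checked directly. The Julia sketch below uses a made-up power-law model and made-up flux densities, and assumes a plain pointwise chi-squared statistic; none of the names are SpectralFitting API.

```julia
# Hypothetical power-law model for a radio flux density at frequency `nu`.
model(nu) = 2.0 * nu^(-0.7)

# Pointwise chi-squared: each (x, y, err) triple contributes independently.
chi2(f, x, y, err) = sum(((y .- f.(x)) ./ err) .^ 2)

# One dataset: unsorted, with two different measurements at 1.4 (made-up values).
x   = [5.0, 1.4, 1.4]
y   = [0.70, 1.55, 1.62]
err = [0.05, 0.10, 0.08]

# The same measurements treated as one dataset and as two separate datasets.
whole    = chi2(model, x, y, err)
separate = chi2(model, x[1:2], y[1:2], err[1:2]) +
           chi2(model, x[3:3], y[3:3], err[3:3])

whole ≈ separate  # the statistic does not depend on grouping or ordering
```

This only covers the one-to-one case; binned data with overlapping or non-contiguous bins would need the model evaluated per bin rather than on a shared set of edges.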
The difference between `SpectralDataset` and `SimpleDataset` is entirely ambiguous given only their names. I think it would be better if the nomenclature reflected what these structures actually are, and propose to rename:

- `SpectralDataset` -> `BinnedDataset`
- `SimpleDataset` -> `Dataset`

The relation to spectra is then also lifted, and we can instead use the `Spectrum` type to indicate what these containers are holding on to (cf. timeseries).

`Dataset` and `BinnedDataset` could similarly be expanded to have many of the same fields, such that there would be a non-invertible relation from `BinnedDataset` to `Dataset` which involves taking the midpoint of each bin.

Since the `Spectrum` is only storing channels and values, there is already a 1-to-1 correspondence, which is augmented by its container (i.e. `BinnedDataset`). The masking API would then similarly be defined for all `AbstractDataset`s.

Summary
`AbstractData` should be the struct which stores data only, whereas the `AbstractDataset` containers interpret the data in different ways. A container enriches the data with e.g. responses, ARFs, backgrounds, and provides the API the user interacts with. `AbstractData` is instead used only internally to help rationalize implementing new `AbstractDataset`s and abstracts how the data itself is stored.

- `SpectralDataset` and `SimpleDataset` to be renamed.
- `Dataset` and `BinnedDataset` to be made more homogeneous.
- `Spectrum` to be `AbstractData`.
- `BinnedDataset` and `Dataset` to be views on the underlying data, distinguishing the many-to-1 and 1-to-1 relation.

I think the dataset API should be planned here before these changes are made. Currently, the API is along the lines of
I propose to also add