How / whether to support synthetic spectral models?

We often want to compare data and models. So far muler only knows about real data observed with a telescope. But in principle there is nothing stopping us from making classes parallel to HPFSpectrum and IGRINSSpectrum called, say, PHOENIXSpectrum, or SonoraSpectrum.

These synthetic spectral models would have no uncertainty associated with them, and they would have additional metadata about their physical labels. But the units and math operations would work out-of-the-box since a SonoraSpectrum would have units and common methods just like any other Spectrum1D instance. Some methods like .estimate_snr() or .deblaze() wouldn't work or make sense, but others like .measure_equivalent_width() would work and make sense.
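To make the contrast concrete, here is a minimal sketch of what a model-spectrum container could look like. It is a hypothetical, numpy-only stand-in, not a real Spectrum1D subclass. The class name `SyntheticSpectrum`, the `meta` keys, and the assumption that the model flux is already continuum-normalized are all illustrative. Uncertainty is simply absent, `estimate_snr()` refuses to run, and `measure_equivalent_width()` works just as it would on data.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class SyntheticSpectrum:
    """Hypothetical stand-in for a Spectrum1D-like model spectrum."""

    wavelength: np.ndarray  # monotonically increasing, e.g. in Angstroms
    flux: np.ndarray  # assumed continuum-normalized (continuum = 1)
    meta: dict = field(default_factory=dict)  # physical labels, e.g. teff, logg

    @property
    def uncertainty(self):
        # A noiseless model has no uncertainty array.
        return None

    def estimate_snr(self):
        # Signal-to-noise ratio is undefined without noise.
        raise NotImplementedError("SNR is undefined for a synthetic spectrum")

    def measure_equivalent_width(self, lo, hi):
        """Equivalent width between lo and hi, in wavelength units (trapezoid rule)."""
        mask = (self.wavelength >= lo) & (self.wavelength <= hi)
        wl = self.wavelength[mask]
        depth = 1.0 - self.flux[mask]
        return float(np.sum(0.5 * (depth[1:] + depth[:-1]) * np.diff(wl)))
```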

The main architectural decision to make is:

1. Whether we want to make an entirely separate package to deal with Synthetic spectra, call it, say, buler.
2. Whether we want to keep Synthetic spectra and Data spectra under the same muler roof.
3. Whether we want to make Synthetic spectra a part of the blase package.

The main reason to make an entirely new package is to be more general than just echelle spectra. muler doesn't currently, and may never, support low-resolution spectra, for example. We already have a parallel project, blase, intended to compress/emulate the synthetic spectral models, so maybe that's a natural forum for dealing with synthetic spectra.

I'm on the fence here! The solution may be one of expedience. If we can develop blase fast enough, it can gain these new features. But if not, muler will prevail. blase also has the impediment of being written in PyTorch, which serves as a slight barrier to entry for installation and contributions.

Additional clarifying information:

The HPF Goldilocks pipeline and the IGRINS plp pipeline essentially set a de facto standard for all HPF and IGRINS spectra, which muler relies upon and takes advantage of: muler packages up all that metadata and knows how to read every single spectrum that comes from those facility pipelines. All that magic happens in the __init__() method when a new HPFSpectrum or IGRINSSpectrum instance is initialized from a filename.

In the same way, PHOENIX and Sonora-Bobcat have a de facto standard that we would rely upon in the __init__() method, so the actual amount of work to implement one of these classes is relatively small. We should probably just make a quick demo as a proof of concept.
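As a sketch of how such an `__init__()` could lean on a de facto standard, the helper below pulls physical labels out of a PHOENIX-style filename. The function name `parse_phoenix_filename` is hypothetical, and the naming pattern (for example `lte05700-4.50-0.0.PHOENIX-ACES-AGSS-COND-2011-HiRes.fits`, encoding Teff, log g, and [Fe/H]) is an assumption about the PHOENIX-ACES grid that should be checked against the actual files. Alpha-enhanced variants and Sonora-Bobcat names would need their own patterns.

```python
import re
from dataclasses import dataclass

# Assumed PHOENIX-ACES naming convention, e.g.
# "lte05700-4.50-0.0.PHOENIX-ACES-AGSS-COND-2011-HiRes.fits"
#   lte<Teff, 5 digits>-<log g><sign><[Fe/H]>
_PHOENIX_NAME = re.compile(r"lte(?P<teff>\d{5})-(?P<logg>\d\.\d{2})(?P<feh>[+-]\d\.\d)")


@dataclass(frozen=True)
class PhoenixLabels:
    teff: float  # effective temperature in K
    logg: float  # log10 surface gravity (cgs)
    feh: float  # metallicity [Fe/H] in dex


def parse_phoenix_filename(filename: str) -> PhoenixLabels:
    """Hypothetical helper: recover physical labels from a PHOENIX filename."""
    match = _PHOENIX_NAME.search(filename)
    if match is None:
        raise ValueError(f"unrecognized PHOENIX filename: {filename!r}")
    return PhoenixLabels(
        teff=float(match["teff"]),
        logg=float(match["logg"]),
        feh=float(match["feh"]),
    )
```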

The real work would come from maintaining the synthetic spectral models, since there are many different flavors and versions, and everyone has a preferred flavor. Supporting the methods could be tricky too: each "correction" operation that can be performed on data can be thought of as an equal and opposite "doctoring" operation on the Synthetic spectrum. Do we support both operations? For example, .deblaze() could be run on data to make the spectrum flat, or warp_to_blaze could be applied to the model to make it look curvy. Implementing the mirror-image procedure in two places sounds like a time sink, violates the "Don't Repeat Yourself" mantra of programming, and ultimately makes the code harder to maintain.
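One way to sidestep that duplication, sketched under the assumption that the blaze is a strictly positive multiplicative function sampled on the same pixels as the flux, is to keep a single private helper and expose the two mirror-image operations as thin wrappers. The names `_apply_blaze`, `deblaze`, and `warp_to_blaze` here are hypothetical plain functions, not muler's actual methods.

```python
import numpy as np


def _apply_blaze(flux, blaze, inverse):
    """Single shared implementation of the blaze operation (hypothetical helper)."""
    flux = np.asarray(flux, dtype=float)
    blaze = np.asarray(blaze, dtype=float)
    if np.any(blaze <= 0):
        raise ValueError("blaze function must be strictly positive")
    # Dividing removes the blaze (data side); multiplying imposes it (model side).
    return flux / blaze if inverse else flux * blaze


def deblaze(data_flux, blaze):
    """Flatten an observed echelle order by removing its blaze shape."""
    return _apply_blaze(data_flux, blaze, inverse=True)


def warp_to_blaze(model_flux, blaze):
    """Give a flat model the same curvy blaze shape as the data."""
    return _apply_blaze(model_flux, blaze, inverse=False)
```

Any fix to the blaze handling then lands in one place, and the two public operations cannot drift apart.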

That's why I'm putting as much thought into this decision as this Issue suggests. Deciding to support Synthetic spectra has many benefits, essentially making muler a full-blown analysis framework, but it has the disadvantage of making the codebase bigger and harder to maintain.

For those reasons I'm slightly inclined to make blase the place for all Synthetic spectral models, and muler just the relatively lightweight wrapper interface to the data and common routines.

Basically: blase is for forward modeling; muler is for data processing.

They can meet somewhere in the middle and interoperate. The sweet spot is that muler stops when it reaches flattened, cleaned, barycentric-corrected spectra with robust propagated uncertainty estimates, and blase starts by applying forward models to data at that stage of processing.
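At that handoff point, the simplest possible forward-model comparison is a chi-squared between the processed data and a model resampled onto the data's wavelength grid. This is a hedged sketch, not how blase is implemented: the function name is hypothetical, and it assumes both spectra are continuum-normalized, share a barycentric wavelength frame, and that linear interpolation is adequate.

```python
import numpy as np


def chi_squared(data_wl, data_flux, data_unc, model_wl, model_flux):
    """Chi-squared of a model spectrum against processed data.

    Assumes both spectra are continuum-normalized and share the same
    (barycentric) wavelength frame; the model is linearly interpolated
    onto the data's wavelength grid, which should lie inside model_wl.
    """
    data_flux = np.asarray(data_flux, dtype=float)
    model_on_data = np.interp(data_wl, model_wl, model_flux)
    residual = (data_flux - model_on_data) / np.asarray(data_unc, dtype=float)
    return float(np.sum(residual**2))
```

In a forward-modeling workflow, model parameters would be adjusted to drive this number down; that optimization loop is the part the split above assigns to blase rather than muler.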

Update: After discussing with @astrocaroline, we decided the right strategy is the underdog: make a third package to represent synthetic spectra, rather than try to shoehorn it into blase or muler. The key insight in our discussion was the concept of microservices. This third package, with an as-yet-undecided name, would represent synthetic spectra as outlined above. Like muler, it would subclass Spectrum1D from specutils, with its own custom methods. Caroline asked whether something like this already exists (so we don't reinvent the wheel). The closest things I can think of are PySynphot and Starfish. I suppose we can think of PySynphot as yet another microservice, with this unnamed package capable of plugging into PySynphot in the future. Starfish has methods for interacting with pre-computed models, but maybe this new package could serve as a future microservice for Starfish. It may still be worth digging a bit to see what else is out there, since I think other projects have had to deal with synthetic spectral models.
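In Python terms, the microservice idea mostly means each package agrees on a small shared interface rather than reaching into the others' internals. In the real design that shared interface would be specutils' Spectrum1D; the sketch below swaps in a `typing.Protocol` so it runs with numpy alone. `SpectrumLike`, `ObservedOrder`, `ModelSpectrum`, and `median_normalize` are hypothetical names.

```python
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class SpectrumLike(Protocol):
    """Hypothetical minimal interface shared by data and model spectra."""

    wavelength: np.ndarray
    flux: np.ndarray


class ObservedOrder:
    """Stand-in for a data spectrum: carries an uncertainty array."""

    def __init__(self, wavelength, flux, uncertainty):
        self.wavelength = np.asarray(wavelength, dtype=float)
        self.flux = np.asarray(flux, dtype=float)
        self.uncertainty = np.asarray(uncertainty, dtype=float)


class ModelSpectrum:
    """Stand-in for a synthetic spectrum: carries physical labels instead."""

    def __init__(self, wavelength, flux, teff):
        self.wavelength = np.asarray(wavelength, dtype=float)
        self.flux = np.asarray(flux, dtype=float)
        self.teff = teff


def median_normalize(spec: SpectrumLike) -> np.ndarray:
    """Works on anything exposing .flux, regardless of which package made it."""
    return spec.flux / np.median(spec.flux)
```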

You might be wondering whether we have hit the limit of too granular a package. I don't think so. You can already imagine a few methods, such as changing the resolution, resampling, convolving with a rotational broadening kernel, and labeling lines (depending on the library and available lines). You can imagine replicating the intuition dashboard as an interact() method. By the time all of those are implemented, the package is already substantial, and there are always more ideas that come later.
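Two of those methods, changing the resolution and rotational broadening, can be sketched as convolutions on a wavelength grid that is uniform in ln(λ), so that every pixel spans the same velocity. This is an illustrative numpy sketch with hypothetical function names. It assumes a Gaussian instrumental profile with FWHM = c/R and uses the classical rotation kernel with a linear limb-darkening coefficient ε (the form given in Gray's textbook), normalized numerically rather than analytically.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def log_uniform_grid(wl_min, wl_max, pixel_kms):
    """Wavelength grid with constant velocity spacing (uniform in ln(lambda))."""
    n = int(np.floor(np.log(wl_max / wl_min) * C_KMS / pixel_kms)) + 1
    return wl_min * np.exp(np.arange(n) * pixel_kms / C_KMS)


def _convolve(flux, kernel):
    """Convolve with an odd-length kernel, preserving length and continuum level."""
    kernel = kernel / kernel.sum()  # unit area keeps a flat continuum at its level
    pad = len(kernel) // 2
    padded = np.pad(flux, pad, mode="edge")  # repeat edge values to avoid edge darkening
    return np.convolve(padded, kernel, mode="valid")


def degrade_resolution(flux, pixel_kms, resolving_power):
    """Gaussian instrumental broadening to resolving power R on a log-uniform grid."""
    fwhm_kms = C_KMS / resolving_power
    sigma_pix = fwhm_kms / (2.0 * np.sqrt(2.0 * np.log(2.0))) / pixel_kms
    half = int(np.ceil(5.0 * sigma_pix))
    x = np.arange(-half, half + 1)
    return _convolve(flux, np.exp(-0.5 * (x / sigma_pix) ** 2))


def rotational_broadening(flux, pixel_kms, vsini_kms, epsilon=0.6):
    """Classical rotational broadening with linear limb darkening epsilon."""
    if vsini_kms <= 0:
        raise ValueError("vsini must be positive")
    half = max(1, int(np.ceil(vsini_kms / pixel_kms)))
    dv = np.arange(-half, half + 1) * pixel_kms
    x = np.clip(dv / vsini_kms, -1.0, 1.0)  # kernel vanishes at |dv| >= vsini
    kernel = 2.0 * (1.0 - epsilon) * np.sqrt(1.0 - x**2) + 0.5 * np.pi * epsilon * (1.0 - x**2)
    return _convolve(flux, kernel)
```

Because both kernels are normalized to unit area, broadening makes lines shallower and wider while leaving their equivalent widths unchanged.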

This new package, muler, and blase can all interoperate: you can have a muler-based cross-correlation method that accepts a Synthetic spectral template, or you can hand a Synthetic spectral template to blase for it to clone.
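A muler-based cross-correlation against a synthetic template might look roughly like this: Doppler-shift the template over a grid of trial velocities, interpolate it onto the data's wavelengths, and compute a normalized correlation coefficient at each step. The function name, the non-relativistic shift λ(1 + v/c), and the plain linear interpolation are assumptions for illustration. Both inputs are assumed continuum-normalized and on a common wavelength frame, with the shifted template covering the data's range.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def cross_correlate(data_wl, data_flux, template_wl, template_flux, velocities_kms):
    """Normalized cross-correlation of data against a Doppler-shifted template.

    Returns one correlation coefficient per trial velocity (km/s).
    """
    data = np.asarray(data_flux, dtype=float)
    data = data - data.mean()
    ccf = np.empty(len(velocities_kms))
    for i, v in enumerate(velocities_kms):
        shifted_wl = template_wl * (1.0 + v / C_KMS)  # non-relativistic Doppler shift
        model = np.interp(data_wl, shifted_wl, template_flux)
        model = model - model.mean()
        ccf[i] = np.sum(data * model) / np.sqrt(np.sum(data**2) * np.sum(model**2))
    return ccf
```

The trial velocity at the peak of the returned array is the estimated radial-velocity offset between the data and the template.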

Hooray! This issue is closed by the creation of https://github.com/BrownDwarf/gollum :partying_face:

OttoStruve / muler, issue #16