PAHFIT / pahfit

Model Decomposition for Near- to Mid-Infrared Spectroscopy of Astronomical Sources
https://pahfit.readthedocs.io/

New interface plans #231

Closed drvdputt closed 1 year ago

drvdputt commented 2 years ago

As soon as the YAML-based science packs are in working order (#222), I will start working on the new interface. I am posting the current draft here for reference, with some comments on what should be going on under the hood.

A couple of use cases providing constraints

import pahfit
from specutils import Spectrum1D

# We want the input spectra to work via Spectrum1D. Information on which
# instrument pack to use could be contained in the metadata of this object.
mymiri = Spectrum1D.read(...)
mymiri.meta['instrument'] = 'jwst.miri.ch1.A'
specs = [mymiri, Spectrum1D.read(...)]  # fake

# Create a Model instance. We want the creation of model objects to work
# without providing any observational data. What describes the transition from
# feature list to a model actually applicable to data is the instrument name
# and the redshift.
instrumentname = 'jwst.miri.ch1.A'  # could also be taken from the spectrum metadata
mymod = pahfit.model.Model('classic.yml', instrumentname, redshift=0.04)

# The initial guess function edits the features table. Will probably have some options.
mymod.guess(fancy_guess=True, ...)

# The plot function converts the features table into a functional form (astropy
# model) and plots it. The FWHM based on the instrument pack is applied directly
# to the underlying astropy model, without affecting the contents of the
# features table. Spectral data are optional.
mymod.plot(optional_spectrum=specs)  # just-in-time FWHM recreation

# Do some astropy.table modifications. We want to be able to access the
# features table through python, instead of having to edit the YAML file every
# time. Changes to the features table should be reflected whenever the next
# guess, plot, or fit is done.
mymod.features['wavelength'][4] = 12.22
mymod.redshift = 0.05  # update the redshift

# The fit function creates an astropy model "just in time", applies the FWHM
# based on the instrument model, does the fit, and then writes the results into
# the features table. The line widths are not stored in the features table, as
# they are determined by the instrument model (although we might deviate from
# this idea to support high velocity dispersion).
mymod.fit(specs)

# Inspect results such as chi-squared and the number of function evaluations.
mymod.info()

# Objects should be copyable so we can do this:
parent = mymod.copy()  # make a copy to keep around
mymod.fit(other_specs)  # fit using the old result as a starting point

# Save the model (only) to disk. Do we want uncertainties etc. in the metadata?
mymod.save('~/mycoolmod.ecsv')

# Read a model in.
another = pahfit.read('~/mycoolmod.ecsv')

# Use the saved result to continue fitting more spectra.
another.guess(other_specs)

jdtsmith commented 1 year ago
mymiri.meta['instrument'] = 'jwst.miri.ch1.A'

Hoping this step will be provided by JWST pipeline itself at some point.

mymod.guess(fancy_guess=True, ...)

You will have to pass the spectrum/iterable of spectra to guess. I'd also just use an optional positional argument for plot.
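A minimal sketch of what those signatures could look like, following the suggestion above (these are hypothetical, not the final API):

# Hypothetical method signatures only.
def guess(self, spectra, fancy_guess=False, **kwargs):
    """Initial guess; requires one spectrum or an iterable of spectra."""
    ...

def plot(self, spectra=None, **kwargs):
    """Plot the current model; spectra are an optional positional argument."""
    ...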

The line widths are not stored in the features table

The only exception being lines which have bounds specified for FWHM, either by the user directly on the Feature table itself, or indirectly by using multi-instrument specifiers.

jdtsmith commented 1 year ago

This looks like a great overall setup.

drvdputt commented 1 year ago

I have put something together for the new interface; it is close to working. For now, param_info is generated as an intermediate step wherever needed. For example, I have the function

def _construct_astropy_model(self):
    """Convert the features table into a fittable model."""
    param_info = self._kludge_param_info()
    return PAHFITBase.model_from_param_info(param_info)

which uses the old code like this

def _kludge_param_info(self):
    param_info = PAHFITBase.parse_table(self.features)
    # edit line widths and drop lines out of range
    param_info[2] = PAHFITBase.update_dictionary(
        param_info[2], self.instrumentname, update_fwhms=True
    )
    param_info[3] = PAHFITBase.update_dictionary(
        param_info[3], self.instrumentname, update_fwhms=True
    )
    return param_info

The main thing I still have to work on (as far as functionality goes) is a function I've called Model._parse_astropy_result(self, astropy_model), which will update the features table based on a fit result. This should be relatively straightforward, as the names of the astropy model components and those in the 'name' column of the features table are the same by construction.
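As an illustration only, the name-based mapping could look roughly like this (the column names and the Gaussian1D branch below are assumptions, not the actual implementation):

from astropy.modeling.models import Gaussian1D

def _parse_astropy_result(self, astropy_model):
    """Sketch of a Model method: write fitted values back into the features table."""
    if not self.features.indices:
        self.features.add_index("name")
    for name in astropy_model.submodel_names:
        component = astropy_model[name]
        row = self.features.loc[name]
        # 'wavelength' and 'power' are placeholder column names
        if isinstance(component, Gaussian1D):
            row["wavelength"] = component.mean.value
            row["power"] = component.amplitude.value
        # ... analogous branches for Drude1D, the continua, attenuation, etc.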

There is one exception though, so here's a question:

Why is the attenuation named "S07_att" in the astropy model, and "silicate" in the features table? Can I change the code so that it's just "silicate" everywhere? This renaming happens like this, by the way:

 if att_info["names"] == 'silicate':
            name_array = np.array(['S07_att'])
            att_info.update({'names': name_array})
jdtsmith commented 1 year ago

This is great, thanks Dries. Hopefully the user will never have to "handle" a param_info dict; instead the API will generate it and pass it along where it's needed under the hood.

For attenuation, I'd strongly suggest keying on kind=attenuation. And in general (where possible) prefer kind over the names, since names are an arbitrary convenience, and up to the user, who may in fact develop their own custom science pack. As we were discussing on Monday, we will likely branch out beyond the original hand-crafted PAHFIT attenuation curve (=S07), to provide access to others. So any reference to that specific curve isn't meaningful.
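For example, selecting rows by kind rather than by name could be as simple as the following (assuming the features table carries the 'kind' column from the YAML science packs):

# at most one attenuation row, zero or more absorption (Drude) rows
att_rows = features[features["kind"] == "attenuation"]
abs_rows = features[features["kind"] == "absorption"]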

drvdputt commented 1 year ago

Hopefully the user will never have to "handle" a param_info dict; instead the API will generate it and pass it along where it's needed under the hood.

This is exactly how I'm doing it. param_info is not even stored anywhere. It is purely an implementation detail, existing only as a temporary variable, bridging PAHFITBase.parse_table() --> param_info --> PAHFITBase.model_from_param_info(param_info).

prefer kind over the names

I'm using the names because they are unique identifiers, mapping components of the astropy model to rows in the features table.

So any reference to that specific curve isn't meaningful.

The current implementation for creating the attenuation component of the model does this

for k in range(len(att_info["names"])):
    if att_info["names"][k] == "S07_att":
        # only the S07 component is parameterized this way
        model *= S07_attenuation(
            name=att_info["names"][k],
            tau_sil=att_info["amps"][k],
            bounds={"tau_sil": att_info["amps_limits"][k]},
            fixed={"tau_sil": att_info["amps_fixed"][k]},
        )
    else:
        model *= att_Drude1D(
            name=att_info["names"][k],
            tau=att_info["amps"][k],
            x_0=att_info["x_0"][k],
            fwhm=att_info["fwhms"][k],
            bounds={
                "tau": att_info["amps_limits"][k],
                "fwhm": att_info["fwhms_limits"][k],
            },
            fixed={"x_0": att_info["x_0_fixed"][k]},
        )
which relies on that specific name to decide between the S07 model and generic Drude feature attenuation. A quick fix would be to change the structure of att_info (the last element of param_info).

drvdputt commented 1 year ago

Also, since the features table contains a 'model' column, with 'S07_attenuation' as an example value, we should obviously use that one instead of the name to decide which attenuation model to make.
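If we go that way, a hedged sketch of the dispatch could look like this, continuing from the loop quoted above (the dictionary and the 'kind'/'model'/'tau' column names are illustrative assumptions; only S07_attenuation exists today):

# Illustrative only: choose the attenuation class based on the 'model' column
# instead of the feature name (assumes S07_attenuation is imported as in the
# existing code).
ATTENUATION_MODELS = {"S07_attenuation": S07_attenuation}

att = features[features["kind"] == "attenuation"]
if len(att) == 1:
    row = att[0]
    model *= ATTENUATION_MODELS[row["model"]](name=row["name"], tau_sil=row["tau"])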

drvdputt commented 1 year ago

Question: is the Drude-based attenuation model used by anyone? Or can we just remove this functionality? That would make things easier.

jdtsmith commented 1 year ago

Yes, Drude attenuation is used for ice and other absorption in the 2-6 µm range. These features have kind=absorption in the new YAML science pack, and there can be an arbitrary number of them (perhaps grouped). There can be only one kind=attenuation, so that should simplify things. Using the names internally to keep track of things is fine, but do not hard-code them, since they are arbitrary. For attenuation, the 'model' column is future-proofing, for when we have more models available. For now feel free to ignore 'model' and just hard-code the S07 curve.

It is honestly a little annoying to have two full columns of the feature table allocated to one feature (attenuation). Open to ideas on this.

jdtsmith commented 1 year ago
 model *= S07_attenuation(...
 model *= att_Drude1D(...

BTW, this is the specific piece of code @ThomasSYLai and I were discussing on Monday: whether model should accumulate by multiplication, or instead tau should accumulate by addition. For now just leave as is.
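For context, a minimal sketch of why the choice matters, assuming each absorption Drude enters as a pure screen factor $e^{-\tau_i(\lambda)}$ while S07 contributes the mixed-geometry factor $(1 - e^{-\tau(\lambda)})/\tau(\lambda)$: for pure screens the two bookkeeping schemes agree, since $e^{-\tau_1} e^{-\tau_2} = e^{-(\tau_1 + \tau_2)}$, but as soon as a mixed-geometry factor is in the product, multiplying the individual attenuation curves is no longer equivalent to summing the optical depths into a single factor.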

jdtsmith commented 1 year ago

Also, for power are you just going to treat it as a curve amplitude, or did we converge on PR #225?

drvdputt commented 1 year ago

I was trying to make the att_info object a bit more generic, but based on your comment above (that there can be only one attenuation model), I will keep things as-is. Once we start needing more complicated attenuation behavior, I think it would be easier to rewrite the table parsing from scratch, rather than sticking with the param_info method and tailoring everything to it.

As far as power goes: I am keeping the meaning of the parameters the same for now, so it will stay the same as amplitude.

If a simple value conversion (for input/output purposes) suffices, then it can be implemented in _parse_astropy_result (just as I did for the conversion between fwhm and stddev).

If we want the fit to behave differently, then AreaGaussian and AreaDrude will need to be used (perhaps this is another use for the 'model' column?).
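To illustrate the first option (the simple input/output conversion mentioned above), here is a sketch using the standard Gaussian relations; the variable names are placeholders, not the actual parameter or column names:

import numpy as np
from astropy.stats import gaussian_fwhm_to_sigma, gaussian_sigma_to_fwhm

fwhm = 0.1        # example width, same units as the wavelength grid
amplitude = 2.0   # example peak value of the fitted Gaussian1D

stddev = fwhm * gaussian_fwhm_to_sigma       # what Gaussian1D actually stores
fwhm_out = stddev * gaussian_sigma_to_fwhm   # recovered for output after the fit

# if 'power' is reported as the integrated Gaussian instead of the peak value:
power = amplitude * stddev * np.sqrt(2.0 * np.pi)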

jdtsmith commented 1 year ago

Right, let's go quick and simple to get something working with the new API for now. If the #225 models are working, we can switch to those at some point, but not in this PR. Thanks!

drvdputt commented 1 year ago

The easiest thing I could come up with was to add an extra member to param_info. att_info has now been split up into abs_info and att_info. This makes things easier, because the former are all Drudes, and the latter is the single attenuation model we have.
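A rough sketch of what the split loop could look like (field names follow the old param_info conventions quoted above and are assumptions about the actual code; S07_attenuation and att_Drude1D are assumed to be imported as in the existing PAHFIT modules):

# ice/other absorption features: all Drudes
for k in range(len(abs_info["names"])):
    model *= att_Drude1D(
        name=abs_info["names"][k],
        tau=abs_info["amps"][k],
        x_0=abs_info["x_0"][k],
        fwhm=abs_info["fwhms"][k],
    )

# exactly one attenuation component, currently hard-coded to the S07 curve
model *= S07_attenuation(
    name=att_info["names"][0],
    tau_sil=att_info["amps"][0],
)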