PAHFIT / pahfit

Model Decomposition for Near- to Mid-Infrared Spectroscopy of Astronomical Sources
https://pahfit.readthedocs.io/

Piecewise/partial integration of drvdputt/experimental #280

Open drvdputt opened 2 months ago

drvdputt commented 2 months ago

github.com/drvdputt/pahfit/tree/experimental has diverged quite far from master, but it is in working order, and I am actively using it for my science goals. There are many aspects we want to integrate into the master branch, but we should do so piecewise to avoid breakage.

Here is an overview of the changes I made so far.

Major features or structures

Other bug / usability fixes

jdtsmith commented 2 months ago

It would be great to pull these into 4 PRs for prompt merging to main:

  1. Internal Fitter API:
  2. User API:
    • Units.
    • Power-based fitting/reporting.
  3. Feature-table improvements.
  4. Plotting improvements/re-factoring.
drvdputt commented 2 months ago

I copied a selection of comments from this discussion https://github.com/PAHFIT/pahfit/pull/281#issuecomment-2101084989, as a reminder for when I put in the Fitter framework.

I think we should keep it simple: a string-based keyword with a default (which may change at some point), e.g. fitter='SPFitter'. The model should instantiate an object from this class on init and reuse it IMO (see below). Then it's trivial to "try different fitters". BTW, you could easily make your APFitter have a couple of subclasses, APFitterTrustRegion, etc. for the various flavors. Or pass an option to it (see below).
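A minimal sketch of what that string-keyword selection could look like. Only the SPFitter/APFitter names come from this discussion; the registry, the decorator, and the Model signature below are illustrative, not PAHFIT's actual API:

```python
# Hypothetical sketch: map fitter-name strings to Fitter classes, and have
# Model instantiate the chosen fitter once on init and reuse it.
FITTER_REGISTRY = {}

def register_fitter(cls):
    """Class decorator adding a Fitter class to the registry by name."""
    FITTER_REGISTRY[cls.__name__] = cls
    return cls

@register_fitter
class SPFitter:  # e.g. a scipy-based fitter
    def fit(self, features):
        return "fit with SPFitter"

@register_fitter
class APFitter:  # e.g. an astropy.modeling-based fitter
    def fit(self, features):
        return "fit with APFitter"

class Model:
    def __init__(self, fitter="SPFitter"):
        # instantiate once and reuse across fits
        self.fitter = FITTER_REGISTRY[fitter]()

    def fit(self, features):
        return self.fitter.fit(features)
```

With this shape, "trying different fitters" is just Model(fitter="APFitter"), and new flavors (e.g. a trust-region variant) only need to register themselves.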

So every fit creates a Fitter? That seems like overkill, and might introduce unnecessary slowness. The fitter can surely be instantiated on init and just get asked new questions (fit this, evaluate that, etc.). Some fitters might store intermediate values for speed on subsequent usage, with maybe a way to pass a live_dangerously flag (OK, not really that name) to the Fitter indicating: "nothing has changed here except the fluxes, feel free to re-use whatever you cached from last time". This could substantially improve performance when fitting many near-identical spectra in a row.
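The caching idea can be sketched as follows; CachingFitter, _build_design, and the reuse_cache keyword are all hypothetical names standing in for whatever the real Fitter would cache:

```python
# Illustrative sketch of a Fitter that reuses cached state when told that
# only the flux data changed between calls.
import numpy as np

class CachingFitter:
    def __init__(self):
        self._design = None  # expensive-to-build structure, cached

    def _build_design(self, wavelengths):
        # stand-in for expensive model construction (component profiles, etc.)
        return np.vstack([np.ones_like(wavelengths), wavelengths])

    def fit(self, wavelengths, flux, reuse_cache=False):
        if self._design is None or not reuse_cache:
            self._design = self._build_design(wavelengths)
        # simple linear least squares against the cached design matrix
        coeff, *_ = np.linalg.lstsq(self._design.T, flux, rcond=None)
        return coeff
```

Calling fit(..., reuse_cache=True) on the next spectrum skips the rebuild, which is where the speedup for many near-identical spectra would come from.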

About reusing the model construction in tabulate: that's pretty clever. Then we do need to normalize how Model throws out features that are "too far" (lines beyond 5 sigma, I recall?). Now that you mention it, I do think it's smart to have this culling happen at the Model level, since fits will not be equivalent if they select different subsets of features to fit. I had/have some logic for this somewhere that was pretty clever, probably in my fitter branch; I can take a look.
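The culling rule being discussed could look something like this; the 5-sigma default and the (wavelength, sigma) representation are assumptions based on the comment above, not PAHFIT's actual implementation:

```python
# Hedged sketch of Model-level feature culling: keep a line feature only if
# its center is within `nsigma` widths of the observed wavelength range.
def cull_features(features, wmin, wmax, nsigma=5.0):
    """features: list of (wavelength, sigma) tuples; returns the kept subset."""
    return [
        (w, s) for (w, s) in features
        if wmin - nsigma * s <= w <= wmax + nsigma * s
    ]
```

Doing this once at the Model level guarantees every Fitter sees the same subset, so fits on the same spectrum stay comparable across fitter implementations.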

Now that I understand it, I suppose we don't have to (implement evaluation calls like gaussian() at the Fitter level); that keeps the Fitter API simpler ("you must take a Features table with an arbitrary number of features of standardized kinds, and do the right thing to evaluate it"). If we ever want live plotting during the fit, we'll need some kind of raw fast_plot, but perhaps each Fitter could implement that on its own using its own messy guts, if it wants to.

We can take additional **kwargs in Model.__init__ to pass on to the Fitter class (and/or just let users do my_model.fitter.live_plot = True and similar).
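Both options are tiny to support; a sketch, where the Fitter class and its live_plot/maxiter options are made up for illustration:

```python
# Illustrative sketch: forward extra Model.__init__ keywords to the Fitter,
# while still allowing direct attribute access on model.fitter afterwards.
class Fitter:
    def __init__(self, live_plot=False, maxiter=1000):
        self.live_plot = live_plot
        self.maxiter = maxiter

class Model:
    def __init__(self, **fitter_kwargs):
        self.fitter = Fitter(**fitter_kwargs)

# keyword passthrough at construction time, or mutation afterwards:
m = Model(maxiter=500)
m.fitter.live_plot = True
```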

You can fit a Model with one instrument and apply it with another, since it is "just a physical model". Spectrum1D's need to have their instruments encoded in metadata; then the user will "feel no pain", or even really know what happens behind the curtain. We can certainly provide some tooling to help users add instrument metadata to their Spectrum1D's, and encourage other tools to do so using our instrument taxonomy.
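Such tooling could be as small as a helper that stamps an instrument label into the spectrum's metadata. The "instrument" meta key, the helper name, and the segment string below are assumptions for illustration; in practice this would operate on a specutils Spectrum1D, whose .meta attribute is a plain dict (a stand-in class is used here to keep the sketch self-contained):

```python
# Hypothetical helper: tag a spectrum with an instrument label in its
# metadata so Model can pick the matching wavelength coverage later.
def add_instrument(spec, instrument):
    if spec.meta is None:
        spec.meta = {}
    spec.meta["instrument"] = instrument
    return spec

class FakeSpectrum:  # stand-in for specutils Spectrum1D in this sketch
    def __init__(self):
        self.meta = None

spec = add_instrument(FakeSpectrum(), "jwst.miri.mrs.ch1.A")
```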

A comment on this last point (fitting with one instrument, applying with another): if a model is fit to spectrum A, and then another spectrum B is passed with a different instrument / wavelength range, the feature culling will need to be different. Which means reconstructing the model. So there are cases where a new Fitter needs to be initialized (or we need to implement editing functionality...). For now, the simplest way is to reconstruct it at every fit regardless (which I do in my experimental branch). If this becomes an efficiency problem, we can come up with a smarter way.
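A middle ground between "reconstruct at every fit" and full editing functionality would be to rebuild only when the instrument/wavelength range actually changed. All names below are illustrative, not PAHFIT code:

```python
# Sketch: cache the Fitter keyed on the (instrument, wavelength-range) it
# was built for; reconstruct only when that key changes.
class Model:
    def __init__(self, fitter_class):
        self.fitter_class = fitter_class
        self._fitter = None
        self._key = None

    def fit(self, instrument, wrange):
        key = (instrument, wrange)
        if self._fitter is None or key != self._key:
            self._fitter = self.fitter_class()  # reconstruct for new coverage
            self._key = key
        return self._fitter
```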

jdtsmith commented 1 month ago

So there are cases where a new Fitter needs to be initialized (or we need to implement editing functionality...)

Why does a new fitter always need to be made? Since instrument data are provided "at the last minute", a Fitter needs to be able to adapt if that information changes. I guess feature culling is something that happens above the Fitter's head.

I do think we should allow (but not require) Fitters to "reuse" internal state; see #287.

jdtsmith commented 1 month ago

We can continue the discussion at #287. I think this issue has been completed as of 58038bc, right?

drvdputt commented 1 month ago

Note of general caution: with the format used in #283 (recently merged into dev), be very careful when assigning values! The right way to access an element of the bounds tuple is e.g. self.features['wavelength']['val'][i], with i the index of a row in the Features table.

I spent a long time trying to figure out what was going wrong while writing the fit results out to the features table. The different [] access orders yield different things (some give views into the table, others silently give copies).

Something to keep in mind while reviewing the code.
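The pitfall can be demonstrated with a plain numpy structured array mimicking the ('val', 'min', 'max') bounds format from #283 (the column name and values here are illustrative):

```python
# Indexing the field first yields a view into the array, so assignment
# sticks; boolean/fancy indexing first yields a copy, so the write is
# silently lost.
import numpy as np

bounds = np.zeros(3, dtype=[("val", "f8"), ("min", "f8"), ("max", "f8")])

# safe: column first, then row index -> writes into the table
bounds["val"][1] = 7.7

# lost: boolean indexing returns a copy; this assignment goes nowhere
bounds[bounds["min"] == 0.0]["val"] = 99.0
```

After this runs, row 1 holds 7.7 but no row holds 99.0, which is exactly the kind of silent failure described above.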

drvdputt commented 2 weeks ago

I have spun up a test pull request for the switch to power units here: https://github.com/drvdputt/pahfit/pull/1/files. It is pointed at drvdputt/dev-fitterapi on my own fork, so I can see what is changing. It will be pointed at PAHFIT/dev once #289 is merged.