PAHFIT / pahfit

Model Decomposition for Near- to Mid-Infrared Spectroscopy of Astronomical Sources
https://pahfit.readthedocs.io/
18 stars 26 forks source link

New guess function #242

Closed drvdputt closed 7 months ago

drvdputt commented 1 year ago

I have replaced the old guess function by one that lives in Model. (old code in PAHFITBase is still there for reference, but no longer used).

Main features:

codecov-commenter commented 1 year ago

Codecov Report

Merging #242 (059d2c1) into dev_yaml-scipack+api (8193bf8) will decrease coverage by 0.54%. The diff coverage is 72.00%.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@                   Coverage Diff                    @@
##           dev_yaml-scipack+api     #242      +/-   ##
========================================================
- Coverage                 64.78%   64.24%   -0.54%     
========================================================
  Files                        14       14              
  Lines                      1045     1074      +29     
========================================================
+ Hits                        677      690      +13     
- Misses                      368      384      +16     
Impacted Files Coverage Δ
pahfit/model.py 81.10% <70.00%> (-6.08%) :arrow_down:
pahfit/base.py 65.43% <100.00%> (-2.84%) :arrow_down:
pahfit/features/util.py 58.33% <0.00%> (+8.33%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

drvdputt commented 1 year ago

I made some more changes, and here is an illustration

With the M101 data: the guess Figure 1(3) The fit: Figure 4

With NIRSPEC data Guess: Figure 40 Fit: Figure 42

jdtsmith commented 1 year ago

old code in PAHFITBase is still there for reference

Is the intention to remove this file soon? (Git will always remember).

jdtsmith commented 1 year ago

Is this one necessary prior to dev->main merge, @drvdputt ? I think we discussed how this might be a bit brittle in the face of very different instrument packs, assuming "continuum adjacent to lines", etc. So I feel like it needs more testing. Or is this the very simple guess, translated to interact with the Features table?

drvdputt commented 1 year ago

The only part that may be needed and works relatively well, is the better guess for the lines, to get them to the right order of magnitude.

On the other hand, this will need to change when we have power units. And the dust feature guesses are too sketchy right now, they often result in bad fits for the continuum.

So either

  1. We close this, and I redo this code when we get the units working.
  2. I trim it down, add only the lines part, and remove the old guess code.
drvdputt commented 1 year ago

Or is this the very simple guess, translated to interact with the Features table?

This is an added bonus; we can remove the old guess code from base.py, which interacts with param_info, and instead, we interact directly with with the Features table.

jdtsmith commented 1 year ago

Thanks. I think it's worth having the "simple dumb" guess which addresses the Features table in this, since part of our api is now model.guess(...). So if this can be massaged into that, I'd vote to include it in the dev merge.

drvdputt commented 1 year ago

I've just rebased this on the dev branch and solved the conflicts.

Changes added

  1. Removed "smart" dust guess, and replaced it by the old 0.5 * median guess. I think it would be a good idea to do the same for the lines amplitude guess, but a keyword to activate the "smart line guess" would be nice. I found it useful for very bright lines.
  2. Removed old guess from base.py
  3. Removed _backport_param_info(), which was for converting old guess output to features table

I will do some testing of this function on Monday.

drvdputt commented 1 year ago

There is yet another issue that mentions the problem with the old guess function when the wavelengths of the input data don't have the right range (#262). Based on all the questions I've gotten so far, this is by far the most common bug that people struggle with, so we really need to get this merged. I'll do a quick review of my code soon (and your comments, @jdtsmith).

jdtsmith commented 9 months ago

@drvdputt, is the the main PR you mentioned in Florence as underlying your NIRSPEC Orion model work? Can you quickly go through the comments above and address or close these out? I'd like to get this merged so we can finalize the design of a new internal api #257, which is a high priority.

drvdputt commented 9 months ago

I did a quick check, and most of the comments were already addressed; the code snippets in this thread were just outdated, and I just never replied that I changed things.

I've posted some replies, and resolved everything. This function seems to be working well for my work with the Orion & NGC7023 NIRSpec data.

One more small comment from experience: instead of doing a "smart" guess for the line fluxes, I found that simply calling "model.fit()" twice actually works really well to get the lines "picked up", in cases where only a few lines were actually fit properly. Seems like resetting the minimization from where it left off can really help with some issues.

jdtsmith commented 9 months ago

Can you comment more on your model.fit point? Do you mean you restart the fit at the same parameter values it resulted in during the first fit? To me this sounds like insufficient convergence latitude for the optimizer. Do you think starting line amplitudes closer to their final value is helpful? I know you had a fancier line estimator before (that may not be robust).

drvdputt commented 9 months ago

The "integrate_line_fluxes" option I added definitely helps when the lines peak is several orders of magnitude higher than the other flux values. But sometimes, not all lines are picked up the right way, even with this estimate.

With "fitting twice", I mean using the output of the first fit, as the initial conditions for the second one. This is something I have been doing in my own work, but not very elegant of course. A solution with tweaked fit parameters (or a different algorithm?) would of course be preferable if performance matters.

drvdputt commented 7 months ago

In any case, the "fitting twice works better"-phenomenon can be addressed in a separate issue. I will likely experiment with other fitters while I'm dealing with new spectra in the near future.

The new guess algorithm might still have some issues, but I think we should merge this PR to get the mechanical changes in place. After that, tweaks to the algorithm should be straightforward (whenever we find something that works better.

And equally important: the new guess solves the long-standing wavelength range issue (e.g. https://github.com/PAHFIT/pahfit/issues/260#issuecomment-1516600646)

jdtsmith commented 7 months ago

OK. When we get the Fitter abstract class implemented we'll have to refactor this, but sounds good.