PAHFIT / pahfit

Model Decomposition for Near- to Mid-Infrared Spectroscopy of Astronomical Sources
https://pahfit.readthedocs.io/
18 stars 26 forks source link

Initial YAML-based scipack #210

Closed jdtsmith closed 2 years ago

jdtsmith commented 2 years ago

This is an initial implementation of the YAML-based science pack. At present it is not coupled in any way to the model fitter, so details can be changes, do have a look. Also included is the PAHFIT classic model, encoded as packs/classic.yml. The yaml file is read by a astropy.table.Table inheriting class (Features), so is a table in memory, similar to the current setup. Note that the PAHFIT classic model fixes dust features (wavelength or FWHM). Also note that value + bounds are grouped under a single column name as an (n, 3) column-array. Going forward astropy 5.1 will have better formatting of these.

To test/play:

from features import Features
t = Features.read('classic.yml')
drvdputt commented 2 years ago

After playing around, here are some tings that I found to be working or not working.

What works as expected

comment out sub-features (e.g. some H2 line), e.g.

H2_lines:
    kind: line
    wavelength:
        H2_S(2):   12.2785
        # H2_S(1):   17.0346
        # H2_S(0):   28.2207

comment whole feature block (e.g. all H2 lines, a PAH feature)

# H2_lines:
#    kind: line
#    wavelength:
#        H2_S(7):    5.5115
#        H2_S(6):    6.1088
#       H2_S(5):    6.909107

Comment at end of temperature list e.g.

# Modified Blackbodies
    kind: dust_continuum
    temperature: [300, 200, 135, 90] #65, 50, 40, 35] 

These things produce sensible errors:

Cryptic or no error

Errors out cryptically Comment out all H2 lines, but not the block itself (error happens in astropy column, during call to setting the format at features.py:289)

H2_lines:
    kind: line
    wavelength:
        # H2_S(2):   12.2785
        # H2_S(1):   17.0346
        # H2_S(0):   28.2207

Removing floating point number for one of the H2 wavelengths results in (TableMergeError: The 'wavelength' columns have incompatible types: ['object', 'float64’]).

H2_lines:
    kind: line
    wavelength:
        H2_S(2):  
        H2_S(1):   17.0346
        H2_S(0):   28.2207

Silent failure Commenting out starlight / temperature results in table with blank temperature field

starlight:
    kind: starlight_continuum
    # temperature: 5000
codecov-commenter commented 2 years ago

Codecov Report

Merging #210 (21274ad) into master (7090b71) will increase coverage by 1.06%. The diff coverage is 68.42%.

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
+ Coverage   51.48%   52.54%   +1.06%     
==========================================
  Files           8        9       +1     
  Lines         538      569      +31     
==========================================
+ Hits          277      299      +22     
- Misses        261      270       +9     
Impacted Files Coverage Δ
pahfit/base.py 55.18% <62.50%> (+1.02%) :arrow_up:
pahfit/component_models.py 90.38% <100.00%> (+0.80%) :arrow_up:
pahfit/errors.py 100.00% <100.00%> (ø)
pahfit/scripts/run_pahfit.py 16.66% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 53ca3c2...21274ad. Read the comment docs.

jdtsmith commented 2 years ago

Thanks for the close look, @drvdputt. I pushed a fix that translates missing values into masked entries. It also insists that in a group with lists, all non-group-wide values must be lists. This fixes "one missing list" (which you didn't show but came up when playing with your examples).

A simple:

H2_lines:
    kind: line
    wavelength:
        # H2_S(2):   12.2785
        # H2_S(1):   17.0346
        # H2_S(0):   28.2207

or

starlight:
    kind: starlight_continuum
    # temperature: 5000

is actually allowable: it specifies a new line feature with no parameters set. I suppose we could generate a warning in that case, but this may be useful during the training-ramp to put "stubs" in, then alter the Features table programmatically.

jdtsmith commented 2 years ago

The other changes you might look at:

jdtsmith commented 2 years ago

One other thing you may notice among the recent pushes is rolling back an attempt to use a structured array for bounded parameter columns. I was hopeful this would work out after some discussions with astropy devs, but a few blocking bugs —astropy/astropy#13267, astropy/astropy#13271 — means this won't work for at least a couple point versions. This would mostly be a formatting convenience so you could see 12.3 (12.28, 12.33) for a bounded param when printing the table. If you have any other clever ideas for this let me know. Update: managed to make this work using a simple (n,3) list to represented bounded variables — (value, min, max).

Dhruvil2911 commented 2 years ago

Just a minor error - The Fe[II] lines under the ionic lines in the classic.yml file (line number 37) need to have different keys (names), just as done for S[III] since YAML doesn't allow same names.

jdtsmith commented 2 years ago

Just a minor error - The Fe[II] lines under the ionic lines in the classic.yml file (line number 37) need to have different keys (names), just as done for S[III] since YAML doesn't allow same names.

Thanks Dhruvil, fixed. I also took the opportunity to patch the YAML loader so it throws an error if duplicate keys are used in one dict.

jdtsmith commented 2 years ago

For people trying to follow along or construct their own YAML-based scipacks, the draft docs are updated now: https://github.com/PAHFIT/pahfit/wiki/PAHFIT-2022#science-packs.

jdtsmith commented 2 years ago

I'd like to merge this soon. One last detail I'm considering: remove fwhm entirely as a supported science pack parameter input for kind line. What does a "single" FWHM for a particular unresolved line even mean? That's the domain of the instrument pack, and in fact the same line can have multiple different FWHM's, depending on which segments it contributes to. So putting it in the Features table, as if there is one and only one value, makes very little sense. It's simply not a scientific detail.

But then, during tuning, how do we conveniently "train" the instrument pack? I.e. allow FWHM to vary for a given line + spectral segment? And how do we get that information out of the fit? Maybe just give a flag to the fitter model, like return_fwhm (False, by default), and pass only a single segment at a time for this type of FWHM training. Then we can create some training tool to:

  1. Initialize lines fwhm based on the instrument model (for a single segment only!).
  2. Add desired bounds (± 1%, say).
  3. Run the fit on a SINGLE segment. The line fwhm values will override the instrument pack.
  4. Examine the resulting fitted fwhm values, and tune up the instrument pack coefficients accordingly.
els1 commented 2 years ago

In response to JD's comments above (https://github.com/PAHFIT/pahfit/pull/210#issuecomment-1164540120): I agree to remove the FWHM for the unresolved lines and proceed as you propose. It doesn't make sense to keep it as it is set by the instrument pack.

jdtsmith commented 2 years ago

I agree to remove the FWHM for the unresolved lines and proceed as you propose. It doesn't make sense to keep it as it is set by the instrument pack.

Good; done. Note that later we may add a "velocity_sigma" parameter or similar to lines, and dust_features, so as to allow them to be a "bit wider" than the instrument pack specifies. I guess this could be set on any individual or group of lines/dust_features.

jdtsmith commented 2 years ago

Anything else before we merge this all? @Ameek-Sidhu let me know if your single feature parameter with bounds issue is fixed (should be). Getting this in is the precursor to hooking up the new YAML pack flow, and "hiding" some of the fitting details behind the pahfit model object.

jdtsmith commented 2 years ago

Fixes @Ameek-Sidhu's issue. Tests for pack ingestion TBD with #218.