PAHFIT / pahfit

Model Decomposition for Near- to Mid-Infrared Spectroscopy of Astronomical Sources
https://pahfit.readthedocs.io/
18 stars 26 forks source link

Features: read support for `tied` parameters and group; and working "units" package #241

Closed jdtsmith closed 1 year ago

jdtsmith commented 1 year ago

This PR (to dev branch) adds the ability to read `tied' features:

            power:
                tied:   # A one-to-many sum tie (varying ratio), line to DFs
                    feature: stable_dfs
                    ratio:
                        value: 1.25
                        bounds: [-0.25#, 0.25#]

tied can appear per feature parameter, or at the group level (for many-to-* ties).

The tied constraint information is kept in features.meta[_ratios]. Since the latter cannot serialize the np.ma.masked constant, these are set to None (e.g. for missing bounds). Note that this is as of yet only read support; the current model does not yet support tied features.

codecov-commenter commented 1 year ago

Codecov Report

Merging #241 (bee5926) into dev_yaml-scipack+api (fa7d296) will decrease coverage by 1.48%. The diff coverage is 39.32%.

@@                   Coverage Diff                    @@
##           dev_yaml-scipack+api     #241      +/-   ##
========================================================
- Coverage                 63.02%   61.53%   -1.49%     
========================================================
  Files                        14       14              
  Lines                      1025     1079      +54     
========================================================
+ Hits                        646      664      +18     
- Misses                      379      415      +36     
Impacted Files Coverage Δ
pahfit/features/features.py 64.48% <39.32%> (-8.81%) :arrow_down:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

jdtsmith commented 1 year ago

We can leave this for after dev merge.

jdtsmith commented 1 year ago

I'll retarget this for main once dev is merged.

drvdputt commented 1 year ago

I would be OK with merging this today, IF the code in this pull request was actually in a working state.

There seems to be something wrong with tox (for all pull requests), so the tests didn't run here. But I ran them offline, and almost all of them fail with the same error. Here's the error of one of them.

pytest
====================================== test session starts =======================================
... output bla bla ...
pahfit/tests/test_feature_parsing.py:37: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pahfit/features/features.py:181: in read
    return cls._read_scipack(file)
pahfit/features/features.py:298: in _read_scipack
    return cls._construct_table(feat_tables)
pahfit/features/features.py:428: in _construct_table
    col.unit = PARAM_UNITS[cn]
../../opt/miniconda3/envs/pahfit/lib/python3.10/site-packages/astropy/table/column.py:1184: in __setattr__
    super().__setattr__(item, value)
../../opt/miniconda3/envs/pahfit/lib/python3.10/site-packages/astropy/table/column.py:930: in unit
    self._unit = Unit(unit, parse_strict='silent')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <class 'astropy.units.core.Unit'>, s = <UNITS.temperature: Unit("K")>, represents = None
format = None, namespace = None, doc = None, parse_strict = 'silent'

    def __call__(self, s="", represents=None, format=None, namespace=None,
                 doc=None, parse_strict='raise'):

        # Short-circuit if we're already a unit
        if hasattr(s, '_get_physical_type_id'):
            return s

        # turn possible Quantity input for s or represents into a Unit
        from .quantity import Quantity

        if isinstance(represents, Quantity):
            if is_effectively_unity(represents.value):
                represents = represents.unit
            else:
                represents = CompositeUnit(represents.value *
                                           represents.unit.scale,
                                           bases=represents.unit.bases,
                                           powers=represents.unit.powers,
                                           _error_check=False)

        if isinstance(s, Quantity):
            if is_effectively_unity(s.value):
                s = s.unit
            else:
                s = CompositeUnit(s.value * s.unit.scale,
                                  bases=s.unit.bases,
                                  powers=s.unit.powers,
                                  _error_check=False)

        # now decide what we really need to do; define derived Unit?
        if isinstance(represents, UnitBase):
            # This has the effect of calling the real __new__ and
            # __init__ on the Unit class.
            return super().__call__(
                s, represents, format=format, namespace=namespace, doc=doc)

        # or interpret a Quantity (now became unit), string or number?
        if isinstance(s, UnitBase):
            return s

        elif isinstance(s, (bytes, str)):
            if len(s.strip()) == 0:
                # Return the NULL unit
                return dimensionless_unscaled

            if format is None:
                format = unit_format.Generic

            f = unit_format.get_format(format)
            if isinstance(s, bytes):
                s = s.decode('ascii')

            try:
                return f.parse(s)
            except NotImplementedError:
                raise
            except Exception as e:
                if parse_strict == 'silent':
                    pass
                else:
                    # Deliberately not issubclass here. Subclasses
                    # should use their name.
                    if f is not unit_format.Generic:
                        format_clause = f.name + ' '
                    else:
                        format_clause = ''
                    msg = ("'{}' did not parse as {}unit: {} "
                           "If this is meant to be a custom unit, "
                           "define it with 'u.def_unit'. To have it "
                           "recognized inside a file reader or other code, "
                           "enable it with 'u.add_enabled_units'. "
                           "For details, see "
                           "https://docs.astropy.org/en/latest/units/combining_and_defining.html"
                           .format(s, format_clause, str(e)))
                    if parse_strict == 'raise':
                        raise ValueError(msg)
                    elif parse_strict == 'warn':
                        warnings.warn(msg, UnitsWarning)
                    else:
                        raise ValueError("'parse_strict' must be 'warn', "
                                         "'raise' or 'silent'")
                return UnrecognizedUnit(s)

        elif isinstance(s, (int, float, np.floating, np.integer)):
            return CompositeUnit(s, [], [], _error_check=False)

        elif isinstance(s, tuple):
            from .structured import StructuredUnit
            return StructuredUnit(s)

        elif s is None:
            raise TypeError("None is not a valid Unit")

        else:
>           raise TypeError(f"{s} can not be converted to a Unit")
E           TypeError: UNITS.temperature can not be converted to a Unit

../../opt/miniconda3/envs/pahfit/lib/python3.10/site-packages/astropy/units/core.py:2059: TypeError
drvdputt commented 1 year ago

I also expect more errors to pop up because now the columns suddenly have units. Might need to do .value in a couple of places in the Model code.

Edit: Actually, not just .value. As long as the refactor of the components isn't there yet, the conversions to the right flux unit need to happen. So something like

amplitude = (flux_power [in UNITS.flux_power] / fwhm [in UNITS.fwhm]).to(UNITS.flux_density).value
jdtsmith commented 1 year ago

Well what I'm thinking is that (pure, but impure for now too) Model will:

  1. Determine if we are doing surface brightness or flux density, ala (no mixing! -> error):
    uparts = sp.flux.unit.decompose()
    isSB = (u.rad, -2) in zip(uparts.bases, uparts.powers)
  2. Convert all incoming Spectrum1D's internally to the correct working unit (using your new_flux_unit() discovery).
  3. Record the correct "working unit" from units.py in the appropriate Features table columns (power, tau for BB's).

All/both our ModelCore backends will naturally expect to work in the working UNITS, so there will be no ambiguity of conversion for them to do themselves. The features table will be updated with OUR units. And we will document how to convert any given table/sum/etc. into other units. You can also let me know what you think about my chosen "default working units". Smallish numbers are VERY NICE for optimizers in my experience, whereas 1e-22: not so good.

jdtsmith commented 1 year ago

The bug above was because Enum's seem dumb: you have to say Class.member.value to get at the actual "thing being held". Happy to switch to some other data structure if you have an idea (I mean a dict, but hmm. NamedTuple?).

drvdputt commented 1 year ago

Looking at the latest changes, I'll assume you removed the FWHM units because they should be the same as wavelength anyway.

I was about to ask why you were using an enum here. Enums are types for variables that have a finite set of options.

Do we even need a data structure for this? We could just have constants at the top level of the units module. Alternatively, if we still want this UNITS namespace, I think SimpleNamespace would be the best option.

from astropy import units as u
from astropy.units import CompositeUnit
from types import SimpleNamespace

UNITS = SimpleNamespace(
    temperature=u.K,
    wavelength=u.um,
    flux_density=u.mJy,
    flux_power=CompositeUnit(1e-22, (u.W, u.m), (1, -2)),
    solid_angle=u.sr,
    intensity=u.MJy / u.sr,
    intensity_power=CompositeUnit(1e-10, (u.W, u.m, u.sr), (1, -2, -1)),
)
jdtsmith commented 1 year ago

OK, duh. Too much C programming in my background. Module constants it is. BTW, do you have an idea why all the tests are failing? I keep seeing:

py38-alldeps: failed with pass_env values cannot contain whitespace, use comma to have multiple values in a single line, invalid values found 'HOME WINDIR LC_ALL LC_CTYPE CC CI TRAVIS'
drvdputt commented 1 year ago

tox issue

I found the problem. It has something to do with tox, because the error is this

py38-alldeps: failed with pass_env values cannot contain whitespace, use comma to have multiple values in a single line, invalid values found 'HOME WINDIR LC_ALL LC_CTYPE CC CI TRAVIS'

and tox.ini has this

# Pass through the following environment variables which may be needed for the CI
passenv = HOME WINDIR LC_ALL LC_CTYPE CC CI TRAVIS

I found this thread from 5 days ago https://github.com/tox-dev/tox/issues/2676

Apparently a breaking change going from tox 3 to tox 4 https://tox.wiki/en/latest/faq.html#tox-4-changed-ini-rules, and the continuous integration is indeed installing tox 4:

Collecting tox
  Downloading tox-4.0.11-py3-none-any.whl (142 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 142.1/142.1 kB 4.3 MB/s eta 0:00:00

Even though the documentation states passenv=SPACE-SEPARATED-GLOBNAMES link, they were actually supposed to be comma or newline separated

PAHFIT tests

I tested this branch using the tests (running locally) and the demo notebook, and everything seems fine. The new table header looks like this when saved with the embedded units.

# %ECSV 1.0
# ---
# datatype:
# - {name: name, datatype: string}
# - {name: group, datatype: string}
# - {name: kind, datatype: string}
# - {name: temperature, unit: K, datatype: string, format: 0.4g, subtype: 'int64[3]'}
# - {name: tau, datatype: string, format: 0.4g, subtype: 'float64[3]'}
# - {name: wavelength, unit: um, datatype: string, format: 0.4g, subtype: 'float64[3]'}
# - {name: power, datatype: string, format: 0.4g, subtype: 'float64[3]'}
# - {name: fwhm, unit: um, datatype: string, format: 0.4g, subtype: 'float64[3]'}
# - {name: model, datatype: string}
# - {name: geometry, datatype: string}

So far no issues, because the old code in base.py does .data on every column, and therefore ignores whether there are units or not.

karllark commented 1 year ago

For the tox issue - adding commas between the variables in passenv fixes the issue. Just tested this in another repository.

jdtsmith commented 1 year ago

You guys are too good. The SB vs flux unit dichotomy reminds me: if a model features table is in SB and the user tries to reuse that model to fit other spectra with flux units (or vv), we should error.

jdtsmith commented 1 year ago

@drvdputt can you submit a quick tox fix to get past this error? Let's get a plan together to merge dev to main soon; especially important since that's what you demoed last week.

drvdputt commented 1 year ago

I've just posted #253. Once that's merged, you can rebase this pull request branch onto it, and we'll see if the tests work.

jdtsmith commented 1 year ago

We can hold off on this until dev hits main. I'll also separate the tied and "default units" stuff into two PRs so we can evaluate the latter for issues. We do need to clarify the units situation, since our science packs have some implicit units already (e.g. microns).

@drvdputt are there any particular units-related issues you think the "default unit" portion of this PR this will hit? If so, perhaps discuss them in #53.