astropy / astropy-APEs

A repository storing the Astropy Proposals for Enhancement.

WIP: Enhance Modeling to Support Probabilistic Fitting #73

Closed karllark closed 11 months ago

karllark commented 2 years ago

We propose to enhance astropy.modeling to support general probabilistic techniques in addition to the currently supported frequentist philosophy. This would include fully supporting priors as part of the model, fitting using a more general Bayesian-compatible approach to the objective function (i.e., prior times likelihood), and adding support for sampling techniques to fitting.

dokester commented 2 years ago

About a week ago, I sent in my views on merging BayesicFitting (BF) with Astropy (AP). I don't see much of it reflected in this discussion. (For good measure I will put the document I wrote into the next comment.) What I do see asks for a major overhaul of BF to become part of the astropy.modeling ecosystem. I think that is an ineffective and even dangerous proposal. It is ineffective because numerous pieces of software would need to be rewritten to fit in the new ecosystem. It is dangerous because new software creates new errors and breaks the solutions I found a long time ago for certain design issues. Remember that BF is the result of almost 20 years of designing, writing, using, updating, polishing and bug removal. I don't say it is error free, but the most conspicuous errors have been eliminated over the long years of usage. I would like to stress again that almost all features in BF were written out of necessity: each was needed at a certain moment for a particular purpose. Almost all of BF has been used successfully in real applications on real data. It is unwise to throw all this experience, encoded in software, away.

Let me present one example of model building. The class "Model" is central in fitting, sampling and model building. All Fitter classes and the Samplers interact with Model. For that purpose Model has three methods: one to calculate the result of the model, F(x,p), and two to calculate the partial derivatives dF/dp and dF/dx. These methods have two arguments, the first for (multidimensional) inputs and the second for a list/array of parameters. When more models are combined into a compound model, it is still the same "Model" class and the three methods are present, calculating the results and partials for the compound model, irrespective of the compounding operation or the dimensionality of inputs or outputs. The new (compound) Model can partake in further model building and is always usable in the Fitters and Samplers.
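To make the composition concrete, here is a minimal sketch of that three-method interface. This is not BF code; the class and method names (`Model`, `result`, `dfdp`, `dfdx`, `PipeModel`) are invented for illustration. It shows why a compound model needs dF/dx: the pipe operation propagates derivatives through the chain rule.

```python
import numpy as np

class Model:
    """Minimal model interface: result F(x, p), dF/dp, and dF/dx."""
    npars = 0
    def result(self, x, p): raise NotImplementedError
    def dfdp(self, x, p): raise NotImplementedError
    def dfdx(self, x, p): raise NotImplementedError

class LineModel(Model):
    """F(x; p) = p0 + p1 * x"""
    npars = 2
    def result(self, x, p): return p[0] + p[1] * x
    def dfdp(self, x, p): return np.stack([np.ones_like(x), x], axis=-1)
    def dfdx(self, x, p): return np.full_like(x, p[1])

class ExpModel(Model):
    """F(x; p) = p0 * exp(x)"""
    npars = 1
    def result(self, x, p): return p[0] * np.exp(x)
    def dfdp(self, x, p): return np.exp(x)[..., None]
    def dfdx(self, x, p): return p[0] * np.exp(x)

class PipeModel(Model):
    """g | h : F(x; p) = h(g(x; pg); ph). Still a Model with the same
    three methods, so it can partake in further model building."""
    def __init__(self, g, h):
        self.g, self.h = g, h
        self.npars = g.npars + h.npars
    def _split(self, p):
        return p[:self.g.npars], p[self.g.npars:]
    def result(self, x, p):
        pg, ph = self._split(p)
        return self.h.result(self.g.result(x, pg), ph)
    def dfdx(self, x, p):
        # chain rule: dF/dx = dh/dx(g(x)) * dg/dx
        pg, ph = self._split(p)
        y = self.g.result(x, pg)
        return self.h.dfdx(y, ph) * self.g.dfdx(x, pg)
    def dfdp(self, x, p):
        # the partials w.r.t. the inner model's parameters need dh/dx --
        # this is exactly where the missing dF/dx method is required
        pg, ph = self._split(p)
        y = self.g.result(x, pg)
        left = self.h.dfdx(y, ph)[..., None] * self.g.dfdp(x, pg)
        right = self.h.dfdp(y, ph)
        return np.concatenate([left, right], axis=-1)
```

With this shape, a `PipeModel(LineModel(), ExpModel())` is just another `Model` with three parameters, and any fitter or sampler that accepts a `Model` accepts the compound one unchanged.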

In astropy.modeling this is only partially present. A FittableModel has two of the three necessary methods. The dF/dx is missing; it is needed in the pipe "|" operation and for problems where there are errors on both X and Y. The interfaces of the present methods list first all inputs (one per input dimension) and then all individual parameters. This is fine for small problems, but as soon as the number of parameters gets above a handful, it starts to become unwieldy. Combining two models yields a CompoundModel, which is not a FittableModel.

Building the BF possibilities into AP modeling would be a major effort that would still break the present functionality of AP. E.g. CompoundModel cannot stay; it needs to be replaced by recursive use of FittableModel. I also think that the interfaces of evaluate() and fit_deriv() have to be changed, or they have to be duplicated. In any case it is a lot of work for very little benefit, as all that would be gained is already present in BF. This is not my idea of spending my time.

In the proposal there is a call for priors, statistics (error distributions), etc. Immediately we see how easy it is to make mistakes. Priors and error distributions need to be proper distributions that integrate to 1.0. So when we have a uniform prior between a and b, the value over that range should be 1/(b-a). A prior cannot be just any function. It needs at least to integrate to 1.0, and it should reflect the knowledge of the user about the data at hand. The same holds for the error distributions. For each individual residual the distribution needs to integrate to 1. Otherwise the resulting evidences are not valid. An evidence is a well-defined number, given the model, data, priors and error distribution. In principle, the evidence of team A should be comparable with that calculated by team B, given the same problem. In practice that is not so likely to happen. More common is the comparison between two models. Even then it is important that the evidences are carefully calculated to draw the right conclusions.
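The normalization requirement is easy to illustrate numerically. The following is plain NumPy (not BF or astropy code): a uniform prior must carry the value 1/(b-a) on its support, and a Laplace density must carry the factor 1/(2*scale), or the integrals (and hence the evidences built from them) come out wrong.

```python
import numpy as np

def uniform_pdf(x, a, b):
    """Proper uniform prior on [a, b]: value 1/(b-a) inside, 0 outside."""
    return np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)

def laplace_pdf(x, center, scale):
    """Laplace density exp(-|x-c|/s) / (2s); integrates to 1 over the real line."""
    return np.exp(-np.abs(x - center) / scale) / (2.0 * scale)

# Riemann-sum check that both densities integrate to ~1
x, dx = np.linspace(-50.0, 50.0, 200001, retstep=True)
print(np.sum(uniform_pdf(x, 2.0, 5.0)) * dx)   # ~1.0
print(np.sum(laplace_pdf(x, 0.0, 0.01)) * dx)  # ~1.0
```

Dropping either normalization constant leaves the fitted maximum in the same place, which is why the mistake is easy to miss: it only shows up in the evidence.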

I stand by my proposal as explained in my earlier document (in the next comment).

dokester commented 2 years ago

astropy.txt

karllark commented 2 years ago

@dokester: Thanks for your thoughts. Deciding how to proceed on this topic is not easy; there are pros and cons to all the choices. I look forward to the discussion at the session later today.

hamogu commented 2 years ago

@dokester Thank you for your insight. This is marked as a draft, so we are discussing what to do with it. I do like some of your suggestions, but it's not clear to me that "just integrate BF" is the right goal to aim for. There are a number of other fitting packages that are also well represented in the community, have been developed over years, and are battle-tested. The one in that category that I contribute to is sherpa, with about a thousand refereed citations.

So, I personally (but I'm ready to be convinced to do things differently in our discussion) think we should have a well thought-out interface that allows users to use astropy.modeling - which does have features not present in other fitting packages, namely the support for physical units - with any fitting / sampling package of their choice.

All those fitters / optimizers / samplers / ... need the same information, because they apply similar math. If we do this job well, it won't be hard to write that interface for any package; but, of course, if nobody wants to write the interface between astropy and package X, then package X might not support that feature. So, in my view, this is not meant to create more work for @dokester as the BF developer, but to draw on the experience of several different package developers to build a good and generic interface.
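As a hedged sketch of what "the same information" could look like in practice (hypothetical names throughout; this is not an existing astropy or BF API): every fitter, optimizer or sampler ultimately consumes a single callable that evaluates log prior plus log likelihood for a parameter vector.

```python
import numpy as np
from scipy.optimize import minimize

def make_log_posterior(model, x, y, log_priors, sigma=1.0):
    """Compose log(prior) + Gaussian log-likelihood into one objective
    that any optimizer or MCMC sampler can consume."""
    def log_posterior(p):
        lp = sum(lpr(pk) for lpr, pk in zip(log_priors, p))
        if not np.isfinite(lp):
            return -np.inf                      # outside the prior support
        resid = y - model(x, p)
        return lp - 0.5 * np.sum((resid / sigma) ** 2)
    return log_posterior

def line(x, p):                                 # toy model: y = p0 + p1*x
    return p[0] + p[1] * x

def flat_log_prior(lo, hi):                     # normalized: log(1/(hi-lo)) inside
    return lambda v: -np.log(hi - lo) if lo <= v <= hi else -np.inf

rng = np.random.default_rng(0)
xd = np.linspace(0.0, 10.0, 50)
yd = 1.0 + 2.0 * xd + rng.normal(0.0, 0.1, xd.size)

logpost = make_log_posterior(line, xd, yd,
                             [flat_log_prior(-10, 10), flat_log_prior(-10, 10)],
                             sigma=0.1)

# here a MAP fit with scipy; an MCMC or nested sampler could take the
# same callable, which is the point of a generic interface
res = minimize(lambda p: -logpost(p), x0=[0.0, 0.0])
print(res.x)  # close to [1.0, 2.0]
```

The interface question is then only where that callable gets built: inside the modeling package, inside each fitting package, or in a thin adapter layer between them.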

dokester commented 2 years ago

During the zoom meeting Karl asked me for an example of a model that could not be done with astropy.modeling.

I present here an annotated excerpt from the actual script which calculated the flatfield for the JWST-MIRI instrument.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

## 4. Construct fringe model

# Define a splines model to address the (small) deviations in the wavenumber scale of the data
# 5 knots evenly distributed over the wavenumber space.
sm = BasicSplinesModel( nrknots=5, min=0, max=wmax-wmin )

# Add the wavenumbers to the deviations. 
# This PowerModel here has no parameters. It is defined as F(x:p) = x
wm = sm + PowerModel( 1, fixed={0:1.0} )

# Set Laplace priors for the 7 parameters defining the deviations 
# with a very small range to make sure it is only small deviations.
for k in range( wm.npars ) :
    wm.setPrior( k, prior=LaplacePrior( center=0, scale=0.01 ) )

# define BasicSplines for the amplitude and the finesse of the main etalon
pm1 = BasicSplinesModel( nrknots=3, min=0, max=wmax-wmin )
pm2 = BasicSplinesModel( nrknots=3, min=0, max=wmax-wmin )

# Define 2 etalons.
# EtalonModel encodes:  F(x:p) = p_0 / ( 1.0 + p_1 * sin^2( \pi p_2 x + p_3 ) )
# In the main etalon replace p_0 (amplitude) by the result of pm1 
# And p_1 (finesse) by the result of pm2
em1 = EtalonModel( fixed={0:pm1, 1:pm2} )

# In the second etalon fix the amplitude (p_0) to 1.0 (to avoid degeneracy)
em2 = EtalonModel( fixed={0:1.0} )

# Construct a 2-etalon model as the product of both
et2mdl = em1 * em2

# Set initial values for the parameters of et2mdl. 
# They originate from an earlier calculation.
# They are only needed as center values of the priors.

## re-order parameters in new et2mdl
##        0     1        2:7       7:12        12    13     14
##      peri   phas      amp       fin        fin  peri   phas
etpar = [p[2], p[3]] + [p[0]]*5 +[p[1]]*5 + [p[4], p[5], p[6]]

# Set Laplace priors to each of the parameters of et2mdl
for k in range( et2mdl.npars ) :
    scl = 0.03
    # circular for the phases in parameter nrs 1 and 14
    if k == 1 or k == 14 :
        et2mdl.setPrior( k, prior=LaplacePrior( center=etpar[k], scale=0.1, circular=math.pi ) )
    # Strictly positive Laplace for the finesses
    elif 2 <= k <= 12  :
        et2mdl.setPrior( k, prior=LaplacePrior( center=etpar[k], scale=scl, limits=[0,None] ) )
    # All others         
    else :
        et2mdl.setPrior( k, prior=LaplacePrior( center=etpar[k], scale=scl ) )

# pipe the wavelength correction model through the double etalon model
mdl = wm | et2mdl

# mdl is now the model. 
# It has 21 parameters, with the necessary priors, to be fed to a Sampler  

## 5. Fit the model

# Define a somewhat faster version of NestedSampler. 
# Use only the GalileanEngine to move the walkers around
# Use the LaplaceErrorDistribution to get median-like results. Better with non-Gaussian errors.
engs = "galilean"
ns = PhantomSampler( wn0, mdl, flux, weights=cwgt, verbose=2,
                engines=engs, distribution="laplace" )

# Make the scale parameter of LaplaceErrorDistribution fittable too. 
# Using the default: JeffreysPrior with indicated limits. 
ns.distribution.setLimits( [0.001, 0.1] )

# Introduce a constrain method to keep the periods from collapsing on one value.
ns.distribution.constrain = constrainDistance

# Start the Sampler.
loge = ns.sample()

# The result is defined as a list of samples
slist = ns.samples

# The constraint keeps parameter 7 larger than parameter 20 plus a margin of 0.2
def constrainDistance( logL, problem, allpars, lowLhood ):
    return logL if allpars[7] > ( allpars[20] + 0.2 ) else ( lowLhood - 1 )

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

It is quite complicated, but the result is that we have a physical model for the flat field of the MIRI instrument. In my view this is much better than an ad hoc average. It has only 21 parameters. Still, it fitted the data so well that no structure was left in the residuals.

pllim commented 1 year ago

Are the interested parties willing to revisit or is this a no-go?

karllark commented 1 year ago

I am willing to revisit. I would like to request leadership input/direction. We were not able to come to a consensus during the meeting about 1 year ago.

pllim commented 1 year ago

cc @WilliamJamieson , @nden , @perrygreenfield

perrygreenfield commented 1 year ago

I'm going to take some time to review this issue.

WilliamJamieson commented 1 year ago

I also need some time to re-review this.

perrygreenfield commented 1 year ago

To address one of @dokester's concerns, namely the lack of partial derivatives for compound models, I think we can add those to astropy modeling without a huge amount of work.

perrygreenfield commented 1 year ago

I have to leave at the moment, but I agree with @dokester that we should not expect them to rework BF to merge into modeling. But we should look to see if we can make astropy modeling acquire the features that BF has.

pllim commented 1 year ago

I have sent an email out to interested parties asking for update.

pllim commented 1 year ago

Hello. CoCo discussed this again on 2023-09-11. We will give the interested parties approximately two weeks to come up with a plan here (either you want to rewrite this APE, develop the features as a separate package outside astropy, or something else). After that period, CoCo will either merge or close this PR, as it has been open long enough without any concrete follow-up actions.

As it stands (if no one comments at all with a plan), CoCo will reject this proposal.

Two weeks from today is 2023-09-25.

hamogu commented 1 year ago

I want to note that "reject" does not mean that this is not valuable work. It just means "we don't endorse the full package of this to be part of astropy core right now". For example, if this starts its life in a separate package, I would not be surprised if there is another APE in a few years' time that proposes merging that new package into astropy core after it has matured a bit. The new APE could then go back and copy and paste a large amount of text from this one.

On a personal note, I think that developing and releasing on a faster cycle than astropy core would be a good solution: it allows faster iteration and gathers use cases that help refine the API before it's bound to astropy core rules that require long deprecation periods for any change.

hamogu commented 11 months ago

Since we have not heard from the developers of this APE, we are rejecting this as an APE at this time. This would still be a great thing to get into astropy, but developing it independently for now seems like the better option. A new APE can be written (or this one re-opened) when it's ready to go into astropy core at a later time.

Thanks to everyone taking part in the discussion - I hope this will still be useful for the development, even if not as an APE.

karllark commented 11 months ago

This is fine. Working on this in a separate package is my current plan. I will expand on my existing prototype implementation and we can see where it goes from there. Others on this APE may contribute to this separate package if they have time and motivation.