bumps / bumps

Data fitting with uncertainty analysis
Other
52 stars 35 forks source link

Consider adding support for periodic or reflective boundary conditions - maybe add a flag in the new dataclass schema #64

Open acaruana2009 opened 3 years ago

acaruana2009 commented 3 years ago

Some parameters can be periodic - say a magnetic angle (PA reflectivity) or the phase of a sinusoid. Fitting from say 0 to 2π can give multiple solutions (depending on the model symmetry), in this range. E.g. this can result in DREAM producing half of the posterior at 0 and the other half just before 2π.

This can be solved simply by running the fit once to see where the symmetry is, and adjusting the boundaries so it does not include the other solutions, and centring the new fit boundaries around it. However, from a user perspective, it might be nice to specify a periodic boundary to allow DREAM to do this for the user, such that the user does not have to run multiple fits, to narrow down the prior range to only include one mode of the posterior.

For now I will document use cases where this appears and place them in this ticket. If we feel there is enough use for it then we can consider how to implement it.

pkienzle commented 3 years ago

You should be able handle periodic boundaries pretty transparently across fitters. Just apply the mod to the periodic parameters in setp before calling the model evaluator. The optimizer would still think it was using the "unmodded" value, so at the end of optimization you would need to mod whatever result was returned from the fit. Diddle the bounds a bit before starting so the fitter has a whole period on either side of the starting point in case the branch cut is near the solution.

Finding the shortest credible interval will be slightly different from the non-periodic condition. Start from the inverse cumulative density function as usual, but slide a window of 5% cumulative density along the inverse curve to pick the largest delta instead of the 95% window. This gives the part of the period that will be excluded from the credible interval. Echo 5% from the start of the distribution after the end, adding one period to the phase, to capture excluded intervals that cross the branch cut.

The inverse cdf is approximated with a simple sort. Divide the ordinal position by the number of samples to form the x-axis and use the sample value itself as the y-axis. When moving the first 5% to the end, add the length of the period to the sample value, resulting in a "cdf" that goes from 0 to 105%. The cost is O(n log n + k + n) after the sort, difference and max, where k = n·5%. This is only slightly more expensive the O(n log n + k) for non-periodic conditions.