Exo-TiC / ExoTiC-ISM

This is a repository for the reduction pipeline detailed in Wakeford, et al., 2016, ApJ. The method implements marginalization across a series of models to represent stochastic models for observatory and instrument systematics. This is primarily for HST WFC3, however, may be extended to STIS in the future.
MIT License
8 stars 6 forks source link

Make minimal code example to test the Sherpa fit vs. IDL mpfit #59

Closed ivalaginja closed 4 years ago

ivalaginja commented 5 years ago

While working on issue #22, where we want to test G102 data, we found differences between the results that our package gives vs running the same thing in the IDL codes. While we have accepted some minimal discrepancy between the IDL and the Python results while building the Python package, we need to investigate to what extent we can really do that, or whether there is a fundamental difference in how the two code bases do the fitting.

To do that, we will set up the easiest possible example with either a Gaussian or a sine wave where we simulate noisy data and then try to fit it both with Sherpa and the IDL mpfit. This should help us in determining what the statistical differences are between the two routines.

I will first go ahead and set up an example in a notebook with sherpa, maybe we can recycle the sine wave fitting example we made when looking into Sherpa the first time. Then I'll try to set up an equivalent model in IDL, but might reach out to @hrwakeford and @Onoddil about it. Will keep you in the loop.

ivalaginja commented 5 years ago

I think the sine wave notebook (Sherpa on simple sine custom model.ipynb) should work fine for this. I'll start working on the IDL counterpart.

ivalaginja commented 5 years ago

@hrwakeford could you have a look at the IDL file I made and adjust it so that it runs the MPFIT fitting? I am having a hard time figuring out how to set it up exactly. I am trying to replicate the Sherpa sine notebook in the above comment.

You can check out branch exotic-59_mpfit-comparison and in the notebooks folder, there's the file I worked on called sine_wave.pro. I have just copied the old fitting routine into there too and called the file G141_routine.pro to be able to look at the example and have the mpfit function in there.

It might also be worth it to sit down next to each other again, one on the IDL codes, the other on the Python version, and going through them line by line to make sure that they are indeed doing exactly the same thing. We can always go back to the notebooks we used to compare the outputs from both, step by step.

ivalaginja commented 5 years ago

Thinking about this, it might also be worth to compare Sherpa to the Python version of mpfit, since that would be in the same language, but have the same implementation as the IDL one. We might learn something from it.

ivalaginja commented 5 years ago

Another thought is to go through all the detailed parameters that define the optimizer (see mpfit documentation in routine directly and Sherpa's on their docs) and explicitly set them to be the same. Their defaults might be different, as we have already seen with the parameter epsfcn (see this comment and following) and adjusting it improved our results a lot.

ivalaginja commented 5 years ago

I didn't quite read through all of this yet, but it might also be useful information: https://github.com/sherpa/sherpa/issues/679#issuecomment-531242318

If the differences we're seeing are due to implementation differences, this just means that an optimization of the fitting routines has to become a standard point on the list of things to work on for this repository.

Onoddil commented 5 years ago

It would be good to add a 1D gaussian dataset as a test, as fitting a dataset drawn from a given 1D Gaussian back to a 1D Gaussian should also be analytically solvable, both in terms of the maximum likelihood estimator parameters and their corresponding uncertainties, which should give a direct comparison of the uncertainties that come out of the lower limits from the Fisher Information Matrix from Sherpa.

Also before the more comprehensive and realistic tests, adding the original sin wave tests back into the testing environment formally would be good, as a verification of the fitting process with an obvious, well documented test.

hrwakeford commented 5 years ago

Tasks:

For each test use the following: Set the x-array to be in terms of phase with 0.0 as the center of transit inclination = 90' degrees Period = 3.5 days a/R* = 7.0 Transit depth (Rp/Rs) = 0.1

Test 2 to see if the limb-darkening influences the fit parameters: Use the 3D model grid over wavelength range of 1.1 - 1.7 microns with the G141 grating Teff = 5500 K metallicity = 0.0 logg = 4.5

Test 3 to determine if the systematic model influences the fit parameters: This will test fitting for an m_fac parameter, which can be done by applying a linear slope line = (xarray*0.04) + 1.0 where m_fac = 0.04

hrwakeford commented 5 years ago

Test 0: Simple dataset with no additional scatter with the following set-up

; Constants
; ----------
Gr = 6.67259d-11
dtosec = 86400

; Planet parameters
; ------------------
inclin = 90d0
period = 3.5d0
a_Rs = 7.0d0
rp_rs = 0.1d0

inc_rad = inclin * ((2*!pi)/360D0)
Per = period * dtosec
constant1 = (Gr*Per*Per/(4*!pi*!pi))^(1D0/3D0)
msmpr = (a_Rs/(constant1))^3d0

c1 = 0.0d0
c2 = 0.0d0
c3 = 0.0d0
c4 = 0.0d0
m = 0.0d0

data_x = [-0.046, -0.044, -0.042, -0.040, -0.038, -0.036, -0.034, -0.032, -0.030, -0.006, -0.004, -0.002, 0.0, 0.002, 0.004, 0.006, 0.008, 0.01, 0.032, 0.034, 0.036, 0.038, 0.040, 0.042, 0.044, 0.046,0.048]

data_y = [1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000]

uncertainty = REPLICATE(0.0004, n_elements(data_x))

Allowing only rp_rs to be fit using MPFIT the least-squares minimizer produces a result of:

Rp/R* = 0.10000000 +/- 0.00066666679 Reduced chi-squared = 0.0000000

Test0

hrwakeford commented 5 years ago

Test 1: Simple dataset with additional scatter and the following set-up

; Constants
; ----------
Gr = 6.67259d-11
dtosec = 86400

; Planet parameters
; ------------------
inclin = 90d0
period = 3.5d0
a_Rs = 7.0d0
rp_rs = 0.1d0

inc_rad = inclin * ((2*!pi)/360D0)
Per = period * dtosec
constant1 = (Gr*Per*Per/(4*!pi*!pi))^(1D0/3D0)
msmpr = (a_Rs/(constant1))^3d0

c1 = 0.0d0
c2 = 0.0d0
c3 = 0.0d0
c4 = 0.0d0
m = 0.0d0

data_x = [-0.046, -0.044, -0.042, -0.040, -0.038, -0.036, -0.034, -0.032, -0.030, -0.006, -0.004, -0.002, 0.0, 0.002, 0.004, 0.006, 0.008, 0.01, 0.032, 0.034, 0.036, 0.038, 0.040, 0.042, 0.044, 0.046,0.048]

uncertainty = REPLICATE(0.0004, n_elements(data_x))

; Generated random scatter using the following
; random_scatter = RANDOMU(seed, n_elements(data_x), /DOUBLE, /normal)
; and then fixed to the quoted array for consistency
random_scatter = [0.32558253, -0.55610514, -1.1150768, -1.2337022, -1.2678875, 0.60321692, 1.1025507, 1.5080730, 0.76113001, 0.51978011, 0.72241364, -0.086782108, -0.22698337, 0.22780245, 0.47119014, -2.1660677, -1.2477670, 0.28568456, 0.40292731, 0.077955817, -1.1090623, 0.66895172, -0.59215439, 0.79973968, 1.0603756, 0.82684954, -1.8334587]

original_y = [1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 0.99000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000, 1.0000000]

data_y = original_y + (random_scatter * uncertainty)

Allowing only rp_rs to be fit using MPFIT the least-squares minimizer produces a result of:

Rp/R* = 0.10033294 +/- 0.00066445440 Reduced chi-squared = 0.93970342 Test1

hrwakeford commented 5 years ago

Test 2: Simple dataset with additional scatter and set limb-darkening coefficients and the following set-up

; Constants
; ----------
Gr = 6.67259d-11
dtosec = 86400

; Planet parameters
; ------------------
inclin = 90d0
period = 3.5d0
a_Rs = 7.0d0
rp_rs = 0.1d0

inc_rad = inclin * ((2*!pi)/360D0)
Per = period * dtosec
constant1 = (Gr*Per*Per/(4*!pi*!pi))^(1D0/3D0)
msmpr = (a_Rs/(constant1))^3d0

; Calculate the limb-darkening coefficients using  the 3D LD code
;------------------------------------------------------------------
grating = 'G141'
wsdata = (1.1 + (FINDGEN(100)*0.006)) * 1d4
widek = INDGEN(n_elements(wsdata))
M_H = 0.0
Teff = 5500
logg = 4.5

limb_fit_3D_choose, grating, widek, wsdata, uLD, c1, c2, c3, c4, cp1, cp2, cp3, cp4, aLD, bLD, header, M_H, Teff, logg

c1 =       0.66396105
c2 =      -0.12617095
c3 =      0.053649047
c4 =     -0.026713433

;Set-up the data arrays
;-----------------------
data_x = [-0.046, -0.044, -0.042, -0.040, -0.038, -0.036, -0.034, -0.032, -0.030, -0.006, -0.004, -0.002, 0.0, 0.002, 0.004, 0.006, 0.008, 0.01, 0.032, 0.034, 0.036, 0.038, 0.040, 0.042, 0.044, 0.046,0.048]

uncertainty = REPLICATE(0.0004, n_elements(data_x))

; Again using the fixed random scatter array to generate data_y from the model
;random_scatter = [0.32558253, -0.55610514, -1.1150768, -1.2337022, -1.2678875, 0.60321692, 1.1025507, 1.5080730, 0.76113001, 0.51978011, 0.72241364, -0.086782108, -0.22698337, 0.22780245, 0.47119014, -2.1660677, -1.2477670, 0.28568456, 0.40292731, 0.077955817, -1.1090623, 0.66895172, -0.59215439, 0.79973968, 1.0603756, 0.82684954, -1.8334587]

data_y = [1.0001302, 0.99977756, 0.99955397, 0.99950652, 0.99949285, 1.0002413, 1.0004410, 1.0006032, 1.0003045, 0.98918739, 0.98921560, 0.98886110, 0.98879472, 0.98898693, 0.98911511, 0.98811305, 0.98855772, 0.98927710, 1.0001612, 1.0000312, 0.99955638, 1.0002676, 0.99976314, 1.0003199, 1.0004242, 1.0003307, 0.99926662]

Allowing only rp_rs to be fit using MPFIT the least-squares minimizer produces a result of:

Rp/R* = 0.10030005 +/- 0.00060317730 Reduced chi-squared = 0.93983866

Test2

hrwakeford commented 5 years ago

Test 3: Simple dataset with additional scatter, set limb-darkening coefficients, and a linear slope with the following set-up

; Constants
; ----------
Gr = 6.67259d-11
dtosec = 86400

; Planet parameters
; ------------------
inclin = 90d0
period = 3.5d0
a_Rs = 7.0d0
rp_rs = 0.1d0

inc_rad = inclin * ((2*!pi)/360D0)
Per = period * dtosec
constant1 = (Gr*Per*Per/(4*!pi*!pi))^(1D0/3D0)
msmpr = (a_Rs/(constant1))^3d0

; Limb-darkening coefficients
;------------------------------------------------------------------
c1 =       0.66396105
c2 =      -0.12617095
c3 =      0.053649047
c4 =     -0.026713433

;Set-up the data arrays
;-----------------------
data_x = [-0.046, -0.044, -0.042, -0.040, -0.038, -0.036, -0.034, -0.032, -0.030, -0.006, -0.004, -0.002, 0.0, 0.002, 0.004, 0.006, 0.008, 0.01, 0.032, 0.034, 0.036, 0.038, 0.040, 0.042, 0.044, 0.046,0.048]

uncertainty = REPLICATE(0.0004, n_elements(data_x))

; Again using the fixed random scatter array to generate data_y from the model
;random_scatter = [0.32558253, -0.55610514, -1.1150768, -1.2337022, -1.2678875, 0.60321692, 1.1025507, 1.5080730, 0.76113001, 0.51978011, 0.72241364, -0.086782108, -0.22698337, 0.22780245, 0.47119014, -2.1660677, -1.2477670, 0.28568456, 0.40292731, 0.077955817, -1.1090623, 0.66895172, -0.59215439, 0.79973968, 1.0603756, 0.82684954, -1.8334587]

; Create a slope to also apply to the data
m_fac = 0.04
line = (data_x * m_fac) + 1.00

; data_y = line * (model_data + (random_scatter*uncertainty))
data_y = [0.99929017, 0.99901777, 0.99887430, 0.99890683, 0.99897318, 0.99980124, 1.0000809, 1.0003231, 1.0001044, 0.98993925, 0.99004661, 0.98977091, 0.98978356, 0.99005507, 0.99026251, 0.98933832, 0.98986262, 0.99066208, 1.0024415, 1.0023914, 1.0019954, 1.0027883, 1.0023626, 1.0030008, 1.0031854, 1.0031717, 1.0021845]

EDIT from @ivalaginja: the data_y provided above must be a typo. I can only reproduce the results if I use the line above it, meaning data_y = line * (model_data + (random_scatter*uncertainty)).

Fit for rp_rs and m_fac using MPFIT When fitting for m_fac initialize the fit parameter at 0.0 so that the returned number is not biased by the correct input value

The least-squares minimizer produces a result of:

Rp/R* = 0.10030181 +/- 0.00060348763 m_fac = 0.040151974 +/- 0.0023857033 Reduced chi-squared = 0.97742394 Test3

hrwakeford commented 5 years ago

The simple function used for each of these tests

; TRANSIT FUNCTION
FUNCTION transit_slope, p, X=x, Y=y, ERR=err
Gr = 6.67259D-11

; INPUT PARAMETERS
rl = p(0)
inclin = p(1)
Per = p(2)
msmpr = p(3)
c1 = p(4)
c2 = p(5)
c3 = p(6)
c4 = p(7)
m  = p(8)

b0 = (Gr*Per*Per/(4*!pi*!pi))^(1D0/3D0) * (msmpr^(1D0/3D0)) * [(sin(x*2*!pi))^2D0 + (cos(inclin)*cos(x*2*!pi))^(2D0)]^(0.5D0)

plotquery = 0

occultnl, rl, c1, c2, c3, c4, b0, mulimb0, mulimbf, plotquery, _extra=e

systematic_model = (x*m + 1.0)

;model fit to data = transit model * systematic model
model = mulimb0 * systematic_model

RETURN, (y - model) / err
END
Onoddil commented 5 years ago

Just a quick analytical test to verify the MPFIT uncertainties:

Assuming that , differentiating once with respect to, say $r$, gives , and .

I used this second order differential equation, and assumed .

However, the second-order differential derivation of the uncertainties only works from the log likelihood, not the pure chi-squared, so there's an additional factor of 0.5, which propagates as sqrt(2) into $\sigma_r$. (You also have to be careful with the minus signs from the log-likelihood, being -0.5 \chi^2, and the arbitrary definition of $d_i - m_i$ vs $m_i - d_i$, symmetric in the chi-squared but opposite signs in the differentials; I picked the positive combination such that the square-root is defined.)

For each test above I then took the fit values (and fixed them), and the derived datasets (and/or calculated them in the case of occultnl and added the fixed uncertainty scatter, so do technically introduce a very minor python-IDL computation disagreement, but can only assume those are of the order machine precision), and computed models from the true values (mainly r=0.1, but also m_fac = 0.04). Using these data arrays as d and m above, I calculated the first- and second-order differentials. For tests 0 and 1 these are analytic, as m = 1 - r^2 in transit, and m = 1 on the out of transit portions, and thus my differentials are -2r and -2 respectively (or zero out of transit). For m_fac, the first-order differential becomes data_x * model_data (as the partial derivative is of data_x * m_fac + 1, keeping model_data fixed; the second-order differential is therefore zero. The only non-analytic differentials are of r in test 2 and 3, as the limb darkening has a non-analytic function (or at least I haven't bothered figuring out of it does), so I instead do numerical differences for the differentials, with dr = 1e-7.

My results are then:

test x: chi2, calculated sig
test 0: 0.0 0.0006666666666666665
test 1: 0.9397034403847854 0.0006644544358754432
test 2: 0.9398386755528352 0.0006031823699300192
test 3, r: 0.9774127878157383 0.00060310666359436
test 3, m: 0.9774127878157383 0.0023839586529004106

I therefore believe the MPFIT uncertainties as quoted; these tests look to be good benchmarks against which to test the sherpa implementation.

Edit by Iva, substituted LaTeX equations: (1) $\chi^2 = \sum_i\frac{(d_i - m_i)^2}{\sigma_i^2}$ (2) $\frac{\partial \chi^2}{\partial r} = \sum_i \frac{-2 (d_i - m_i)}{\sigma_i^2} \frac{\partial m_i}{\partial r}$ (3) $\frac{\partial^2 \chi^2}{\partial r^2} = \sum_i \frac{2}{\sigma_i^2}(\frac{\partial m_i}{\partial r})^2 - \frac{2 (d_i - m_i)}{\sigma_i^2} \frac{\partial^2 m_i}{\partial r^2}$ (4) $\sigma_r = 1/\sqrt{\frac{\partial^2 \chi^2}{\partial r^2}}$

ivalaginja commented 5 years ago

This notebook contains all four tests, you can simply do Cell -> Run All: https://github.com/hrwakeford/ExoTiC-ISM/blob/feature/exotic-59_mpfit-comparison/notebooks/Simple%20transit%20model%20in%20Sherpa.ipynb

The results of each test are labelled Final results and plot test [no]

Already the first fit result of test 0 alone suggests something is simply wrong in the model (see below). Ingress and egress are not centered between the data batches inside and outside of transit, but rather squished together around the transit floor.

test0

This is majorly wrong, so it will be the next thing I will look into. Happy about any hints, or if you've seen something like this before @hrwakeford.

ivalaginja commented 5 years ago

@hrwakeford I am not sure which part I am plotting wrong and which part just comes from the model implementation. As far as I can see the model (and hence the fit) only spit out fitted data points at exactly the same phase values like the input data, which is not regularly sampled but has three concentrated groups of data points. If I then try to plot the fit, there is no reason why it would interpolate the model between these groups of data points, so my results above make sense of some sort, even if it is not a good representation of the model fit.

Did you simply overplot an interpolated model in your results or is there something in your fitting that actually interpolates the x-array during the fit? (I guess it's the former, since we're using the same thing essentially, except that mine is a translation of your IDL code into python.)

Evaluating the model fit on a denser grid gives me a similar result:

Screen Shot 2019-09-30 at 21 34 49
ivalaginja commented 5 years ago

I have pushed the notebook I made for this with the outputs this time, to make things easier, you can find it here: https://github.com/hrwakeford/ExoTiC-ISM/blob/feature/exotic-59_mpfit-comparison/notebooks/Simple%20transit%20model%20in%20Sherpa.ipynb

In terms of fit results, assuming that the model plots shown above are smooth models of the fitted data, these are the results in python, with Sherpa, when doing the same thing: one_slide

In tests 1 and 2, we slightly overestimate the transit depth with respect to the IDL results, but also get a slightly smaller error (and larger reduced chi-squared), but test 3 completely failed here. For some reason, the model after the fit actually adapts the slope as opposed to taking it into account in the systematic model. Also, if you look closely, even if I directly copied the data_y array you used to do this, my data points in test 3, the ones in the far left group, are slightly higher than yours. Yours are all consistently below a normalized flux of 1 while mine are a bit higher than that. We should double check that we use the same fake flux data for this.

@hrwakeford feel free to look through the notebook I made (if you don't want to manipulate it, you can just open it in GitHub, link provided above), please let me know of any thoughts you have.

hrwakeford commented 5 years ago

I think with Test 3 it is clear that the incorporation of a systematic model is throwing Sherpa somehow.

Here is the array for the line to see if there is a problem there

line = [0.998160, 0.998240, 0.998320, 0.998400, 0.998480, 0.998560, 0.998640, 0.998720, 0.998800, 0.999760, 0.999840, 0.999920, 1.00000, 1.00008, 1.00016, 1.00024, 1.00032, 1.00040, 1.00128, 1.00136, 1.00144, 1.00152, 1.00160, 1.00168, 1.00176, 1.00184, 1.00192]
hrwakeford commented 5 years ago

Test 4 To see if fitting for the center of transit time is an issue.

; Constants
; ----------
Gr = 6.67259d-11
dtosec = 86400

; Planet parameters
; ------------------
inclin = 90d0
period = 3.5d0
a_Rs = 7.0d0
rp_rs = 0.1d0

inc_rad = inclin * ((2*!pi)/360D0)
Per = period * dtosec
constant1 = (Gr*Per*Per/(4*!pi*!pi))^(1D0/3D0)
msmpr = (a_Rs/(constant1))^3d0

; Calculate the limb-darkening coefficients using  the 3D LD code
;------------------------------------------------------------------
grating = 'G141'
wsdata = (1.1 + (FINDGEN(100)*0.006)) * 1d4
widek = INDGEN(n_elements(wsdata))
M_H = 0.0
Teff = 5500
logg = 4.5

limb_fit_3D_choose, grating, widek, wsdata, uLD, c1, c2, c3, c4, cp1, cp2, cp3, cp4, aLD, bLD, header, M_H, Teff, logg

c1 =       0.66396105
c2 =      -0.12617095
c3 =      0.053649047
c4 =     -0.026713433

;Set-up the data arrays
;-----------------------
data_x = [-0.046, -0.044, -0.042, -0.040, -0.038, -0.036, -0.034, -0.032, -0.030, -0.006, -0.004, -0.002, 0.0, 0.002, 0.004, 0.006, 0.008, 0.01, 0.032, 0.034, 0.036, 0.038, 0.040, 0.042, 0.044, 0.046,0.048]

uncertainty = REPLICATE(0.0004, n_elements(data_x))

; Again using the fixed random scatter array to generate data_y from the model
;random_scatter = [0.32558253, -0.55610514, -1.1150768, -1.2337022, -1.2678875, 0.60321692, 1.1025507, 1.5080730, 0.76113001, 0.51978011, 0.72241364, -0.086782108, -0.22698337, 0.22780245, 0.47119014, -2.1660677, -1.2477670, 0.28568456, 0.40292731, 0.077955817, -1.1090623, 0.66895172, -0.59215439, 0.79973968, 1.0603756, 0.82684954, -1.8334587]

; Create a slope to also apply to the data
m_fac = 0.04
line = (data_x * m_fac) + 1.00

; data_y = line * (model_data + (random_scatter*uncertainty))
data_y = [0.99929017, 0.99901777, 0.99887430, 0.99890683, 0.99897318, 0.99980124, 1.0000809, 1.0003231, 1.0001044, 0.98993925, 0.99004661, 0.98977091, 0.98978356, 0.99005507, 0.99026251, 0.98933832, 0.98986262, 0.99066208, 1.0024415, 1.0023914, 1.0019954, 1.0027883, 1.0023626, 1.0030008, 1.0031854, 1.0031717, 1.0021845]

We now add in another parameter to fit.

e_shift = -0.01

This only appears in the model itself as a shift parameter to the x-array to test if it can settle back to the preferred solution of e_shift = 0.0 which is what the data we input shows. The 1st figure below shows what the starting value of e_shift would look like if it was right.

So, within the function you have introduced this additional e_shift parameter which we want in this case to fit back to zero:


; TRANSIT FUNCTION
FUNCTION transit_slope, p, X=x, Y=y, ERR=err
Gr = 6.67259D-11

rl = p(0)
inclin = p(1)
Per = p(2)
msmpr = p(3)
c1 = p(4)
c2 = p(5)
c3 = p(6)
c4 = p(7)
m  = p(8)
e_shift = p(9)

x = x + e_shift

b0 = (Gr*Per*Per/(4*!pi*!pi))^(1D0/3D0) * (msmpr^(1D0/3D0)) * [(sin(x*2*!pi))^2D0 + (cos(inclin)*cos(x*2*!pi))^(2D0)]^(0.5D0)

plotquery = 0

occultnl, rl, c1, c2, c3, c4, b0, mulimb0, mulimbf, plotquery, _extra=e

systematic_model = (x*m + 1.0)

;model fit to data = transit model * baseline flux * systematic model
model = mulimb0 * systematic_model

print, 'Rp/R* = ', p(0)

; plot, x, y, /ystyle, psym=4, yrange=[min(y)-0.0001,max(y)+0.0001]
; oplot, x, model, psym=-2, color=1000
; oplot, x, mulimb0

resids = (y - model) / 1.0

RETURN, (y - model) / err
END

Fit for rp_rs, m_fac, and e_shift using MPFIT When fitting for m_fac initialize the fit parameter at 0.0 so that the returned number is not biased by the correct input value When fitting for e_shift initialize it with -0.01 to shift the data and we want MPFIT to find 0.0

The least-squares minimizer produces a result of:

Rp/R* = 0.10007587 +/- 0.00074059693 m_fac = 0.040194558 +/- 0.0023884482 e_shift = -0.0010422871 +/- 0.0020931500 Reduced chi-squared = 1.0080670

What the input e_shift for TEST 4 would look like if it was applied to the x_array. However, the input x_array will be the same as before without the shift. This will test if the code can fit and remove the artificial shift. Input_test4

OUTPUT TEST 4: it has now fit for the correct e_shift of essentially 0.0, and the data look like the input array we wanted (see Figure in https://github.com/hrwakeford/ExoTiC-ISM/issues/59#issuecomment-533665365) output_test4

ivalaginja commented 5 years ago

I added a comment there directly, but the data_y array provided for the IDL Test 3 in this comment above is incorrect, I proceeded with the correct ones as indicated in the edited comment and now the data in the IDL and Python notebook are the same.

This does NOT fix our overall issue yet, but this was bugging me.

ivalaginja commented 5 years ago

I have found the reason for the problems we've had above: The structural implementation of the fitting code is slightly different between mpfit and sherpa (this has nothing to do with the statistics they use, just how the code is set up).

When we pass the model function into mpfit, we pass it a function that returns weighted deviations between the model and the data. After mpfit does its thing and return the best fit parameters, we then use a different function to actually calculate the model. See:

class mpfit:
   def __init__(self, fcn, xall=None, functkw={}, parinfo=None,
                ftol=1.e-10, xtol=1.e-10, gtol=1.e-10,
                damp=0., maxiter=200, factor=100., nprint=1,
                iterfunct='default', iterkw={}, nocovar=0,
                fastnorm=0, rescale=0, autoderivative=1, quiet=0,
                diag=None, epsfcn=None, debug=0):
      """
Inputs:
  fcn:
     The function to be minimized.  The function should return the weighted
     deviations between the model and the data, as described above.
[...]

and

def transit_circle(p, fjac=None, x=None, y=None, err=None, sh=None):
    [...]
    mulimb0, mulimbf = occultnl(rl, c1, c2, c3, c4, b0)
    systematic_model = sys_model(phase, HSTphase, sh, m_fac, hstp1, hstp2, hstp3, hstp4,
                                 xshift1, xshift2, xshift3, xshift4)
    model = mulimb0 * p[1] * systematic_model    #NOTE: p[1] = flux0 from snippet below
    return [0, (y - model) / err]

where occultnl() is the actual transit model.

For sherpa, we a the function that simply returns the model directly, not its deviations from the data. We then use the same function to calculate the model with the parameters returned from the fit. And since this function includes the systematic model, the newly calculated model includes the systematics, which we actually don't want in there. See:

def _transit_model(pars, x, sh, x_in_phase=False):
    [...]
    mulimb0, mulimbf = occultnl(rl, c1, c2, c3, c4, b0)
    systematic_model = sys_model(phase, HSTphase, sh, m_fac, hstp1, hstp2, hstp3, hstp4,
                                 xshift1, xshift2, xshift3, xshift4)

    model = mulimb0 * flux0 * systematic_model     #NOTE: flux0 = p[1] from snippet above
    return model

The solution will be to separate the function we use for the fit (transit model + systematic model) from the function we use to calculate the final transit model from the fit (transit model only). This seems fairly easy conceptually, however I am still looking into how to do this.

I had already opened issue #10 to separate the transit model from the systematic model, although back then it seemed like that would just be cleanup. It might turn out that this is actually needed to make everything work properly.

ivalaginja commented 5 years ago

Before getting into changing the model, I plotted the fit results in the notebook in the same way we do in the big script, which yields this:

fixed_plotting

Here's a comparison to the according IDL result from this comment further above:

Rp/R* = 0.10030181 +/- 0.00060348763
m_fac = 0.040151974 +/- 0.0023857033
Reduced chi-squared = 0.97742394

Test3

This indicates that we can plot the fit result all right without including the systematic model, but for some reason the transit depth is not fit properly. Hunting the reason for that now.

ivalaginja commented 4 years ago

I think I've got significantly closer to fixing this. I compared the IDL and Python codes of the examples above line-by-line; I saw that for this simple test case, @hrwakeford had simplified the systematic model to:

systematic_model = (x * m_fac + 1.0)
model = mulimb0 * systematic_model

while I was still using the full function in python, assuming this would be fine since all other systematics parameters are supposed to be set to zero:

systematic_model = sys_model(phase, HSTphase, sh, m_fac, hstp1, hstp2, hstp3, hstp4, xshift1, xshift2, xshift3, xshift4)
model = mulimb0 * flux0 * systematic_model

If I implement the systematic model in the same simple way like in the IDL test script, I get the correct result:

Screen Shot 2019-11-16 at 01 15 04

This is excellent news, since it means that there is simply an error somewhere in the python implementation of the systematic model or its combination with the "pure" transit model.

ivalaginja commented 4 years ago

Turns out it is where we combine the transit model with the systematics.

When I use this line:

model = mulimb0 * systematic_model

everything is fine, but when I use this line:

model = mulimb0 * flux0 * systematic_model

that's when there transit depth gets underestimated.

@hrwakeford I can see that you specifically changed it from the second (which is what we use in our main script) to the first line. Why is that and why is using the first line fine here, in these simple plots, but we need the second line in our actual code?

Edit: Especially since these would be super easy tests to set up if we could just use the standard fitting and model functions for them.

hrwakeford commented 4 years ago

@ivalaginja Have you tested the IDL test code with the flux0 put back into the model? You would need to set it as one of the input parameters and I suggest you set the initial value to the fist data point or something similar. That may reveal if it is something in the functional set up or in the test set-up.

hrwakeford commented 4 years ago

I have tested incorporating the flux0 into the IDL tests to see if it has the same effect. It is however fitting perfectly with what we would expect.

Attached is the IDL test code set with test 5 marking the inclusion of flux0 as a free parameter

lightcurve_tests.txt

Onoddil commented 4 years ago

In the notebook Simple transit model in Sherpa you linked here @ivalaginja you initialise Test 4 with the cell

# Freeze almost all parameters
# Note how m_fac stays thawed
model4.flux0.freeze()
model4.epoch.freeze()
...
model4.xshift4.freeze()

print(model4)

but should flux0 not be a free parameter here? Is the issue not (hopefully) simply that the model is trying to keep the baseline flux at data_y[0] which requires the decrease in the transit depth we see? It doesn't matter in Tests 0-2 as they all have a first y-value flux of basically 1 -- but it also looks to me like Test 3 underestimates the transit depth at r=0.923 also here, which could also be explained with the removal of model3.flux0.freeze() in its corresponding cell in the notebook.

If flux0 should always be fit could it also get a alwaysfrozen=True flag, like the limb darkening coefficients?

ivalaginja commented 4 years ago

Oh!! Thanks to both of you! Was this something I missed while porting to Python?

Thawing the baseline flux parameter indeed makes this fit work while flux0 is in the model as well:

Screen Shot 2019-11-20 at 11 51 59

It also underestimates rl slightly less with that (rl = 0.095557), although not all that much.

hrwakeford commented 4 years ago

Just to be clear the flux0 parameter should always be "thawed" not frozen. The LD parameters are, however, as stated above always frozen, but flux0 like rl should both be flagged as always thawed.

ivalaginja commented 4 years ago

So. To conclude this overall:

There are and, we believe, always will be differences between the fitting implementation Hannah used to have in IDL and the new one in Python with sherpa. They seemed to provide inconsistencies too large to accept as a full transition initially, however, we managed to convince ourselves that this is due to the slightly different implementation of bounding between the two fitting frameworks.

After testing this on an independent data set we had no results on previously, and running the marginalization on five different wavelength bins, we recover exactly the same transmission spectrum (transit depth vs. center wavelength) and the Sherpa implementation yields error margins that are more realistic than the IDL results. We still observe a constant shift along the y-axis in the results, but the science is captured in the shape of the spectrum itself so we call this issue here that deals with the transition from one to the other closed.