Closed exowanderer closed 4 years ago
In a directly related problem, I am able to compute the wavelet likelihood given the model_prediction and data in np.array
format, to later subtract as the residuals.
The function call is
dwt_chisq(model_prediction, data, [gamma, sigma_r, sigma_w])
To run this with xo
via PyMC
it places model_prediction
as a tensor (i.e. theano.tensor.var.TensorVariable
); also, gamma
, sigma_r
, and sigma_w
are each of type theano.tensor.var.TensorVariable
.
In order to run the dwt_chisq
as is, I would need to output a numpy array from each of those tensors. What is the appropriate method to do so?
I tried .eval()
; but that had very little useful outputs. Please let me know if I'm off base, such as "it's not possible to output a np.array
type" or "that's now how you use .eval()
"
Thank you!
Hi Jonathan,
The catch here is that you need to provide the wavelet likelihood as a Theano op and you need to be able to compute the gradient of this operation with respect to its inputs. Here's how I worked out the gradients for celerite and I bet something similar would be possible (and maybe not even too onerous!) for the C&W method.
But, that being said, I wouldn't bother! You can use celerite and it'll be just as fast and have fewer restrictions (it can model more complicated power spectra and it doesn't require evenly spaced data). If your model is something like:
log_s2 = pm.Normal(...)
log_Sw4 = pm.Normal(...)
log_w0 = pm.Normal(...)
kernel = xo.gp.terms.SHOTerm(log_Sw4=log_Sw4, log_w0=log_w0, log_Q=1/np.sqrt(2))
gp = xo.gp.GP(kernel, x, yerr ** 2 + pm.math.exp(log_s2))
then log_s2
is the variance of the white noise component and log_Sw4
will be related to the "red noise" variance that you want. Honestly it's not obvious that the sigma_r parameter is a very meaningful number since its value will depend sensitively on the specific choice of power spectrum model, but it should be possible to calibrate a relationship between that number and the parameters of a celerite model if you want. Take a look at Section 4 of the celerite paper to see more about the model specification there.
Hope this helps!
@exowanderer
To add to what @dfm said, the wavelet model from the Carter & Winn paper solves for the correlated noise component assuming a specific power-spectrum: f^{-1}
, aka pink noise. (Their paper in principle allows for other power-laws, f^{-\nu}
, but the fast scaling of the algorithm only works for \nu=1
, if I remember correctly).
This power-spectrum is not a great description of stellar variability. It has a scaling at high frequency which causes sharper variations in the correlated noise component that does a stochastically-driven simple damped harmonic oscillator. It is also even stronger at high frequency than a damped random walk (aka Ornstein-Uhlenbeck process). In practice, however, what I expect this means is that the short-timescale variability will be conflated with the white noise component of variability.
Also, at low frequencies f^{-1}
is too large - in fact, it diverges, which means that there is power on long timescales. In practice this may not be too much of an issue since any dataset has a finite duration. So, I think f^{-1}
worked well for Carter & Winn thanks to these saving factors.
That said, there may be ranges of frequency where an f^{-1}
slope is valid, and that can be approximated well with, e.g., three Q=1/2
celerite terms, as in this plot:
Comparing these, you can see that the pink noise continues to rise at low frequencies, and dominates at high frequencies. But, as I mentioned above, a white-noise component will always dominate at the highest frequencies. So, as Dan mentioned, you can approximate this wavelet spectrum over some dynamic range of the power spectrum with the celerite GP.
@ericagol: Do you mean Q=1/sqrt(2) rather than 1/2?
I used Q=1/2 for this plot. Here's the Julia code:
using PyPlot
clf()
nu = 10.0.^(collect(-1:0.01:1.5))
s(s0,nu0,nu) = s0/((nu/nu0)^2+1)^2
s1 = s.(1.0,1.0,nu)
s2 = s.(0.25,4.0,nu)
s3 = s.(0.0625,16.0,nu)
loglog(nu, s1 .+ s2 .+ s3, label=L"3 Q=$1/2$ terms")
loglog(nu, 0.52 ./nu, label = L"Pink: $f^{-1}$")
loglog(nu, s1, alpha=0.3,color="C0")
loglog(nu, s2, alpha=0.3,color="C0")
loglog(nu, s3, alpha=0.3,color="C0")
legend()
axis([0.1,30.0,1e-2,10.0])
xlabel(L"$\omega$")
ylabel(L"$S(\omega)$")
@Ericagol: there will be numerical issues for Q=1/2 so I generally recommend Q=1/sqrt(2) or something else that's not exactly 1/2.
Right, that's the Matern kernel. Here's a case with Q=1/3 which looks even better.
using PyPlot
clf()
nu = 10.0.^(collect(-1:0.01:1.5))
s(s0,nu0,nu,Q) = sqrt(2/pi)*s0*nu0^4/((nu^2-nu0^2)^2+(nu*nu0)^2/Q^2)
#loglog(nu,1.0./(nu.^2 .+1).^2 .+ 0.25 ./ ((nu./4).^2 .+1).^2 .+ 0.0625 ./ ((nu ./16).^2 .+ 1).^2)
s1 = s.(1.4,0.8,nu,0.33)
s2 = s.(0.25,4.0,nu,0.33)
s3 = s.(0.0725,20.0,nu,0.33)
loglog(nu, s1 .+ s2 .+ s3, label=L"3 Q=$1/3$ terms")
loglog(nu, 0.27 ./nu, label = L"Pink: $f^{-1}$")
loglog(nu, s1, alpha=0.3,color="C0")
loglog(nu, s2, alpha=0.3,color="C0")
loglog(nu, s3, alpha=0.3,color="C0")
legend()
axis([0.1,30.0,1e-2,10.0])
xlabel(L"$\omega$")
ylabel(L"$S(\omega)$")
log_s2 = pm.Normal(...) log_Sw4 = pm.Normal(...) log_w0 = pm.Normal(...) kernel = xo.gp.terms.SHOTerm(log_Sw4=log_Sw4, log_w0=log_w0, log_Q=1/np.sqrt(2)) gp = xo.gp.GP(kernel, x, yerr ** 2 + pm.math.exp(log_s2))
then
log_s2
is the variance of the white noise component andlog_Sw4
will be related to the "red noise" variance that you want.
You are both great! Thank you for the discussion; and, especially, the code segments. I'm a big fan of GPs. I just have a lot of problems picking out the 'best' kernel+hyperparameter combos, for a given problem set.
For instance, I read many sections of the the celerite paper; and, we had talked in person about using the SHO kernel in replacement for the CW-Wavelets. My issue is that I find it difficult to know which SHO hyperparameters to set. I would easily have let Q float, instead of setting it to 1/3 or 1/sqrt(2). I still don't fully know what prior bounds to place on the log_s2
, log_sw4
, log_w0
terms; but I'll probably model it after these examles: example1, example2.
Side question: what did you (both) mean, when you said "Q=1/2 is the Matern kernel". Related to the above question, I was already trying out the Matern kernel, because I thought that it was the appropriate method to characterize "1/f" noise.
This discussion has been a big help. I already cited the exoplanet+celerite paper; I think that I'll also add a footnote to this discussion.
As a secondary conversation: when collaborating on an observation paper, the conversation often starts with 'we need to measure the red noise'; but then it also includes 'GPs are too complicated/versatile'. I have personally said, and heard from others, phrases such as "GPs may eat the transit/eclipse"; i.e. when fitting a GP freely, they can 'model' the transit/eclipse as a correlated noise source.
In the framework of academic inertia, may I suggest that we could make small examples (i.e. notebooks) that show how to use exoplanet+celerite to model previously established norms in our field.
In the celerite paper [page 34], you discuss CW-wavelets, their limitations, and how to approximate them using celerite
with S(w) ~ w^(-1). I didn't know what that would look like in code format until today.
I can also make a notebook comparing CW-Wavelets to Celerite+SHO as well. I'll try to make that soon -- i.e., this semester -- then PR it for you to include if you wish. I think that our community would find it helpful to know where to use celerite
as an improvement (or just replacement) for the existing methods.
Thank you both; and the team for this great work!
Combining example1 and example2, here is what I came up with:
def build_gp_pink_noise(time, data, dataerr, Q=1.0 / np.sqrt(2)):
log_S0 = pm.Normal("log_S0", mu=0.0, sigma=15.0,
testval=np.log(np.var(data)))
log_w0 = pm.Normal("log_w0", mu=0.0, sigma=15.0,
testval=np.log(3.0))
log_sw4 = pm.Deterministic("log_variance_r", log_S0 + 4 * log_w0)
log_s2 = = pm.Normal("log_variance_w", mu=0.0, sigma=15.0,
testval=np.log(np.var(data)))
kernel = xo.gp.terms.SHOTerm(log_Sw4=log_Sw4, log_w0=log_w0, log_Q=np.log(Q))
gp = xo.gp.GP(kernel, time, dataerr ** 2 + pm.math.exp(log_s2))
return pink_gp
such that my script, which also creates PyMC3 priors for eclipse_depth, mean, etc inside:
with pm.Model() as model:
mean = pm.Normal(f"mean", mu=1.0, sd=1.0)
edepth = pm.Uniform("edepth", lower=0, upper=0.01)
sqrt_edepth = pm.math.sqrt(edepth)
# Compute the model light curve using starry
# Set up a Keplerian orbit for the planets
light_curve = star.get_light_curve(orbit=orbit, r=sqrt_edepth, t=times)
# The likelihood function assuming known Gaussian uncertainty
pm.Normal(f"obs", mu=light_curve, sd=dataerr, observed=data)
# Build GP model
gp = build_gp_pink_noise(time, data, dataerr, Q=1.0 / np.sqrt(2))
map_soln = xo.optimize(start=model.test_point)
trace = pm.sample(
tune=3000,
draws=3000,
start=map_soln,
chains=mp.cpu_count(),
step=xo.get_dense_nuts_step(target_accept=0.9),
cores=mp.cpu_count()
)
[New Question]: In my example above, are log_s2
and log_S0
the exact same thing? or do I need both variables? Should I (effectively) set log_s2 = log_S0
and log_SW4 = log_S0 + 4 * log_w0
?
No - log_s2 is the white noise variance. There's no reason why this should be related to the GP amplitude!
I created a synthetic data set with pink noise, and then modeled that data with exoplanet
+ celerite
.
The full example, which runs from end-to-end for me, can be found here
The important part for this discussion is included here
def build_gp_pink_noise(times, data, dataerr,
log_Q=np.log(1.0 / np.sqrt(2))):
''' Build a pink noise GP iterating over S0, w0, Sw4
times (nDarray): time stamps for x-array
data (nDarray): normalized flux for transit light curve
dataerr (nDarray): uncertainty values per data point
log_Q (float): hyperparameter for SHO kernel [Q=1/sqrt(2): pink noise]
'''
log_w0 = pm.Normal("log_w0", mu=0.0, sigma=15.0,
testval=np.log(3.0))
log_Sw4 = pm.Normal("log_variance_r", mu=0.0, sigma=15.0,
testval=np.log(np.var(data)))
log_s2 = pm.Normal("log_variance_w", mu=0.0, sigma=15.0,
testval=np.log(np.var(data)))
kernel = xo.gp.terms.SHOTerm(
log_Sw4=log_Sw4, log_w0=log_w0, log_Q=log_Q)
return xo.gp.GP(kernel, times, dataerr**2 + pm.math.exp(log_s2))
I was able to run this model and converge to a meaningful solution; but I have 3 questions about this.
Is this model what you were suggesting?
The results that I am getting are not within my expectations:
log_s2
is ~ -30 (very very small)
log_Sw4
is ~ -5; which is small, but always much larger log_s2
.
I was expecting the log_s2 ~ -10
(i.e. 50 ppm) and the log_Sw4 ~ -12
(i.e. 5 ppm). I think that I set the pink scatter to be 10% of the white scatter (see example, lines 186 + 207:
with sigma_r=0.1
Would you happen to know why I am finding a much larger log_Sw4
than log_S2
?
I assume that I screwed something up.
How does the GP method interface with an AICc / BIC calculation?
I fit 32 models (other than GPs) with a range of AICc / BIC values from 150-350. When I add the GPs, should I assume 3 extra parameters? I am finding improved logp (within 10%). The AICc and BIC both increase by ~4% (more than enough to reject the GP models).
But I don't know if that is as direct a calculation as I think it is. And of course, I could always have simply coded something wrong to begin with, GP-wise.
There's a completely unrelated problem that I've been encountering. I can enter either the inclination, impact parameters, or transit duration; but only one a time. [this is built in; no problem]
The problem arises that I've tried this exact same code on 3 machines, and ~10 times per machine. Depending on the machine and time of day (i.e. randomness), the KeplerianOrbit object does not like the form of one or two values in the set [inclination, impact parameter, duration] that I give it. Note that I give it the exact same values every time.
Every 3rd - 5th attempt results in an error that says something like "inclination is a bad value" (I'll edit that when I find it next). So I change to using duration or impact parameter. That will work for 3 - 5 more attempts, and then fail. At which point, I rotate through the list and it works again.
On the colab link, if you find an error related to [inc, b, or dur], then please goto the Cell 4
and change the inc
for the dur
or b
as needed.
I should probably make this its own issue
That second comment should be a separate issue - feel free to open one!
There are a few issues with the first one:
# The likelihood function assuming known Gaussian uncertainty
pm.Normal("obs", mu=light_curve, sd=dataerr, observed=data)
gp = build_gp_pink_noise(times, data, dataerr, log_Q=log_Q)
gp.marginal("gp", observed=data)
The first one is the likelihood assuming white noise (with the quoted error bars) and the second one is assuming both white noise and a GP. The first one shouldn't be there at all!
observed=data - light_curve
) or you could provide a mean function to the GP:def lc_model(t):
light_curves = xo.LimbDarkLightCurve(u).get_light_curve(
orbit=orbit, r=r, t=t
)
return pm.math.sum(light_curves, axis=-1) + mean
gp = xo.gp.GP(..., mean=lc_model)
2 * integral(kernel(dt), {dt, 0, infty})
Which, I think, for an exoplanet celerite kernel will be:
ar, cr, ac, bc, cc, dc = kernel.coefficients
variance = 2 * tt.sum(a / c)
variance += 2 * tt.sum(ac * cc / (cc ** 2 + dc ** 2))
variance += 2 * tt.sum(bc * dc / (cc ** 2 + dc ** 2))
but use this with a grain of salt for now and double check the math!
My understanding from Carter & Winn is that pink noise has a 1/f
power spectrum. However, this power spectrum isn't normalizable, so the ACF doesn't exist! In practice I think it is defined over some range of frequencies, which might correspond to the inverse exposure time and total duration of the data. Then, the ACF is defined in terms of the difference of inverse cosine integrals (which I gathered from some googling).
Is your feature request related to a problem? Please describe. I'm using
exoplanet
to integrate PyMC3 with efficient transit modeling. I have thebase
system running smoothly -- thanks to great tutorials. Thank you.I was requested by a collaborator to include the Carter & Winn (2009) Daubechies wavelet likelihood.
(1) I am able to generate log_likelihood as a float value. So that first
feature request
that I would ask is how to add that into the xo.optimize or pm.sample calls?(2) If we can bypass wavelets altogether by using a GP to estimate
sigma_r
andsigma_w
efficiently, then I would be more than happy to do that. We've previously discussed using a GP to model the Daubechies wavelet likelihood; but I would need for help in selecting the GP+Kernels combinations to do it.As a secondary feature request, I would ask for a tutorial to estimate the residual red noise in a light curve the way that our community to put in tables.
Describe the solution you'd like I see from #70, that adding in a likelihood function is feasible; but I get the impression that there is more to do than a simple
xo_logp + wavelet_logp
.Is there a way to take a function (with 2 inputs + 3 PyM3 parameters) and "add" it to the likelihood computed inside exoplanet?
Describe alternatives you've considered I grabbed the necessary bits of code form @pcubillos's package
mc3
https://github.com/pcubillos/mc3/blob/master/mc3/stats/stats.py
[line 209
]I can compute the wavelet likelihood (outside of PyMC3+XO) as a float value; but how to wrap this into something that
xo
can interpret as an addition to the log-likelihood built in?Additional context If there is a much simpler way to use the GP module to estimate
sigma_r
andsigma_w
, then I could bypass this whole issue.At the same time, others may still want a wavelet likelihood added onto their light curve posteriors. As such, may I suggest this feature overall.