KeplerGO / pyke

Easy command line tools for Kepler, K2 & TESS data analysis.
http://pyke.keplerscience.org
118 stars 35 forks source link

Add normalization parameter back to `LightCurve.cdpp()` #171

Closed barentsen closed 6 years ago

barentsen commented 6 years ago

This PR adds the normalization constant argument back to the cdpp method.

The CDPP implementation from JVC suggests the savgol-based CDPP should be multiplied by 1.4, however I am finding that it should be closer to 1.19:

python -c "import numpy as np; from pyke import LightCurve; print(1/np.sqrt(13)/LightCurve(range(1000000), 1+np.random.normal(loc=0, scale=1e-6, size=1000000)).cdpp(norm_factor=1))"
1.19047139601

More work is required to understand the normalization factor. For now, this PR will make the unit test fail because PyKE's cdpp does not quite match the pipeline's. (But then again the unit test compares the values for Tabby's star, we should really add a quiet star to the unit tests for this.)

dacmess commented 6 years ago

What is the purpose of the normalization constant? Is it an empirical value that is meant to make the SG noise estimate closer to pipeline CDPP? If so then the I suspect the normalization depends strongly on the non-white noise nature of individual light curves. I'd recommend scrapping it (norm_factor=1) and making this metric stand on its own.

barentsen commented 6 years ago

@dacmess JVC's code suggests the normalization factor is necessary to make the CDPP scale as 1/sqrt(transit_window), but I don't really understand. Let's discuss with JVC when he's at Ames!

Let's not merge this pull request until then... and indeed, maybe the Right Answer (tm) is to avoid normalization altogether.

barentsen commented 6 years ago

I am going to close this PR for now -- not adding a cdpp normalization factor.

mirca commented 6 years ago

I remember with this factor we got a closer estimate as compared to k2sc?

barentsen commented 6 years ago

The comparison with k2sc was for one particular target, so I'm not sure that's true in general.

I believe the purpose of the constant is to ensure that white noise with a certain amount of scatter yields a cdpp that matches the true scatter (as opposed to it being attenuated). I added the following unit test in test_lightcurve.py which verifies that this is at least approximately true:

    lc = LightCurve(np.arange(10000), np.random.normal(loc=1, scale=100e-6, size=10000))
    assert_almost_equal(lc.cdpp(transit_duration=1), 100, decimal=-0.5)

We could dig deeper into this, but I consider this a low-priority research project which we shouldn't work on until the cdpp's reported by lightkurve would start to matter in real publications or detailed pipeline comparisons.

mirca commented 6 years ago

👍