KeplerGO / pyke

Easy command line tools for Kepler, K2 & TESS data analysis.
http://pyke.keplerscience.org
118 stars 35 forks source link

Add a LightCurve.cdpp() method #160

Closed barentsen closed 6 years ago

barentsen commented 6 years ago

I made an attempt to implement a LightCurve.cdpp() method. I also found myself adding LightCurve.remove_outliers() and LightCurve.remove_nans() as part of the implementation.

This is going to need unit tests and a critical review!

codecov[bot] commented 6 years ago

Codecov Report

Merging #160 into master will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #160   +/-   ##
=======================================
  Coverage   33.54%   33.54%           
=======================================
  Files          65       65           
  Lines       10409    10409           
=======================================
  Hits         3492     3492           
  Misses       6917     6917

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 25b4991...6007099. Read the comment docs.

mirca commented 6 years ago

This looks great! Let's add unit tests!

barentsen commented 6 years ago

To match Jeff van Cleve's (JVC) implementation, I believe the fluxes need to be gap-filled and mean-divided prior to computing cdpp. This is still to be added to this implementation.

I also asked JVC the following two questions:

1) Your method includes a noise normalization factor (1.40). I would like to document where this number comes from, and how it changes as a function of the parameters. (I'm really just looking for one or two sentences to demystify this number.)

2) Your implementation uses a running mean (i.e. using a sliding window / convolution). I've seen others use a "block-wise mean" such that each flux data point only enters one mean to create a "binned time series". I was wondering if you had any understanding as to whether the difference matters.

barentsen commented 6 years ago

I changed the implementation to use the np.cumsum-based running mean, for which I have added the utils.running_mean() function.

I also added unit tests which verify that cdpp() isn't returning nonsense.

It's not 100% clear yet that this is the exact way we should compute cdpp, but I am happy for it to be merged so we can start using it on real data.

mirca commented 6 years ago

👍