An approach for optimizing the detection of small transiting planets in future transiting exoplanet surveys (Paper; arXiv:2308.04282)
Motivation for this work\ A major goal of space-based exoplanet transit survey missions (Kepler, K2, TESS, Plato, Roman) is to detect small Earth-sized planets that produce very faint (~0.01%) periodic dips in stellar brightness. This requires an extremely effective reduction of aperiodic stellar and instrumental variations and a very sensitive periodogram applied to the detrended data. Our recent paper (Gondhalekar, Feigelson, Montalto, Caceres & Saha, 2023, submitted to the AAS Journals) provides a detailed analysis of two statistical approaches to this problem. We find that ARIMA detrending and the Transit Comb Filter periodogram is substantially more effective for small planet detection than commonly used detrenders (e.g., splines, Gaussian Processes regression) followed by Box Least Squares periodogram. This AutoRegressive Planet Search (ARPS) approach, developed by Caceres et al. 2019a, has been used to find small transiting planets from the 4-year Kepler dataset (Caceres et al. 2019b) and the Year 1 TESS Full Field Image dataset (Melton et al. 2023a,b,c).
Methodology and coding\ This GitHub repository contains R code for comparing the BLS and TCF transit periodogram algorithms. It can be extended for comparing any set of periodograms. It uses extreme value theory and/or signal-to-noise ratio (SNR) for comparison. The R language (with computationally intensive periodogram computations in Fortran) is preferred over Python because of its comprehensive packages for time series analysis and other statistical methodologies.
Several detrending approaches have been used in the past for detrending stellar light curves prior to the search for transiting exoplanets: Gaussian Processes Regression or local regression methods such as Splines and LOWESS. These approaches do an excellent job of removing (generally systematic but can also be stellar) long-term trends from the light curve. The class of parametric AutoRegressive Moving Average (ARMA) models is effective in removing (generally stellar) short-term trends. One well-known issue is that detrending methods often alter the planet's depth since, in its naive usage, the approach has no information about whether and where the transit is present (transit masking, for example, can be used to resolve this). Hence, ARMA used before BLS was not helpful since ARMA advertently modeled the planetary signal.
Our study observed that the ARMA model is beneficial if fitted on a differenced light curve. Once the light curve is differenced, BLS can no longer be used since the box-shaped transit is changed to a double spike. Hence, one can use the TCF periodogram after differencing + ARMA modeling. The resulting pipeline has proven more beneficial than Gaussian Processes Regression/Splines + BLS for detecting small planets. The TCF periodogram also possesses better properties (noise characteristics, peak width) than BLS.
Please see our paper for more details.
BLS
and TCF3.0
folders contain the underlying implementation of these algorithms.
eebls.f
is used). The folder BLS
also contains bls.R
, which is the R interface for the Fortran code. We recommend users use this script for running BLS.TCF3.0
folder contains the TCF implementation. The lower-level implementation is again in Fortran (Fortran was used due to its swift computation). The tcf.f95
code is the cleaned version after consulting with the original code author, Gabriel A. Caceres. We recommend using this version of TCF for better compatibility with the rest of the code in this repository.images
folder contains .png
figures used in our paper.results
folder contains the data used for plotting Figures 5 and 6 in our paper.DTARPS*.txt
contain four real TESS light curves used in the study (see our paper for details).eva_periodogram.R
since it contains the code for running the significance analysis procedure. Note that it depends on the following scripts: BLS/bls.R
, TCF3.0/intf_libtcf.R
, and utils.R
. utils.R
contains various utility functions. The code contains comments, so it might be easy to understand each function's usage.extra_figure_why_tcf_better.ipynb
contains Python code for generating Figure 11 from our paper.bls_and_tcf_noise_change_as_depth_decrease.R
was used to generate Figures 1-4 and 7-10 in the paper.python_utils.py
contains a standalone Python utility function (it was created since I was not able to implement it in R)real_light_curve_application.R
is kept in this repository to showcase how this code can be used on real light curves. For a complete example, see the function blsAndTCFDepthChangeReal
in the script bls_and_tcf_noise_change_as_depth_decrease.R
here.time_Analysis.R
contains the script used for plotting Figure 13 in the paper. The data in the script was obtained by running the (long) computations on the Kaggle kernel.This can be done by using:
# For BLS
R CMD SHLIB eebls.f -o a.out # will create a.out
# For TCF
R CMD SHLIB main_tcf.f95 median.f90 rand_tools.f95 tcf.f95 -o a.out # will create a.out
Alternatively, one can achieve the same using a set of terminal commands (gfortran
must be installed for this):
# For BLS
cd BLS
gfortran -c eebls.f # will create eebls.o
gfortran -shared eebls.o # will create a.out
# For TCF
cd ../TCF3.0
gfortran -c median.f90
gfortran -c rand_tools.f95
gfortran -c tcf.f95
gfortran -c main_tcf.f95
# Combine all object files into a single shared object file
gfortran -shared median.o rand_tools.o tcf.o main_tcf.o # will create a.out
BLS/bls.R
and TCF3.0/intf_libtcf.R
(locate the line dyn.load(...)
at the top of these files) to match the path to the BLS's a.out
and TCF's a.out
files.Note:
The function evd
inside eva_periodogram.R
contains the main functionality for calculating the False Alarm Probability and/or the SNR of periodogram peaks. It can handle both simulated and real observational data. No constraints on the observations, such as evenly spaced, no gaps, etc., are imposed.
Here is a basic example of using this function:
source("eva_periodogram.R") # To source the R script that contains the evd function.
result <- evd(
2, 0.01, 2, algo="BLS", noiseType=1, ntransits=10,
ofac=2, L=300, R=300, FAPSNR_mode=0, lctype="sim"
)
score <- result[1] # score is the FAP if FAPSNR_mode == 0, SNR if FAPSNR_mode == 1, and a weighted sum of FAP and SNR if FAPSNR_mode == 2.
algo
can be either BLS
or TCF
.noiseType=1
simulates Gaussian noise with a fixed mean and standard deviation.
utils.R
- we recommend the user instead manipulate the depth
argument since only the relative difference between the noise standard deviation and depth matters, and not the actual values.ntransits
controls how many transits must be contained inside the light curve.ofac
is the oversampling factor for computing the periodogram. A value of around 2-5 generally suffices.L
and R
are parameters used for the extreme value calculation.FAPSNR_mode
controls what metric must be used to get the periodogram score. 0 means only the FAP of the peak is computed. 1 means only the SNR of the peak is computed. 2 means both are computed, in which both FAP and SNR are weighted by fixed factors.lctype
can take values "sim" and "real". The former is to be used for simulations (in which case the period, depth, and duration are needed as input). The latter is to be used for calculations on real light curves (in which case, the observations (i.e., fluxes) and time epochs need to be passed. See below).For passing custom flux values and time epochs (e.g., in the case of real observational data), this can be done by:
result <- evd(y=y, t=t, ...)
where y
and t
denote the fluxes and time epochs, respectively. ...
denote any other arguments to evd
as stated above. Observational fluxes are generally associated with errors, but this cannot be used in the code as of now.
Important notes:
real_light_curve_application.R
contains a code example of how to use the extreme value code for BLS without any errors. See line 41 in real_light_curve_application.R
.The evd
function can be run on a set of transit depths for different periodogram algorithms independently to yield the minimum detectable depth. These can be used to make plots like the below:
For more details, please see our paper.
The extreme value part of the code is a replication of the approach described in Süveges (2014). This code is not an official implementation of that paper.
Most of our analysis used simulations. The applicability of our method was described on four TESS light curves. These are present as DTARPS*.txt
files in this repository.
If you find something not working as expected or anything weird, we would like to know and improve it! Please feel free to open an issue in the issue tracker or send an email.
The code was tested using some internal quick tests, which were designed while writing the code. These tests are not in this repository. It does not have a dedicated tests/
folder where ideally all the test scripts can be written. This is planned for the future. The code, as of now, is not optimized for speed. However, it is expected that it runs without any troubles in most use cases. Future versions of the code might focus on these issues.
A Python version of the comparison code is aimed to be released sometime in the future for wider accessibility. If you are interested in this, please let us know your thoughts at https://github.com/Yash-10/ARPS-Periodogram-Comparison/issues/2.