MorganLevineLab / methylCIPHER

22 stars 9 forks source link

MorganLevineLab/methylCIPHER

On installation from Github, in R, install learnr and run learnr::run_tutorial(name = "instructions", package = "methylCIPHER") for an interactive tutorial.

The goal of methylCIPHER is to allow users to easily calculate their choice of CpG clocks using simple commands, from a single source. CpG epigenetic clocks are currently found in many places, and some require users to send data to external portals, which is not advisable when working with protected or restricted data. The current package allows you to calculate reported epigenetic clocks, or where not precisely disclosed, our best estimates–all performed locally on your own machine in R Studio. We would like to acknowledge the authors of the original clocks and their valuable contributions to aging research, and the requisite citations for their clocks can be found at the bottom of the current page. Please do not forget to cite them in your work!

Installation

You can install the released version of methylCIPHER and its imported packages from Github with:

devtools::install_github("MorganLevineLab/prcPhenoAge")
devtools::install_github("danbelsky/DunedinPoAm38")
devtools::install_github("MorganLevineLab/methylCIPHER")

Calculating Epigenetic Clocks and Predictors

Running single “clock” calculations

The current package contains a large number of currently available CpG clocks or CpG based predictors. While we strove to be inclusive of such published CpG-based epigenetic clocks to our knowledge, if you find we are missing a clock, please contact us and we will do our best to promptly include it, if possible. You can do so by raising an issue on this repo or emailing us directly at \<kyra[dot]thrush[at]yale[dot]edu>.

In order to calculate a CpG clock, you simply need to use the appropriate function, typically named “calc[ClockNameHere]”. For example:

library(methylCIPHER)
calcPhenoAge(exampleBetas, examplePheno, imputation = F)
name geo_accession gender age group sample PhenoAge
7786915023_R02C02 GSM1343050 M 57.9 1 1 52.29315
7786915135_R04C02 GSM1343051 M 42.0 1 2 41.05867
7471147149_R06C01 GSM1343052 M 47.4 1 3 43.54460
7786915035_R05C01 GSM1343053 M 49.3 1 4 43.96697
7786923035_R01C01 GSM1343054 M 52.5 1 5 40.35242

Alternatively, if you would just like to receive a vector with the clock values to use, rather than appending it to an existing phenotype/ demographic dataframe, simply use:

calcPhenoAge(exampleBetas, imputation = F)
#> 7786915023_R02C02 7786915135_R04C02 7471147149_R06C01 7786915035_R05C01 
#>          52.29315          41.05867          43.54460          43.96697 
#> 7786923035_R01C01 
#>          40.35242

Categories of Epigenetic Clocks

Due to the abundance of epigenetic clocks are overlapping reasons for use, it is important to keep track of which clocks are most related to each other. This will allow you to steer clear of multiple testing and collinearity problems if you use all available clocks for your analysis.

Another important note is that with a few exceptions, the list of following clocks were trained and validated almost exclusively in blood. This can lead to a number of observed effects, which include unreasonable shifts in age (intercept)As bespoke clocks are developed for use in additional human tissues, we will include these in their own section below.

Chronological Age Predictors

Cancer and Mitotic Rates

Gestational and Pediatric Age Predictors

Biological Age and Mortality Predictors

Trait Predictors

Non-Blood Clocks

Running Multiple Epigenetic Clocks Simultaneously

Running the Core Epigenetic Clocks

There are a number of epigenetic clocks available in the literature, but the majority of studies currently focus upon comparing a select few. Therefore, we have provided a helper function to quickly calculate these core clocks in just one line of code.

calcCoreClocks(exampleBetas, examplePheno)
#> Please remember to cite the core Clocks you have used! Please refer to the README.md file for assistance.
name geo_accession gender age group sample Hannum Horvath1 PhenoAge epiTOC2
7786915023_R02C02 GSM1343050 M 57.9 1 1 62.33422 56.20544 52.29315 5012.412
7786915135_R04C02 GSM1343051 M 42.0 1 2 53.90139 47.45754 41.05867 4622.625
7471147149_R06C01 GSM1343052 M 47.4 1 3 53.59733 48.24172 43.54460 2956.300
7786915035_R05C01 GSM1343053 M 49.3 1 4 59.53767 51.53588 43.96697 3446.410
7786923035_R01C01 GSM1343054 M 52.5 1 5 58.80107 45.05448 40.35242 3245.157

Running A User-Defined List of Epigenetic Clocks

The user is welcome to specify a vector of clocks that they would like to calculate, rather than running each individual clock calculation. In this case, you will need to choose from the following options:

clockOptions()
#>  [1] "calcAlcoholMcCartney"            "calcBMIMcCartney"               
#>  [3] "calcBocklandt"                   "calcBohlin"                     
#>  [5] "calcDNAmClockCortical"           "calcDNAmTL"                     
#>  [7] "calcDunedinPoAm38"               "calcEpiTOC"                     
#>  [9] "calcEpiTOC2"                     "calcGaragnani"                  
#> [11] "calcHannum"                      "calcHorvath1"                   
#> [13] "calcHorvath2"                    "calcHRSInChPhenoAge"            
#> [15] "calcHypoClock"                   "calcKnight"                     
#> [17] "calcLeeControl"                  "calcLeeRefinedRobust"           
#> [19] "calcLeeRobust"                   "calcLin"                        
#> [21] "calcMayne"                       "calcMiAge"                      
#> [23] "calcPEDBE"                       "calcPhenoAge"                   
#> [25] "calcSmokingMcCartney"            "calcVidalBralo"                 
#> [27] "calcWeidner"                     "calcZhang"                      
#> [29] "calcZhang2019"                   "prcPhenoAge::calcPRCPhenoAge"   
#> [31] "prcPhenoAge::calcnonPRCPhenoAge" "DunedinPoAm38::PoAmProjector"

To do so, here is an example:

userClocks <- c("calcSmokingMcCartney","calcPhenoAge","calcEpiTOC2")
calcUserClocks(userClocks, exampleBetas, examplePheno, imputation = F)
name geo_accession gender age group sample Smoking_McCartney PhenoAge epiTOC2
7786915023_R02C02 GSM1343050 M 57.9 1 1 3.993508 52.29315 5012.412
7786915135_R04C02 GSM1343051 M 42.0 1 2 4.501657 41.05867 4622.625
7471147149_R06C01 GSM1343052 M 47.4 1 3 3.173744 43.54460 2956.300
7786915035_R05C01 GSM1343053 M 49.3 1 4 3.216788 43.96697 3446.410
7786923035_R01C01 GSM1343054 M 52.5 1 5 4.414541 40.35242 3245.157

Missing beta values

Of course, all of the CpG clocks work best when you have all of the necessary probes’ beta values for each sample. However, sometimes after preprocessing, beta values will be removed for a variety of reasons. For each CpG clock, you have the option to impute missing values for CpGs that were removed across all samples. In this case, you will need to impute using a vector of your choice (e.g. mean methylation values across CpGs from an independent tissue-matched dataset). However, by default, imputation will not be performed and the portion of the clock that is reliant upon those CpGs will not be considered. To check quickly whether this is the case for your data and clock(s) of interest, we have created the following helper function:

getClockProbes(exampleBetas)
Clock Total.Probes Present.Probes Percent.Present
Alcohol 450 450 100%
BMI 1109 1109 100%
Bocklandt 1 1 100%
Bohlin 251 8 3%
DNAmClockCortical 347 33 10%
DNAmTL 140 11 8%
EpiToc2 163 163 100%
EpiToc 385 385 100%
Garagnani 1 1 100%
HRSInCHPhenoAge 959 959 100%
Hannum 71 71 100%
Horvath1 353 353 100%
Horvath2 391 390 100%
Knight 1 0 0%
LeeControl 546 13 2%
LeeRefinedRobust 395 9 2%
LeeRobust 558 9 2%
Lin 99 39 39%
Mayne 1 0 0%
MiAge 268 4 1%
PEDBE 94 14 15%
PhenoAge 513 513 100%
Smoking 233 233 100%
VidalBralo 8 5 62%
Weidner 3 3 100%
Zhang2019 514 131 25%
Zhang 10 10 100%
hypoClock 678 678 100%

Please note that this will not count columns of NAs for named CpGs as missing! If you want to check for this you can run the following line of code to find the column numbers that are all NAs. If you get “named integer(0)” then you don’t have any. We recommend that you remove any identified columns from your beta matrix entirely to avoid errors, and then rerun the code producing the table above.

which(apply(exampleBetas, 2, function(x)all(is.na(x))))

In the case that you have CpGs missing from only some samples, we encourage you to be aware of this early on. Run the following line and check that it is 0.

sum(is.na(betaMatrix))

If this does not end up being 0, you might consider running mean imputation within your data so that NA values for single/ few samples at least have mean values rather than being ignored.

Alternate Methods

PC Clocks

If you would like to lower the noise in your data, which is helpful for applications in clinical, intervention, or longitudinal data in particular, you might consider instead using our recently developed PC Clocks. You can use this code and refer to this publication.

Modules Clocks

[should we include this?]

Citations!

The current package is intended to increase visibility and accesibility for the many epigenetic clocks currently in circulation. Please do not forget to cite not only this work, but the works you use in your research the correspond to the appropriate epigenetic clocks.

citeMyClocks()

As an alternative to manually checking the list below, you can simply use the function citeMyClocks to automatically print out APA style citations of the appropriate publications. You can use a character vector or input with the function name(s) you used, or the same userClocks list you may have used for calcUserClocks.

citeMyClocks(userClocks, prettyprint = TRUE)
#> Clock Citations:
#> Levine, M. E., Lu, A. T., Quach, A., Chen, B. H., Assimes, T. L., Bandinelli, S., … Horvath, S. (2018). 
#>  An epigenetic biomarker of aging for lifespan and healthspan. 
#>  Aging, 10(4), 573–591. https://doi.org/10.18632/aging.101414
#> McCartney, D. L., Hillary, R. F., Stevenson, A. J., Ritchie, S. J., Walker, R. M., Zhang, Q., … Marioni, R. E. (2018). 
#>  Epigenetic prediction of complex traits and death. 
#>  Genome Biology, 19(1), 136. https://doi.org/10.1186/s13059-018-1514-1
#> Teschendorff, A. E. (2020). 
#>  A comparison of epigenetic mitotic-like clocks for cancer risk prediction. 
#>  Genome Medicine, 12(1), 56. https://doi.org/10.1186/s13073-020-00752-3

Please note that by default the function formats the output for nicer console output. Change to prettyprint = FALSE for a vector you can save in your workspace.

calcCoreClocks Function

If you used the calcCoreClocks function then the list of citations you will need is as below. Otherwise, please find the appropriate citations for the calculations you did run in their respective sections below.

Chronological Age Predictors

Cancer and Mitotic Rates

Gestational & Pediatric Age

Biological Age and Mortality Predictors

Trait Predictors

Bespoke Clocks