cruk-mi / mesa

mesa package for Methylation Enrichment Sequencing Analysis

Function to select top DMRs between multiple comparisons #13

Open ativey07 opened 6 months ago

ativey07 commented 6 months ago

Having discussed this with Simon, we thought it would be useful to have a function that selects the top DMRs for each comparison when there are multiple contrasts. He suggested I open an issue for it. Something like this:



> DMRs <- calculateDMRs(exampleTumourNormal, variable = "type", contrasts = "all", keepContrastMeans = FALSE)

> DMRs %>% pivotDMRsLonger(makePositive = TRUE)
# A tibble: 277 × 9
   seqnames    start      end CpG_density log2FC   adjPval deltaBeta group1      group2    
   <fct>       <int>    <int>       <dbl>  <dbl>     <dbl>     <dbl> <chr>       <chr>     
 1 7        25018801 25019100        15.3   1.99 0.0496        0.106 LUAD        CRC       
 2 7        25852801 25853100        18.4   2.50 0.000410      0.608 CRC         LUAD      
 3 7        25852801 25853100        18.4   2.07 0.000138      0.559 CRC         NormalLung
 4 7        25852801 25853100        18.4   2.12 0.0259        0.441 LUSC        LUAD      
 5 7        25852801 25853100        18.4   2.19 0.0112        0.453 NormalColon LUAD      
 6 7        25852801 25853100        18.4   1.68 0.0207        0.392 LUSC        NormalLung
 7 7        25852801 25853100        18.4   1.75 0.0116        0.404 NormalColon NormalLung
 8 7        25860601 25860900        11.6   2.35 0.0115        0.476 CRC         LUAD      
 9 7        25860601 25860900        11.6   3.68 0.00424       0.548 CRC         LUSC      
10 7        25860601 25860900        11.6   3.21 0.0000168     0.531 CRC         NormalLung
# ℹ 267 more rows
# ℹ Use `print(n = ...)` to see more rows

> DMRs %>% pivotDMRsLonger(makePositive = TRUE) %>% group_by(group1, group2) %>% arrange(desc(deltaBeta))
# A tibble: 277 × 9
# Groups:   group1, group2 [18]
   seqnames    start      end CpG_density log2FC     adjPval deltaBeta group1 group2     
   <fct>       <int>    <int>       <dbl>  <dbl>       <dbl>     <dbl> <chr>  <chr>      
 1 7        27164401 27164700       15.0    4.39 0.0471          0.872 LUSC   NormalColon
 2 7        27166201 27166500       20.0    4.37 0.000000358     0.839 LUSC   NormalLung 
 3 7        27166201 27166500       20.0    4.34 0.0000262       0.839 LUSC   CRC        
 4 7        27239401 27239700       21.1    4.19 0.00524         0.838 CRC    NormalLung 
 5 7        27168601 27168900        8.79   3.84 0.0000698       0.819 LUSC   CRC        
 6 7        27166201 27166500       20.0    3.79 0.000779        0.815 LUSC   NormalColon
 7 7        27244801 27245100        8.34   4.27 0.0105          0.808 CRC    NormalLung 
 8 7        27165301 27165600       21.2    3.75 0.000402        0.793 LUSC   NormalLung 
 9 7        27151501 27151800        8.60   3.76 0.00000622      0.792 LUSC   CRC        
10 7        27165301 27165600       21.2    3.36 0.00830         0.772 LUSC   CRC        
# ℹ 267 more rows
# ℹ Use `print(n = ...)` to see more rows

Selecting the top DMR for each comparison (largest deltaBeta per group1/group2 pair):

> DMRs %>% pivotDMRsLonger(makePositive = TRUE) %>% group_by(group1, group2) %>% arrange(desc(deltaBeta)) %>% slice(1)
# A tibble: 18 × 9
# Groups:   group1, group2 [18]
   seqnames    start      end CpG_density log2FC     adjPval deltaBeta group1      group2     
   <fct>       <int>    <int>       <dbl>  <dbl>       <dbl>     <dbl> <chr>       <chr>      
 1 7        25852801 25853100       18.4   2.50  0.000410        0.608 CRC         LUAD       
 2 7        27220801 27221100        6.98  3.74  0.0126          0.720 CRC         LUSC       
 3 7        27239401 27239700       21.1   4.19  0.00524         0.838 CRC         NormalLung 
 4 7        27130201 27130500       28.2   1.88  0.0000619       0.432 LUAD        CRC        
 5 7        27044401 27044700       14.7   1.85  0.0178          0.271 LUAD        LUSC       
 6 7        27130201 27130500       28.2   1.71  0.00483         0.414 LUAD        NormalColon
 7 7        27124201 27124500       18.1   1.11  0.0237          0.430 LUAD        NormalLung 
 8 7        27166201 27166500       20.0   4.34  0.0000262       0.839 LUSC        CRC        
 9 7        27151501 27151800        8.60  2.51  0.00825         0.657 LUSC        LUAD       
10 7        27164401 27164700       15.0   4.39  0.0471          0.872 LUSC        NormalColon
11 7        27166201 27166500       20.0   4.37  0.000000358     0.839 LUSC        NormalLung 
12 7        27044401 27044700       14.7   1.11  0.0492          0.373 NormalColon CRC        
13 7        27088201 27088500       19.7   1.76  0.0112          0.593 NormalColon LUAD       
14 7        27044401 27044700       14.7   2.74  0.000000812     0.593 NormalColon LUSC       
15 7        27088201 27088500       19.7   2.12  0.0000463       0.666 NormalColon NormalLung 
16 7        27159901 27160200        9.48  1.54  0.0185          0.465 NormalLung  CRC        
17 7        27507901 27508200       11.1   0.992 0.0208          0.395 NormalLung  LUAD       
18 7        25079401 25079700        7.87  3.20  0.0403          0.564 NormalLung  LUSC       

Selecting the top 10 DMRs for each comparison (comparisons with fewer than 10 DMRs return all of theirs):

> DMRs %>% pivotDMRsLonger(makePositive = TRUE) %>% group_by(group1, group2) %>% arrange(desc(deltaBeta)) %>% slice(1:10)
# A tibble: 133 × 9
# Groups:   group1, group2 [18]
   seqnames    start      end CpG_density log2FC  adjPval deltaBeta group1 group2
   <fct>       <int>    <int>       <dbl>  <dbl>    <dbl>     <dbl> <chr>  <chr> 
 1 7        25852801 25853100       18.4    2.50 0.000410     0.608 CRC    LUAD  
 2 7        27220801 27221100        6.98   2.69 0.0143       0.599 CRC    LUAD  
 3 7        27088201 27088500       19.7    2.01 0.000575     0.596 CRC    LUAD  
 4 7        27087901 27088200       18.7    1.72 0.00916      0.537 CRC    LUAD  
 5 7        25860601 25860900       11.6    2.35 0.0115       0.476 CRC    LUAD  
 6 7        27102001 27102300       17.1    1.82 0.00392      0.444 CRC    LUAD  
 7 7        27239701 27240000       14.5    2.10 0.00806      0.250 CRC    LUAD  
 8 7        27102301 27102600       11.7    1.96 0.0266       0.235 CRC    LUAD  
 9 7        27220801 27221100        6.98   3.74 0.0126       0.720 CRC    LUSC  
10 7        27246601 27246900        7.12   3.57 0.0312       0.668 CRC    LUSC  
# ℹ 123 more rows
# ℹ Use `print(n = ...)` to see more rows
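
The pipelines above could be wrapped into a small helper, roughly as sketched below. This is only an illustration of the idea: `selectTopDMRs()` and its arguments `n` and `rankBy` are hypothetical names, not existing mesa functions, and the toy tibble is made-up data standing in for the output of `pivotDMRsLonger()`. It uses `dplyr::slice_max()` rather than `arrange()` followed by `slice()`, so the ranking is done explicitly within each comparison.

```r
library(dplyr)

# Hypothetical helper (not part of mesa): keep the top `n` rows per
# group1/group2 comparison, ranked by the column named in `rankBy`.
# Expects a long-format table with `group1` and `group2` columns.
selectTopDMRs <- function(longDMRs, n = 10, rankBy = "deltaBeta") {
  longDMRs %>%
    group_by(group1, group2) %>%
    # with_ties = FALSE guarantees at most `n` rows per comparison
    slice_max(order_by = .data[[rankBy]], n = n, with_ties = FALSE) %>%
    ungroup()
}

# Made-up toy data with the same column names as the long DMR table
toy <- tibble(
  start     = c(1, 301, 601, 901),
  deltaBeta = c(0.2, 0.6, 0.4, 0.5),
  group1    = c("CRC", "CRC", "LUSC", "LUSC"),
  group2    = c("LUAD", "LUAD", "CRC", "CRC")
)

# One row per comparison: the window with the largest deltaBeta
selectTopDMRs(toy, n = 1)
```

With the real data this would presumably be called as `DMRs %>% pivotDMRsLonger(makePositive = TRUE) %>% selectTopDMRs(n = 10)`. Passing `rankBy = "log2FC"` would rank by fold change instead.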

```{r}