GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License
384 stars 137 forks source link

Different logFC and FDR in pairwise testing using getMarkerFeatures() depending on order of groups #1926

Closed mireiacodina closed 1 year ago

mireiacodina commented 1 year ago

Hello,

Thanks for developing ArchR. I have a question related to the getMarkerfeatures() function to do pairwise testing between groups. I want to compare the accessibility between two cell types and I get different logFC and FDR values depending on the group that I use in useGroups and the one that I set as bgdGroups.

Here is the code that I run:

### Pairwise comparison between groups
group1 <- "wt_ectoderm(50epi)"
group2 <- "wt_ventrolateral mesoderm(50epi)"

markerTest <- getMarkerFeatures(
  ArchRProj = proj50epi7, 
  useMatrix = "PeakMatrix",
  groupBy = "Clusters_RNA_sub",
  testMethod = "wilcoxon",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  useGroups = group1,
  bgdGroups = group2
)

markerTest2 <- getMarkerFeatures(
  ArchRProj = proj50epi7, 
  useMatrix = "PeakMatrix",
  groupBy = "Clusters_RNA_sub",
  testMethod = "wilcoxon",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  useGroups = group2,
  bgdGroups = group1
)

markerList <- getMarkers(markerTest, cutOff = "FDR <= 0.05 & abs(Log2FC) >= 0.5", returnGR = FALSE)
sig_peaks <- as.data.frame(markerList)
head(sig_peaks)

group         group_name seqnames  idx    start      end    Log2FC          FDR   MeanDiff
32640      1 wt_ectoderm(50epi)    chr12 1859  8915090  8915590 -2.546727 2.346360e-32 -0.4123869
213971     1 wt_ectoderm(50epi)     chr5 8909 42280151 42280651 -2.397811 1.243829e-31 -0.4614626
201083     1 wt_ectoderm(50epi)     chr4 6087 21660738 21661238 -3.300984 1.895344e-28 -0.4402070
41075      1 wt_ectoderm(50epi)    chr13 1104  4433536  4434036 -2.909491 2.280093e-27 -0.4121067
26517      1 wt_ectoderm(50epi)    chr11 5323 25231320 25231820 -2.248990 1.307753e-25 -0.4673072
226621     1 wt_ectoderm(50epi)     chr6 6507 32164076 32164576 -1.976865 7.026898e-25 -0.5328489

markerList2 <- getMarkers(markerTest2, cutOff = "FDR <= 0.05 & abs(Log2FC) >= 0.5", returnGR = FALSE)
sig_peaks2 <- as.data.frame(markerList2)
head(sig_peaks2)
group                       group_name seqnames  idx    start      end   Log2FC          FDR  MeanDiff
32640      1 wt_ventrolateral mesoderm(50epi)    chr12 1859  8915090  8915590 3.773655 5.760450e-40 0.4569012
213971     1 wt_ventrolateral mesoderm(50epi)     chr5 8909 42280151 42280651 2.777479 6.533203e-37 0.4658591
226621     1 wt_ventrolateral mesoderm(50epi)     chr6 6507 32164076 32164576 2.183288 4.831842e-36 0.5751346
52253      1 wt_ventrolateral mesoderm(50epi)    chr14 1343  5557140  5557640 3.220382 1.345166e-34 0.3908514
41530      1 wt_ventrolateral mesoderm(50epi)    chr13 1559  6151759  6152259 2.018802 2.807008e-34 0.4459561
26517      1 wt_ventrolateral mesoderm(50epi)    chr11 5323 25231320 25231820 2.404376 1.429788e-33 0.4239108

Also, the total number of significant peaks based on the cutoff that I used is different (3601 for sig_peaks and 4889 for sig_peaks2). I would be very thankful if I could get any feedback on why the results are different. I also tried running the code without the "bias" variable but it did not change the outcome.

Thank you very much.


SessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggrepel_0.9.3                       presto_1.0.0                        nabor_0.5.0                         SeuratObject_4.1.3                  sp_1.6-0                           
 [6] BSgenome.Drerio.UCSC.danRer11_1.4.2 BSgenome_1.66.3                     rtracklayer_1.58.0                  Biostrings_2.66.0                   XVector_0.38.0                     
[11] dplyr_1.1.1                         tidyr_1.3.0                         rhdf5_2.42.0                        Rcpp_1.0.10                         Matrix_1.5-3                       
[16] data.table_1.14.8                   stringr_1.5.0                       plyr_1.8.8                          magrittr_2.0.3                      ggplot2_3.4.2                      
[21] gtable_0.3.3                        gtools_3.9.4                        gridExtra_2.3                       ArchR_1.0.2                         SummarizedExperiment_1.28.0        
[26] Biobase_2.58.0                      GenomicRanges_1.50.2                GenomeInfoDb_1.34.9                 IRanges_2.32.0                      S4Vectors_0.36.2                   
[31] BiocGenerics_0.44.0                 MatrixGenerics_1.10.0               matrixStats_0.63.0                 

loaded via a namespace (and not attached):
 [1] BiocManager_1.30.20      GenomeInfoDbData_1.2.9   Rsamtools_2.14.0         yaml_2.3.7               globals_0.16.2           pillar_1.9.0             lattice_0.20-45         
 [8] glue_1.6.2               digest_0.6.31            RColorBrewer_1.1-3       colorspace_2.1-0         htmltools_0.5.5          XML_3.99-0.14            pkgconfig_2.0.3         
[15] pheatmap_1.0.12          listenv_0.9.0            zlibbioc_1.44.0          purrr_1.0.1              scales_1.2.1             BiocParallel_1.32.6      tibble_3.2.1            
[22] farver_2.1.1             generics_0.1.3           withr_2.5.0              ROCR_1.0-11              cli_3.6.1                XLConnect_1.0.7          crayon_1.5.2            
[29] evaluate_0.20            parallelly_1.35.0        future_1.32.0            fansi_1.0.4              progressr_0.13.0         Cairo_1.6-0              tools_4.2.3             
[36] BiocIO_1.8.0             lifecycle_1.0.3          Rhdf5lib_1.20.0          kernlab_0.9-32           munsell_0.5.0            DelayedArray_0.24.0      compiler_4.2.3          
[43] rlang_1.1.0              RCurl_1.98-1.12          rhdf5filters_1.10.1      rstudioapi_0.14          rjson_0.2.21             labeling_0.4.2           bitops_1.0-7            
[50] rmarkdown_2.21           restfulr_0.0.15          codetools_0.2-19         R6_2.5.1                 GenomicAlignments_1.34.1 knitr_1.42               future.apply_1.10.0     
[57] fastmap_1.1.1            utf8_1.2.3               rJava_1.0-6              stringi_1.7.12           parallel_4.2.3           vctrs_0.6.1              tidyselect_1.2.0        
[64] xfun_0.38     
rcorces commented 1 year ago

Hi @mireiacodina! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
It is worth noting that there are very few actual bugs in ArchR. If you are getting an error, it is probably something specific to your dataset, usage, or computational environment, all of which are extremely challenging to troubleshoot. As such, we require reproducible examples (preferably using the tutorial dataset) from users who want assistance. If you cannot reproduce your error, how will we be able to help? Before going through the work of making a reproducible example, search the previous Issues, Discussions, function definitions, or the ArchR manual and you will likely find the answers you are looking for. If your post does not contain a reproducible example, it is unlikely to receive a response.
In addition to a reproducible example, you must respond to the following questions before we help you, unless your original post already contained this information: 1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved? 2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example. 3. Did you post your log file? If not, add it now. 4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

mireiacodina commented 1 year ago

I am sorry, I moved it to the discussions session as it is not an error from the code.

dhkong78 commented 9 months ago

Hello, I met the same issue! Did you solve it or any comments? Thank you.