cole-trapnell-lab / cicero-release

https://cole-trapnell-lab.github.io/cicero-release/
MIT License
55 stars 15 forks source link

every link duplicated in the results #84

Closed ccshao closed 2 years ago

ccshao commented 2 years ago

Dear cicero team,

Thank you very much for the tools. Surpringly in our project we found essentially every link is duplicated. We used the preprocessed results from Signac and followed the tutorial of cicero. Duplicated links are not mentioned the document/dicussions, so we are are not sure whether there are problems in our analysis/data, or an issue in cicero.

Thank you ver much for your time! We are glad to provide raw data if needed.

Generating results.

library(SeuratWrappers, lib.loc = "/home/shao/R/R_4.1.2_signac/")
library(Seurat, lib.loc = "/home/shao/R/R_4.1.2_signac/")
library(Signac, lib.loc = "/home/shao/R/R_4.1.2_signac/")

library(Gviz,     lib.loc = "/home/shao/R/R_4.1.2_signac/")
library(monocle3, lib.loc = "/home/shao/R/R_4.1.2_signac/")
library(cicero,   lib.loc = "/home/shao/R/R_4.1.2_signac/")

#- swt is a signac object
s_cds <- as.cell_data_set(swt)
# Ensure there are no peaks included with zero reads
s_cds <- s_cds[rowSums(exprs(s_cds)) != 0, ]
c_cds <- make_cicero_cds(s_cds, reduced_coordinates = reducedDims(s_cds)$UMAP)

#- run cicero
genome <- seqlengths(swt)
g_df   <- data.frame("chr" = names(genome), "length" = genome)

conns <- run_cicero(c_cds, genomic_coords = g_df)
saveRDS(conns, "conns.rds")

conns <- readRDS("./conns.rds")
ccan  <- generate_ccans(conns)
# [1] "Coaccessibility cutoff used: 0.7" | Default.
saveRDS(ccan, "wt_ccan.rds")

However, in our results, we found every link appears twice, with one as Peak1, the other as Peak2, and vice versa.

library(data.table)

#- read the result and add unique id to each link
all_links <- readRDS("../conns.rds") %>% as.data.table %>% .[, Peak2 := as.character(Peak2)] %>% na.omit

xx_1 <- all_links[Peak1 %like% "chr1-"] %>% .[, idx := paste0("l_", seq_len(nrow(.)))] %>% setcolorder("idx")
xx_2 <- all_links[Peak1 %like% "chr1-"] %>% .[, idx := paste0("l_", seq_len(nrow(.)))] %>% setcolorder("idx")

#- Same sets of peaks in Peak1 and Peak2, which is already surprising
identical(sort(unique(xx_1$Peak1)), sort(unique(xx_1$Peak2)))

#- Sort by coordidate by peaks
xx_1 <- xx_1[order(Peak1, Peak2)]
xx_3 <- xx_2[order(Peak2, Peak1)] %>% setcolorder(c("idx", "Peak2", "Peak1", "coaccess"))

#-  e.g., the link of chr1-100005926-100006678  and chr1-100046816-100047339 show twice, with id of l_7 and l_19.
 xx_1
             idx                    Peak1                    Peak2    coaccess
     1:      l_7 chr1-100005926-100006678 chr1-100046816-100047339  0.17056163
     2:      l_8 chr1-100005926-100006678 chr1-100104432-100104720  0.41436756
     3:      l_9 chr1-100005926-100006678 chr1-100169890-100170251  0.00000000
     4:     l_10 chr1-100005926-100006678 chr1-100231841-100232377 -0.42462429
     5:     l_11 chr1-100005926-100006678 chr1-100369671-100370167  0.39307990
    ---                                                                       
712414: l_712409   chr1-99970429-99971355   chr1-99771967-99773062  0.06093976
712415: l_712410   chr1-99970429-99971355   chr1-99774092-99774540  0.50683144
712416: l_712411   chr1-99970429-99971355   chr1-99813263-99814161  0.37028123
712417: l_712412   chr1-99970429-99971355   chr1-99943499-99943958  0.22503903
712418: l_712413   chr1-99970429-99971355   chr1-99966981-99968399 -0.28277783

xx_3
             idx                    Peak2                    Peak1    coaccess
     1:     l_19 chr1-100005926-100006678 chr1-100046816-100047339  0.17056163
     2:     l_83 chr1-100005926-100006678 chr1-100104432-100104720  0.41436756
     3:    l_121 chr1-100005926-100006678 chr1-100169890-100170251  0.00000000
     4:    l_133 chr1-100005926-100006678 chr1-100231841-100232377 -0.42462429
     5:    l_139 chr1-100005926-100006678 chr1-100369671-100370167  0.39307990
    ---                                                                       
712414: l_712363   chr1-99970429-99971355   chr1-99771967-99773062  0.06093976
712415: l_712373   chr1-99970429-99971355   chr1-99774092-99774540  0.50683144
712416: l_712383   chr1-99970429-99971355   chr1-99813263-99814161  0.37028123
712417: l_712393   chr1-99970429-99971355   chr1-99943499-99943958  0.22503903
712418: l_712403   chr1-99970429-99971355   chr1-99966981-99968399 -0.28277783

#- Two datasets are identical except the link ids.
xx_1_rename <- setnames(xx_1, LETTERS[1:4])
xx_3_rename <- setnames(xx_3, LETTERS[1:4])

all.equal(xx_1_rename[, -1], xx_3_rename[, -1])
# TRUE

R session.

R version 4.1.2 (2021-11-01)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Leap 15.3

Matrix products: default
BLAS:   /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] cicero_1.3.6                monocle3_1.2.9             
 [3] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
 [5] MatrixGenerics_1.6.0        matrixStats_0.61.0         
 [7] Biobase_2.54.0              Gviz_1.38.4                
 [9] GenomicRanges_1.46.1        GenomeInfoDb_1.30.0        
[11] IRanges_2.28.0              S4Vectors_0.32.3           
[13] BiocGenerics_0.40.0         Signac_1.6.0               
[15] SeuratObject_4.0.4          Seurat_4.1.0               
[17] SeuratWrappers_0.3.0        fpCompare_0.2.3            
[19] data.table_1.14.2           magrittr_2.0.1             
[21] patchwork_1.1.1             ggplot2_3.3.5              
[23] ll_0.1.0                    colorout_1.2-1             

loaded via a namespace (and not attached):
  [1] utf8_1.2.2               reticulate_1.22          R.utils_2.11.0          
  [4] lme4_1.1-27.1            tidyselect_1.1.1         RSQLite_2.2.9           
  [7] AnnotationDbi_1.56.2     htmlwidgets_1.5.4        docopt_0.7.1            
 [10] BiocParallel_1.28.3      Rtsne_0.15               munsell_0.5.0           
 [13] codetools_0.2-18         ica_1.0-2                future_1.27.0           
 [16] miniUI_0.1.1.1           withr_2.4.3              colorspace_2.0-2        
 [19] filelock_1.0.2           knitr_1.37               rstudioapi_0.13         
 [22] ROCR_1.0-11              tensor_1.5               listenv_0.8.0           
 [25] slam_0.1-50              GenomeInfoDbData_1.2.7   polyclip_1.10-0         
 [28] bit64_4.0.5              farver_2.1.0             parallelly_1.32.1       
 [31] vctrs_0.3.8              generics_0.1.1           xfun_0.29               
 [34] biovizBase_1.42.0        BiocFileCache_2.2.0      lsa_0.73.2              
 [37] ggseqlogo_0.1            R6_2.5.1                 rsvd_1.0.5              
 [40] VGAM_1.1-5               AnnotationFilter_1.18.0  bitops_1.0-7            
 [43] spatstat.utils_2.3-0     cachem_1.0.6             DelayedArray_0.20.0     
 [46] assertthat_0.2.1         promises_1.2.0.1         BiocIO_1.4.0            
 [49] scales_1.1.1             nnet_7.3-17              gtable_0.3.0            
 [52] globals_0.15.1           goftest_1.2-3            ensembldb_2.18.4        
 [55] rlang_1.0.2              RcppRoll_0.3.0           splines_4.1.2           
 [58] rtracklayer_1.54.0       lazyeval_0.2.2           dichromat_2.0-0         
 [61] checkmate_2.0.0          spatstat.geom_2.4-0      BiocManager_1.30.16     
 [64] yaml_2.2.1               reshape2_1.4.4           abind_1.4-5             
 [67] GenomicFeatures_1.46.3   backports_1.4.1          httpuv_1.6.4            
 [70] Hmisc_4.6-0              tools_4.1.2              ellipsis_0.3.2          
 [73] spatstat.core_2.3-2      RColorBrewer_1.1-2       ggridges_0.5.3          
 [76] Rcpp_1.0.7               plyr_1.8.6               base64enc_0.1-3         
 [79] progress_1.2.2           zlibbioc_1.40.0          purrr_0.3.4             
 [82] RCurl_1.98-1.5           prettyunits_1.1.1        rpart_4.1.16            
 [85] deldir_1.0-6             pbapply_1.5-0            cowplot_1.1.1           
 [88] zoo_1.8-9                ggrepel_0.9.1            cluster_2.1.3           
 [91] scattermore_0.7          lmtest_0.9-39            RANN_2.6.1              
 [94] SnowballC_0.7.0          ProtGenerics_1.26.0      fitdistrplus_1.1-6      
 [97] hms_1.1.1                mime_0.12                xtable_1.8-4            
[100] XML_3.99-0.8             jpeg_0.1-9               sparsesvd_0.2           
[103] gridExtra_2.3            compiler_4.1.2           biomaRt_2.50.2          
[106] tibble_3.1.6             KernSmooth_2.23-20       crayon_1.4.2            
[109] minqa_1.2.4              R.oo_1.24.0              htmltools_0.5.2         
[112] mgcv_1.8-40              later_1.3.0              Formula_1.2-4           
[115] tidyr_1.1.4              DBI_1.1.2                tweenr_1.0.2            
[118] dbplyr_2.1.1             rappdirs_0.3.3           MASS_7.3-56             
[121] boot_1.3-28              Matrix_1.4-1             cli_3.2.0               
[124] R.methodsS3_1.8.1        parallel_4.1.2           igraph_1.2.10           
[127] pkgconfig_2.0.3          GenomicAlignments_1.30.0 foreign_0.8-82          
[130] terra_1.6-3              plotly_4.10.0            spatstat.sparse_2.1-0   
[133] xml2_1.3.3               foreach_1.5.1            XVector_0.34.0          
[136] VariantAnnotation_1.40.0 stringr_1.4.0            digest_0.6.29           
[139] sctransform_0.3.3        RcppAnnoy_0.0.19         spatstat.data_2.1-2     
[142] Biostrings_2.62.0        leiden_0.3.9             fastmatch_1.1-3         
[145] htmlTable_2.4.0          uwot_0.1.11              curl_4.3.2              
[148] restfulr_0.0.13          shiny_1.7.1              Rsamtools_2.10.0        
[151] nloptr_1.2.2.3           rjson_0.2.21             lifecycle_1.0.1         
[154] nlme_3.1-157             jsonlite_1.7.2           viridisLite_0.4.0       
[157] BSgenome_1.62.0          fansi_0.5.0              pillar_1.6.4            
[160] lattice_0.20-45          KEGGREST_1.34.0          fastmap_1.1.0           
[163] httr_1.4.2               survival_3.3-1           glue_1.6.0              
[166] remotes_2.4.2            qlcMatrix_0.9.7          png_0.1-7               
[169] iterators_1.0.13         bit_4.0.4                ggforce_0.3.3           
[172] stringi_1.7.6            blob_1.2.2               latticeExtra_0.6-29     
[175] memoise_2.0.1            dplyr_1.0.7              irlba_2.3.5             
[178] future.apply_1.8.1      
> 
hpliner commented 2 years ago

Hello, Each set of connections is included twice as you mentioned (one with each peak as 'Peak1') to make downstream tasks easier (i.e. merging with an annotation file). If you want a unique set, you can use something like (Peak1 < Peak2, with Peaks as factors) to remove the duplicates.

Hope this helps, Hannah