cole-trapnell-lab / cicero-release

https://cole-trapnell-lab.github.io/cicero-release/
MIT License

How to use gene activity matrix in plot_cells & mm10 genome to view connections? #43

Closed sta1245work closed 4 years ago

sta1245work commented 4 years ago

Hello,

I have two questions:

1). How do I view gene accessibility using plot_cells() with the gene activity matrix? Following the instructions on the Cicero website, I successfully clustered my single-cell data. However, when I use the genes argument in plot_cells(), I get the error "None of the provided genes were found in the cds".

Thus, I am trying to create a new CDS object with cicero_gene_activities as the expression_matrix, per your response to MQMQ2018's second question in resolved issue #35. You responded to MQMQ2018 to use the gene activities matrix in place of the expression matrix and "gene information" in place of gene metadata.

I am not quite sure what you meant by "gene information". I have tried using gene_metadata = gene_anno as well as gene_annotation_sub (processed from Ensembl's GTF files), and I keep getting this error:

input_cds2 <- suppressWarnings(new_cell_data_set(cicero_gene_activities, cell_metadata = cellinfo, gene_metadata = gene_anno))
Error: gene_metadata must be NULL or have the same number of rows as rows in expression_data

If the annotation data was what you meant by "gene information", can you advise me on how to reduce the annotation object to have the proper number of rows to successfully create the new CDS object so I can view gene accessibility through plot_cells?

2). How do I use the mm10 genome with run_cicero? We did peak alignments against the mm10 genome using Cell Ranger. I am interested in viewing coaccessibility and other Cicero connections in my scATAC-seq data. Since Cicero has the mm9 genome preloaded, I was wondering how to load the mm10 genome into Cicero so that I can call it with data("mouse.mm10.genome"), then set whole_genome <- mouse.mm10.genome for use in run_cicero.

I have already done run_cicero using the default mm9 genome out of curiosity and would like to investigate connections and coaccessibility in Cicero further. However, I would prefer to be consistent and use the mm10 genome with run_cicero since we used mm10 for the peak alignments. Do you have any advice for how I can address this?

In case it is helpful, here is my session info:

sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] cicero_1.3.4.5 Gviz_1.30.1 monocle3_0.2.0 SingleCellExperiment_1.8.0
[5] SummarizedExperiment_1.16.1 DelayedArray_0.12.2 BiocParallel_1.20.1 matrixStats_0.55.0
[9] GenomicRanges_1.38.0 GenomeInfoDb_1.22.0 IRanges_2.20.2 S4Vectors_0.24.3
[13] Biobase_2.46.0 BiocGenerics_0.32.0

loaded via a namespace (and not attached):
[1] ProtGenerics_1.18.0 bitops_1.0-6 bit64_0.9-7 RColorBrewer_1.1-2
[5] progress_1.2.2 httr_1.4.1 tools_3.6.2 backports_1.1.5
[9] R6_2.4.1 rpart_4.1-15 Hmisc_4.3-1 DBI_1.1.0
[13] lazyeval_0.2.2 colorspace_1.4-1 nnet_7.3-12 tidyselect_1.0.0
[17] gridExtra_2.3 prettyunits_1.1.1 bit_1.1-15.2 curl_4.3
[21] compiler_3.6.2 htmlTable_1.13.3 rtracklayer_1.46.0 scales_1.1.0
[25] checkmate_2.0.0 askpass_1.1 rappdirs_0.3.1 stringr_1.4.0
[29] digest_0.6.23 Rsamtools_2.2.1 foreign_0.8-74 XVector_0.26.0
[33] dichromat_2.0-0 htmltools_0.4.0 base64enc_0.1-3 jpeg_0.1-8.1
[37] pkgconfig_2.0.3 ensembldb_2.10.2 BSgenome_1.54.0 dbplyr_1.4.2
[41] htmlwidgets_1.5.1 rlang_0.4.4 VGAM_1.1-2 rstudioapi_0.11
[45] RSQLite_2.2.0 acepack_1.4.1 dplyr_0.8.4 VariantAnnotation_1.32.0
[49] RCurl_1.98-1.1 magrittr_1.5 GenomeInfoDbData_1.2.2 Formula_1.2-3
[53] Matrix_1.2-18 Rcpp_1.0.3 munsell_0.5.0 viridis_0.5.1
[57] lifecycle_0.1.0 stringi_1.4.5 zlibbioc_1.32.0 plyr_1.8.5
[61] BiocFileCache_1.10.2 blob_1.2.1 crayon_1.3.4 lattice_0.20-38
[65] Biostrings_2.54.0 splines_3.6.2 GenomicFeatures_1.38.1 hms_0.5.3
[69] knitr_1.28 pillar_1.4.3 reshape2_1.4.3 biomaRt_2.42.0
[73] XML_3.99-0.3 glue_1.3.1 biovizBase_1.34.1 latticeExtra_0.6-29
[77] data.table_1.12.8 png_0.1-7 vctrs_0.2.2 gtable_0.3.0
[81] openssl_1.4.1 purrr_0.3.3 assertthat_0.2.1 ggplot2_3.2.1
[85] xfun_0.12 AnnotationFilter_1.10.0 survival_3.1-8 viridisLite_0.3.0
[89] tibble_2.1.3 GenomicAlignments_1.22.1 AnnotationDbi_1.48.0 memoise_1.1.0
[93] cluster_2.1.0

Thank you

hpliner commented 4 years ago

Hello @sta1245work,

  1. The gene info just needs to conform to the cell_data_set requirements. To start, I recommend setting gene_metadata to NULL (or leaving it out); the gene metadata will then just be the row.names of the Cicero matrix (which should be gene names).

  2. See here: https://cole-trapnell-lab.github.io/cicero-release/docs_m3/#frequently-asked-questions for info on downloading new genome annotation files.
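
For point 1, here is a minimal sketch of one way to avoid the row-count error, assuming the row names of cicero_gene_activities are gene names and its column names are cell barcodes that also appear as row names of cellinfo. The toy matrix and barcodes are hypothetical stand-ins, and the gene_short_name column is added on the assumption that plot_cells() looks genes up by that column. Rather than subsetting an annotation table, it builds the gene metadata directly from the matrix, so the two always have the same number of rows.

```r
library(Matrix)

# Hypothetical toy stand-ins for the objects named in this thread.
m <- matrix(c(0.1, 0.0, 0.3,
              0.0, 0.2, 0.0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("Gata1", "Sox2"),
                            c("cell1", "cell2", "cell3")))
cicero_gene_activities <- Matrix(m, sparse = TRUE)
cellinfo <- data.frame(cells = c("cell1", "cell2", "cell3", "cell4"),
                       row.names = c("cell1", "cell2", "cell3", "cell4"),
                       stringsAsFactors = FALSE)

# One metadata row per gene that is actually present in the activity matrix.
gene_info <- data.frame(
  gene_short_name = rownames(cicero_gene_activities),
  row.names = rownames(cicero_gene_activities),
  stringsAsFactors = FALSE
)

# Keep only the cells present in the matrix, in the matrix's column order.
cell_info <- cellinfo[colnames(cicero_gene_activities), , drop = FALSE]

# Both metadata tables must line up with the matrix dimensions.
stopifnot(nrow(gene_info) == nrow(cicero_gene_activities),
          nrow(cell_info) == ncol(cicero_gene_activities))

# Only attempt the monocle3 call when the package is installed.
if (requireNamespace("monocle3", quietly = TRUE)) {
  gene_activity_cds <- monocle3::new_cell_data_set(
    cicero_gene_activities,
    cell_metadata = cell_info,
    gene_metadata = gene_info
  )
}
```

The new CDS would presumably still need dimensionality reduction (or coordinates carried over from the original CDS) before plot_cells() can draw it.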

Best, Hannah
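
For point 2, a minimal sketch of preparing chromosome lengths for another genome build, assuming a UCSC-style chrom.sizes file (tab-separated, chromosome name then length) has already been downloaded. The helper read_chrom_sizes() and the file name mm10.chrom.sizes are hypothetical, and the toy lengths below are placeholders, not real mm10 values. It reads the file into a two-column data frame and keeps only the primary chromosomes.

```r
# Hypothetical helper: read a two-column chrom.sizes file into a data frame.
read_chrom_sizes <- function(path) {
  sizes <- read.table(path, header = FALSE, sep = "\t",
                      stringsAsFactors = FALSE,
                      col.names = c("V1", "V2"))
  # Keep numbered chromosomes plus X and Y; drop random/unplaced contigs.
  sizes[grepl("^chr([0-9]+|X|Y)$", sizes$V1), , drop = FALSE]
}

# Demo on a toy file. These lengths are placeholders, NOT real mm10 values.
toy_file <- tempfile(fileext = ".chrom.sizes")
writeLines(c("chr1\t1000", "chr2\t900", "chr1_random\t50", "chrX\t800"),
           toy_file)
toy_genome <- read_chrom_sizes(toy_file)
print(toy_genome)

# With real data (assumed local file name and an existing cicero_cds object):
# mm10_genome <- read_chrom_sizes("mm10.chrom.sizes")
# conns <- cicero::run_cicero(cicero_cds, mm10_genome)
```

Dropping the unplaced contigs is a design choice for this sketch; keep them if your peaks were called on those sequences.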