bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells
Other
166 stars 17 forks source link

[r] Improve trackplot_combine layout logic #116

Closed bnprks closed 3 months ago

bnprks commented 3 months ago

Add several improvements to the trackplot_combine layout logic.

For plots without sideplots, before and after are as follows:

And, when side_plot is present:

Script used for plotting demo (relies on the `pbmc-3k` vignette data) ```r library(BPCells) # devtools::load_all() frags_raw <- open_fragments_dir("vignettes/pbmc-3k-data/pbmc_3k_frags") mat_raw <- open_matrix_dir("vignettes/pbmc-3k-data/pbmc_3k_rna_raw") blacklist <- read_encode_blacklist("vignettes/pbmc-3k-data/references", genome="hg38") chrom_sizes <- read_ucsc_chrom_sizes("vignettes/pbmc-3k-data/references", genome="hg38") transcripts <- read_gencode_transcripts("vignettes/pbmc-3k-data/references", release="42") genes <- read_gencode_genes("vignettes/pbmc-3k-data/references", release="42") atac_qc <- qc_scATAC(frags_raw, genes, blacklist) pass_atac <- atac_qc |> dplyr::filter(nFrags > 1000, TSSEnrichment > 10) |> dplyr::pull(cellName) pass_rna <- colnames(mat_raw)[colSums(mat_raw) > 1e3] keeper_cells <- intersect(pass_atac, pass_rna) frags <- frags_raw |> select_cells(keeper_cells) keeper_genes <- rowSums(mat_raw) > 3 mat <- mat_raw[keeper_genes,keeper_cells] ###################################### # RNA-based clustering ###################################### # Normalize by reads-per-cell mat <- multiply_cols(mat, 1/Matrix::colSums(mat)) # Log normalization mat <- log1p(mat * 10000) stats <- matrix_stats(mat, row_stats="variance") # To keep the example small, we'll do a very naive variable gene selection variable_genes <- order(stats$row_stats["variance",], decreasing=TRUE) |> head(1000) |> sort() mat_norm <- mat[variable_genes,] mat_norm <- mat_norm |> write_matrix_memory(compress=FALSE) gene_means <- stats$row_stats["mean",variable_genes] gene_vars <- stats$row_stats["variance", variable_genes] mat_norm <- (mat_norm - gene_means) / gene_vars svd <- BPCells::svds(mat_norm, k=50) pca <- multiply_cols(svd$v, svd$d) clusts <- knn_hnsw(pca, ef=500) |> # Find approximate nearest neighbors knn_to_snn_graph() |> # Convert to a SNN graph cluster_graph_louvain() # Perform graph-based clustering cluster_annotations <- c("1" = "T", "2" = "CD8 T", "3" = "B", "4" = "T", "5" = "NK", "6" = "Mono", "7" = "Mono", "8" = "Mono", "9" = "T", "10" = "DC", "11" = "Mono", "12" = "DC") cell_types <- cluster_annotations[clusts] region <- gene_region(genes, "CD19", extend_bp = 1e5) region_narrow <- list(chr="chr16", start=28931964-100, end=28931964+100) read_counts <- atac_qc$nFrags[ match(cellNames(frags), atac_qc$cellName) ] coverage_plot <- trackplot_coverage( frags, region = region, groups=cell_types, read_counts, bins=500 ) gene_plot <- trackplot_gene(transcripts, region) scalebar_plot <- trackplot_scalebar(region) fake_peaks <- c(28846253, 28863641, 28925327, 28950995, 28974593) fake_loops <- tibble::tibble( chr = "chr16", start = fake_peaks[c(1,1,2,3)], end = fake_peaks[c(2,3,4,5)], score = c(4,1,3,2) ) loop_plot <- trackplot_loop(fake_loops, region, color_by="score") region_narrow <- list(chr="chr16", start=28931964-500, end=28931964+500) cell_subset <- cellNames(frags)[cell_types %in% c("B", "DC")] coverage_plot_narrow <- trackplot_coverage( frags |> select_cells(cell_subset), region=region_narrow, groups=cell_types[cell_types %in% c("B", "DC")], read_counts[cell_types %in% c("B", "DC")], bins=200, ) gene_plot_narrow <- trackplot_gene(transcripts, region_narrow) scalebar_narrow <- trackplot_scalebar(region_narrow, font_pt=8) expression <- collect_features(mat, "CD19") names(expression) <- "gene" expression_plot <- ggplot2::ggplot(expression, ggplot2::aes(cell_types, gene, fill=cell_types)) + ggplot2::geom_boxplot() + ggplot2::guides(y="none", fill="none") + ggplot2::labs(x=NULL, y="RNA") + ggplot2::scale_fill_manual(values=discrete_palette("stallion"), drop=FALSE) + BPCells:::trackplot_theme() plot1 <- trackplot_combine( list( scalebar_plot, coverage_plot, gene_plot, loop_plot, coverage_plot_narrow, gene_plot_narrow, scalebar_narrow ) ) ggplot2::ggsave(plot=plot1, "pre_trackplot.png", width=8, height=8, units="in") plot2 <- trackplot_combine( list( scalebar_plot, coverage_plot, gene_plot, loop_plot, coverage_plot_narrow, gene_plot_narrow, scalebar_narrow ), side_plot = expression_plot ) ggplot2::ggsave(plot=plot2, "pre_trackplot_with_side.png", width=8, height=8, units="in") ```
Before (no sideplot) ![pre_trackplot](https://github.com/user-attachments/assets/822d94fc-f019-4a59-a6d0-5474b1a4fa52)
After (no sideplot) ![trackplot](https://github.com/user-attachments/assets/df01c25d-7949-45c9-8cce-834b27491e7f)
Before (with sideplot) ![pre_trackplot_with_side](https://github.com/user-attachments/assets/7df80b67-61dd-45b4-888f-d5e0110822d1)
After (with sideplot) ![trackplot_with_side](https://github.com/user-attachments/assets/3f3507df-48c5-4632-8766-18554945c81b)
After with a low sideplot (see code from comment below) ![trackplot_with_low_side](https://github.com/user-attachments/assets/a0ce1d2a-2168-4794-8223-7e5a62dfe864)

(Edited to update plot previews with the later improvements)

bnprks commented 3 months ago

From offline suggestion, just added some smarter logic to decide whether to put the color legend above/below the side plot. Earlier plots are unchanged, but this new variant will look better:

Additional code ```r expression_plot_subset <- ggplot2::ggplot(expression[cell_types %in% c("B", "DC"),], ggplot2::aes(cell_types[cell_types %in% c("B", "DC")], gene, fill=cell_types[cell_types %in% c("B", "DC")])) + ggplot2::geom_boxplot() + ggplot2::guides(y="none", fill="none") + ggplot2::labs(x=NULL, y="RNA") + ggplot2::scale_fill_manual(values=discrete_palette("stallion"), drop=FALSE) + BPCells:::trackplot_theme() coverage_no_sidebar <- coverage_plot coverage_no_sidebar$trackplot$takes_sideplot <- FALSE plot3 <- trackplot_combine( list( scalebar_plot, coverage_no_sidebar, gene_plot, loop_plot, coverage_plot_narrow, gene_plot_narrow, scalebar_narrow ), side_plot = expression_plot_subset ) ggplot2::ggsave(plot=plot3, "pre_trackplot_with_low_side.png", width=8, height=8, units="in") ```
Plot with low side plot ![pre_trackplot_with_low_side](https://github.com/user-attachments/assets/7dea9579-865d-42d1-bd1a-5d42b686aa58)
bnprks commented 3 months ago

And it turns out we can fix some of the y-axis labels getting cut off. Patchwork puts plots later in the list on top of plots earlier in the list. So I added a shuffle to put the plots without y-axis labels first in the list

New plot with fix for label cutoffs ![trackplot](https://github.com/user-attachments/assets/fbc23fef-3e6d-4478-88b8-880eb172a54a)