alanocallaghan / scater

Clone of the Bioconductor repository for the scater package.
https://bioconductor.org/packages/devel/bioc/html/scater.html

fill_by for violin plots #175

Open alanocallaghan opened 1 year ago

alanocallaghan commented 1 year ago

See discussion in #174

How should this handle positions? E.g., the last few examples below all seem sub-optimal in different ways.

library("scater")
example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce))
plotColData(example_sce, y = "detected", x = "Cell_Cycle")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Cell_Cycle")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Cell_Cycle", fill_by = "Cell_Cycle")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", point_fun = function(...) list(), fill_by="Mutation_Status")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", fill_by="Mutation_Status")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", fill_by="Mutation_Status", colour_by = "Mutation_Status")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", fill_by="Cell_Cycle", colour_by = "Mutation_Status")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by="Cell_Cycle", fill_by = "Mutation_Status")

alanocallaghan commented 1 year ago

See branch fill-by

kikegoni commented 1 year ago

That's perfect!! Thanks a lot for the nice examples!

kikegoni commented 1 year ago

Just as a suggestion (I can adapt it from your code), it would be great to add an option to modify the alpha = 0.2 parameter of the fill_by violins here:

plot_out <- plot_out + do.call(geom_violin, c(viol_args, list(colour = "gray60", alpha = 0.2, scale = "width", width = 0.8)))
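
For illustration, here is a minimal sketch of how the alpha value could be exposed. The helper `add_violin` and the argument name `violin_alpha` are hypothetical (neither is part of scater), and the toy data frame just stands in for the plotting data:

```r
library(ggplot2)

# Hypothetical helper: same geom_violin call as above, but with the alpha
# value taken from an argument instead of being hard-coded to 0.2.
# `violin_alpha` is an assumed name, not an existing scater parameter.
add_violin <- function(plot_out, viol_args = list(), violin_alpha = 0.2) {
    plot_out + do.call(
        geom_violin,
        c(viol_args, list(colour = "gray60", alpha = violin_alpha,
                          scale = "width", width = 0.8))
    )
}

# Toy data standing in for the per-cell data frame.
set.seed(1)
df <- data.frame(
    x = rep(c("G1", "S", "G2M"), each = 30),
    y = rnorm(90)
)

# The default keeps the current look; a larger value gives a stronger fill.
p <- add_violin(ggplot(df, aes(x, y, fill = x)), violin_alpha = 0.6)
```

Keeping 0.2 as the default would leave existing plots unchanged.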

alanocallaghan commented 1 year ago

I don't like the behaviour shown here except when the fill_by, x, and colour_by arguments all match. However, dodging the jittered points means setting their dodge and jitter widths to be similar to those of the violins, and choosing how to group the points (probably the same way as fill_by). I guess that would also mean exposing a group_by argument, plus dodge_width, jitter_width...
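
A rough sketch of that idea in plain ggplot2, outside of scater: the points are dodged with `position_jitterdodge()` using the same dodge width as the violins, and both layers are grouped by the fill variable. The data frame and the names `dodge_width` and `jitter_width` are assumptions for illustration only, not existing scater arguments:

```r
library(ggplot2)

# Toy data: three x categories, two fill groups within each category.
set.seed(42)
df <- data.frame(
    cell_cycle = rep(c("G1", "S", "G2M"), each = 40),
    status = rep(c("positive", "negative"), times = 60),
    detected = rnorm(120, mean = 1500, sd = 100)
)

# Assumed values; the dodge width is shared so points sit inside their violin.
dodge_width <- 0.8
jitter_width <- 0.2

p <- ggplot(df, aes(x = cell_cycle, y = detected, fill = status)) +
    geom_violin(
        colour = "gray60", alpha = 0.2, scale = "width", width = 0.8,
        position = position_dodge(width = dodge_width)
    ) +
    # shape 21 has a fill, so the points pick up the same fill grouping.
    geom_point(
        shape = 21,
        position = position_jitterdodge(
            jitter.width = jitter_width,
            dodge.width = dodge_width
        )
    )
```

This only works cleanly when the points and the violins are grouped by the same variable, which is why the grouping would probably have to follow fill_by.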

shangguandong1996 commented 1 year ago

Hi, developer

It seems that fill_by raises an error:

library("scater")
example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce))
> plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Cell_Cycle", fill_by = "Mutation_Status")
Error:
! Problem while computing aesthetics.
i Error occurred in the 1st layer.
Caused by error in `.data[["Mutation_Status"]]`:
! Column `Mutation_Status` not found in `.data`.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error:
! Problem while computing aesthetics.
i Error occurred in the 1st layer.
Caused by error in `.data[["Mutation_Status"]]`:
! Column `Mutation_Status` not found in `.data`.
---
Backtrace:
     x
  1. +-base (local) `<fn>`(x)
  2. +-ggplot2:::print.ggplot(x)
  3. | +-ggplot2::ggplot_build(x)
  4. | \-ggplot2:::ggplot_build.ggplot(x)
  5. |   \-ggplot2:::by_layer(...)
  6. |     +-rlang::try_fetch(...)
  7. |     | +-base::tryCatch(...)
  8. |     | | \-base (local) tryCatchList(expr, classes, parentenv, handlers)
  9. |     | |   \-base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 10. |     | |     \-base (local) doTryCatch(return(expr), name, parentenv, handler)
 11. |     | \-base::withCallingHandlers(...)
 12. |     \-ggplot2 (local) f(l = layers[[i]], d = data[[i]])
 13. |       \-l$compute_aesthetics(d, plot)
 14. |         \-ggplot2 (local) compute_aesthetics(..., self = self)
 15. |           \-ggplot2:::scales_add_defaults(...)
 16. |             \-base::lapply(aesthetics[new_aesthetics], eval_tidy, data = data)
 17. |               \-rlang (local) FUN(X[[i]], ...)
 18. +-Mutation_Status
 19. +-rlang:::`[[.rlang_data_pronoun`(.data, "Mutation_Status")
 20. | \-rlang:::data_pronoun_get(...)
 21. \-rlang:::abort_data_pronoun(x, call = y)
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 
[2] LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scater_1.27.9               ggplot2_3.4.2              
 [3] scuttle_1.4.0               SingleCellExperiment_1.16.0
 [5] SummarizedExperiment_1.24.0 Biobase_2.54.0             
 [7] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
 [9] IRanges_2.28.0              S4Vectors_0.32.4           
[11] BiocGenerics_0.40.0         MatrixGenerics_1.6.0       
[13] matrixStats_0.63.0          devtools_2.4.5             
[15] usethis_2.1.6              

loaded via a namespace (and not attached):
 [1] bitops_1.0-7              fs_1.5.2                  tools_4.1.0              
 [4] profvis_0.3.7             utf8_1.2.3                R6_2.5.1                 
 [7] irlba_2.3.5.1             vipor_0.4.5               DBI_1.1.3                
[10] colorspace_2.1-0          urlchecker_1.0.1          withr_2.5.0              
[13] gridExtra_2.3             tidyselect_1.1.2          prettyunits_1.1.1        
[16] processx_3.7.0            compiler_4.1.0            cli_3.4.1                
[19] BiocNeighbors_1.12.0      DelayedArray_0.20.0       scales_1.2.1             
[22] callr_3.7.2               stringr_1.4.1             digest_0.6.29            
[25] XVector_0.34.0            pkgconfig_2.0.3           htmltools_0.5.3          
[28] sessioninfo_1.2.2         sparseMatrixStats_1.6.0   fastmap_1.1.0            
[31] htmlwidgets_1.5.4         rlang_1.1.1               rstudioapi_0.13          
[34] shiny_1.7.2               DelayedMatrixStats_1.16.0 generics_0.1.3           
[37] BiocParallel_1.28.3       dplyr_1.0.9               RCurl_1.98-1.12          
[40] magrittr_2.0.3            BiocSingular_1.10.0       GenomeInfoDbData_1.2.7   
[43] Matrix_1.3-4              Rcpp_1.0.10               ggbeeswarm_0.7.2         
[46] munsell_0.5.0             fansi_1.0.4               viridis_0.6.3            
[49] lifecycle_1.0.3           stringi_1.7.8             zlibbioc_1.40.0          
[52] pkgbuild_1.4.0            grid_4.1.0                parallel_4.1.0           
[55] promises_1.2.0.1          ggrepel_0.9.3             crayon_1.5.1             
[58] miniUI_0.1.1.1            lattice_0.20-45           cowplot_1.1.1            
[61] beachmat_2.10.0           ps_1.6.0                  pillar_1.9.0             
[64] ScaledMatrix_1.2.0        pkgload_1.3.0             glue_1.6.2               
[67] remotes_2.4.2             vctrs_0.6.2               httpuv_1.6.5             
[70] gtable_0.3.3              purrr_0.3.4               assertthat_0.2.1         
[73] cachem_1.0.5              rsvd_1.0.5                mime_0.12                
[76] xtable_1.8-4              later_1.3.0               viridisLite_0.4.2        
[79] tibble_3.2.1              beeswarm_0.4.0            memoise_2.0.1            
[82] ellipsis_0.3.2 

Yunuuuu commented 10 months ago

It would be nice if plotExpression also supported the fill_by argument.

Yunuuuu commented 10 months ago

I attempted to implement it, but incorporating this functionality into the plotExpression function would complicate it significantly due to the unpredictability of user inputs, especially when using the group aesthetic for the violin plot. Therefore, I ultimately decided to use the makePerCellDF function for this purpose. However, I am unsure whether it is necessary to add a function that returns the data in long format for plotting.

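library(ggplot2)

# Assumed to exist already: sce_object (a SingleCellExperiment with "label"
# and "celltypes" columns in colData), markers (character vector of feature
# names) and n_col (number of facet columns).

# One row per cell: colData columns plus one logcounts column per marker.
data <- scuttle::makePerCellDF(sce_object, features = markers)
# Reshape to long format: one row per cell and feature.
data <- tidyr::pivot_longer(data,
        cols = tidyr::all_of(markers),
        names_to = "Feature",
        values_to = "logcounts"
)
# Violins per label, filled by cell type, one facet per feature.
violin_plot <- ggplot(data, aes(factor(label), logcounts)) +
        geom_violin(aes(fill = celltypes), scale = "width", width = 0.8) +
        scale_fill_brewer(type = "qual", palette = "Set3") +
        guides(fill = guide_legend(
            title = "Cell type", override.aes = list(size = 2L), ncol = 1L
        )) +
        labs(x = NULL) +
        facet_wrap(vars(Feature),
            ncol = n_col, scales = "free_x"
        ) +
        cowplot::theme_cowplot(font_size = 10L) +
        theme(axis.text.x = element_text(size = 6L))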
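
As a sketch of what such a long-format function might look like, the two reshaping steps can be wrapped in one helper. `long_expression_df` is a hypothetical name (it does not exist in scater or scuttle), and the example runs on the mockSCE() data used earlier in this thread rather than on my own object:

```r
library(scuttle)
library(ggplot2)

# Hypothetical helper, not part of scater or scuttle: returns one row per
# cell and feature, with the colData columns repeated for each feature.
# Relies on makePerCellDF's default assay ("logcounts").
long_expression_df <- function(sce, features) {
    wide <- scuttle::makePerCellDF(sce, features = features)
    tidyr::pivot_longer(
        wide,
        cols = tidyr::all_of(features),
        names_to = "Feature",
        values_to = "logcounts"
    )
}

# Mock data, as in the examples at the top of the issue.
sce <- logNormCounts(mockSCE())
markers <- rownames(sce)[1:3]
long_df <- long_expression_df(sce, markers)

# With the long data, fill can be mapped directly in ggplot2.
p <- ggplot(long_df, aes(Cell_Cycle, logcounts, fill = Mutation_Status)) +
    geom_violin(scale = "width", width = 0.8) +
    facet_wrap(vars(Feature))
```

The plotting itself stays in the user's hands, so the helper would not need to anticipate every combination of aesthetics.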