HelenaLC / CATALYST

Cytometry dATa anALYsis Tools
67 stars 30 forks source link

plotAbundances plotting only one level of a metadata column #301

Closed matoriab closed 8 months ago

matoriab commented 1 year ago

Hello CATALYST team. Wonderful package. I could really use some help with the below issue; thank you very much.

Overall issue: I am trying to plot abundances of populations generated by flowsom/metacluster. I am creating plots whereby the x-axis depicts some aspect of my metadata (e.g. "CTCAE"), and the y-axis depicts the frequencies of the clusters. This works perfectly with all metadata columns one called "condition". In this instance, only one condition is shown on the x-axis, not the four that should be there. I hope this makes sense; I will attach my script below along with images of the plots.

This is what I've done:

load libraries

library(Cairo)
library(CATALYST)
library(diffcyt)
library(flowWorkspace)
library(readxl)
library(flowCore)
library(SingleCellExperiment)
library(ggplot2)
library(ggrepel)
library(RColorBrewer)
library(cowplot)
library(uwot)
library(readr)
library(dplyr)
library(wesanderson)
setwd("/Users/maria/Library/CloudStorage/OneDrive/KCL/Experiments and Lab Book/Experimental Data/2022/outputs/WD 120922")
fcs_files <- list.files(pattern = ".fcs$")
fs <- read.flowSet(fcs_files, transformation = FALSE, truncate_max_range = FALSE)
channels <- pData(parameters(fs[[1]]))
View(channels)
View(colnames(fs))

importing the patient metadata

md <- read_excel("metadata.xlsx")
md 
pd <- read_excel("channelnames.xlsx")
pd       
all(pd$fcs_colname %in% colnames(fs))
[1] TRUE
all(pData(fs)$name %in% md$file_name)
[1] TRUE
> md$sample_id <- factor(md$sample_id, levels=c("HC0003", "HC0015", "40", "66", "74", "81"))
> md$CTCAE <- factor(md$CTCAE, levels=c("HC", "low", "high"))
> md$group <- factor(md$group, levels=c("all"))
> md$status <- factor(md$status, levels=c("HC", "patient"))
> md$condition <- factor(md$condition, levels=c("media", "saCD3", "saCD3_iso", "saCD3_aPD1"))

check that these are factors with the correct level order

md$sample_id
 [1] 40     40     40     40     66     66     66     66     74     74     74     81     81     81     81     HC0003 HC0003 HC0003 HC0003
[20] HC0015 HC0015 HC0015 HC0015
Levels: HC0003 HC0015 40 66 74 81
> md$CTCAE
 [1] high high high high low  low  low  low  low  low  low  high high high high HC   HC   HC   HC   HC   HC   HC   HC  
Levels: HC low high
> md$group
 [1] all all all all all all all all all all all all all all all all all all all all all all all
Levels: all
> md$status
 [1] patient patient patient patient patient patient patient patient patient patient patient patient patient patient patient HC      HC     
[18] HC      HC      HC      HC      HC      HC     
Levels: HC patient
> md$condition
 [1] media      saCD3_iso  saCD3_aPD1 saCD3      media      saCD3_iso  saCD3_aPD1 saCD3      media      saCD3_iso  saCD3_aPD1 media     
[13] saCD3_iso  saCD3_aPD1 saCD3      media      saCD3_iso  saCD3_aPD1 saCD3      media      saCD3_iso  saCD3_aPD1 saCD3     
Levels: media saCD3 saCD3_iso saCD3_aPD1
factors <- list(factors = c("sample_id", "CTCAE", "group", "status", "condition"))
sce <- prepData(fs, pd, md, features = pd$fcs_colname, 
+                 md_cols = list(file = "file_name", id = "sample_id", 
+                                factors = c("sample_id", "CTCAE", "group", "status", "condition")), FACS= TRUE)
> sce
class: SingleCellExperiment 
dim: 6 227248 
metadata(2): experiment_info chs_by_fcs
assays(2): counts exprs
rownames(6): CD25 CD3 ... PDL1 CD4
rowData names(3): channel_name marker_name marker_class
colnames: NULL
colData names(6): sample_id sample_id.1 ... status condition
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

this plot works with "condition"

plotCounts(sce, group_by = "sample_id", color_by = "condition") 
set.seed(4321)
sce <- cluster(sce, features = NULL,    
+                xdim = 10, ydim = 10, maxK = 10, seed = 4321) 

this plot works

q1 <- plotAbundances(sce, k = "meta10", by = "cluster_id", group_by ="CTCAE")
q1

this plot doesn't

q2 <- plotAbundances(sce, k = "meta10", by = "cluster_id", group_by ="condition")
q2

sessionInfo() R version 4.2.1 (2022-06-23) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Monterey 12.3.1

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] BiocManager_1.30.18 wesanderson_0.3.6 dplyr_1.0.10 readr_2.1.2
[5] uwot_0.1.14 Matrix_1.5-0 cowplot_1.1.1 RColorBrewer_1.1-3
[9] ggrepel_0.9.1 ggplot2_3.3.6 flowCore_2.8.0 readxl_1.4.1
[13] flowWorkspace_4.8.0 diffcyt_1.16.0 CATALYST_1.20.1 SingleCellExperiment_1.18.0 [17] SummarizedExperiment_1.26.1 Biobase_2.56.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.4
[21] IRanges_2.30.1 S4Vectors_0.34.0 BiocGenerics_0.42.0 MatrixGenerics_1.8.1
[25] matrixStats_0.62.0 Cairo_1.6-0

loaded via a namespace (and not attached): [1] utf8_1.2.2 tidyselect_1.1.2 lme4_1.1-30 grid_4.2.1
[5] BiocParallel_1.30.3 Rtsne_0.16 aws.signature_0.6.0 munsell_0.5.0
[9] ScaledMatrix_1.4.1 codetools_0.2-18 interp_1.1-3 withr_2.5.0
[13] colorspace_2.0-3 ggsignif_0.6.3 labeling_0.4.2 GenomeInfoDbData_1.2.8
[17] polyclip_1.10-0 farver_2.1.1 pheatmap_1.0.12 vctrs_0.4.1
[21] generics_0.1.3 TH.data_1.1-1 R6_2.5.1 doParallel_1.0.17
[25] ggbeeswarm_0.6.0 clue_0.3-61 rsvd_1.0.5 locfit_1.5-9.6
[29] bitops_1.0-7 DelayedArray_0.22.0 scales_1.2.1 multcomp_1.4-20
[33] beeswarm_0.4.0 gtable_0.3.1 beachmat_2.12.0 RProtoBufLib_2.8.0
[37] sandwich_3.0-2 rlang_1.0.5 GlobalOptions_0.1.2 splines_4.2.1
[41] rstatix_0.7.0 hexbin_1.28.2 broom_1.0.1 yaml_2.3.5
[45] reshape2_1.4.4 abind_1.4-5 backports_1.4.1 RBGL_1.72.0
[49] tools_4.2.1 ellipsis_0.3.2 ggridges_0.5.3 Rcpp_1.0.9
[53] plyr_1.8.7 base64enc_0.1-3 sparseMatrixStats_1.8.0 zlibbioc_1.42.0
[57] purrr_0.3.4 RCurl_1.98-1.8 FlowSOM_2.4.0 ggpubr_0.4.0
[61] deldir_1.0-6 GetoptLong_1.0.5 viridis_0.6.2 zoo_1.8-10
[65] cluster_2.1.4 colorRamps_2.3.1 magrittr_2.0.3 ncdfFlow_2.42.1
[69] data.table_1.14.2 scattermore_0.8 circlize_0.4.15 mvtnorm_1.1-3
[73] ggnewscale_0.4.7 hms_1.1.2 XML_3.99-0.10 jpeg_0.1-9
[77] gridExtra_2.3 shape_1.4.6 ggcyto_1.24.1 compiler_4.2.1
[81] scater_1.24.0 tibble_3.1.8 crayon_1.5.1 ggpointdensity_0.1.0
[85] minqa_1.2.4 tzdb_0.3.0 tidyr_1.2.1 RcppParallel_5.1.5
[89] aws.s3_0.3.21 tweenr_2.0.2 ComplexHeatmap_2.12.1 MASS_7.3-58.1
[93] boot_1.3-28 car_3.1-0 cli_3.4.0 parallel_4.2.1
[97] igraph_1.3.4 pkgconfig_2.0.3 scuttle_1.6.3 xml2_1.3.3
[101] foreach_1.5.2 vipor_0.4.5 XVector_0.36.0 drc_3.0-1
[105] stringr_1.4.1 digest_0.6.29 ConsensusClusterPlus_1.60.0 graph_1.74.0
[109] cellranger_1.1.0 edgeR_3.38.4 DelayedMatrixStats_1.18.0 curl_4.3.2
[113] gtools_3.9.3 rjson_0.2.21 nloptr_2.0.3 lifecycle_1.0.2
[117] nlme_3.1-159 jsonlite_1.8.0 carData_3.0-5 BiocNeighbors_1.14.0
[121] viridisLite_0.4.1 limma_3.52.3 fansi_1.0.3 pillar_1.8.1
[125] lattice_0.20-45 httr_1.4.4 plotrix_3.8-2 survival_3.4-0
[129] glue_1.6.2 png_0.1-7 iterators_1.0.14 Rgraphviz_2.40.0
[133] ggforce_0.3.4 stringi_1.7.8 nnls_1.4 BiocSingular_1.12.0
[137] CytoML_2.8.1 latticeExtra_0.6-30 cytolib_2.8.0 irlba_2.3.5

Screenshot 2022-09-13 at 11 03 22
Screenshot 2022-09-13 at 11 03 39

m/81634303/189873627-2edf6740-5698-466d-933a-a6c269e2139b.png">

Screenshot 2022-09-13 at 11 03 51

Also wanted to mention that I've tried creating a new excel 'md' spreadsheet, renaming the column, creating a new column, but none of these have worked.

HelenaLC commented 1 year ago

The code is hardly legible; could you please edit/repost placing code inside triple-back-ticks? I.e.,

code here!

matoriab commented 1 year ago

apologies, I am new to this. Is it better now?

HelenaLC commented 1 year ago

So, here‘s what‘s happening: Under the hood, many of CATALYST‘s plotting functions will try to 1) identify unique sample IDs and 2) create a data.frame with any metadata in colData associated with the samples. The latter table is then used for plotting cells/samples/clusters with colors/grouping etc. according to relevant input arguments. Having a look at your metadata, which you nicely printed above, thanks! (variable md), it looks like sample_ids are not unique. Consequently, each sample maps to multiple conditions, and I assume plotAbundances just ignores all but the first match.

If you have a look at the prepData documentation, sample IDs should be unique, whereas everything else (e.g., patient ID, condition) are descriptive of the samples and potentially repetitive. For example, the yours should be sth like patient_CTCAlow_media (or anything that is unique!), not just 40. Just be sure that length(unique(md$sample_id)) == 1 holds and it should work.

HelenaLC commented 1 year ago

Hello and greetings. Is this issue still on? If no, please close it (or I will do so soon), thanks! If yes, any thoughts on my comment above?

HelenaLC commented 8 months ago

closing due to inactivity.