HelenaLC / CATALYST

Cytometry dATa anALYsis Tools
66 stars 31 forks source link

Multiple Group Experiments #257

Closed ramziabb closed 2 years ago

ramziabb commented 2 years ago

Hello Helena - for reference this is similar to Issue #88.

I was using the workflow for a treatment group and control group. But a deeper dive into the clinical data shows many patients were on drug holds at various time points. So I need a third group for analysis: 1. Treatment, 2. Treatment on Hold, 3. Control.

Everything works the same in the workflow until we get to the differential analysis. I tried to tinker with this but I haven't become proficient with these objects yet and they are so heavily layered it is difficult to penetrate.

I understand that a three way comparison is not really the best approach for this workflow. I am OK with the solution of creating two separate sce objects, once for Treatment vs. Control and one for Treatment vs. Treatment on Hold (or whatever I decide to do). So I copied in your code fragment from #88 and the result is below. I tinkered a bit to get at the errors but no luck!

Thanks, Ramzi

sub <- sce[, sce$Group == "A"] ei <- metadata(sub) ei <- ei[ei$Group == "A", ] Error in ei[ei$Group == "A", ] : incorrect number of dimensions

drop no longer existent factor levels

to avoid potential complications

ei$Group <- droplevels(ei$Group) Error in UseMethod("droplevels") : no applicable method for 'droplevels' applied to an object of class "NULL"

specify design & contrast matrices

design <- createDesignMatrix(ei, cols_design = "Group") Error in createDesignMatrix(ei, cols_design = "Group") : any(class(experiment_info) %in% c("data.frame", "tbl_df", "tbl")) || .... is not TRUE

ramziabb commented 2 years ago

That looks terrible so here is a screenshot. The group I was breaking out is "Itacitinib" Thanks.

image

One other thing, obviously I would want to create the figures with the two groups we are comparing. So maybe the easiest solution would just be to create sub objects containing only the two groups I want to compare, and then running the code as it is. Happy to do this. Just need to know how to break the sce object into two sub objects, one containing groups A and B, and one containing groups A and C. And then how to change the ei object and whatever else needs to be changed to reflect this.

Thanks again! This is the figure with all three groups, but containing p-values for comparing groups Itacitinib and Hold. I did this by making the contrast matrix c(0, 1, 0). But again, the figure wouldn't be usable with this approach as it has all three groups in there and is misleading.

image

ramziabb commented 2 years ago

My sessionInfo()

R version 4.1.1 (2021-08-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] readxl_1.3.1 HDCytoData_1.12.0 ExperimentHub_2.0.0 AnnotationHub_3.0.1 BiocFileCache_2.0.0
[6] dbplyr_2.1.1 CATALYST_1.16.2 cowplot_1.1.1 ggplot2_3.3.5 diffcyt_1.12.0
[11] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0 Biobase_2.52.0 GenomicRanges_1.44.0 GenomeInfoDb_1.28.4
[16] IRanges_2.26.0 S4Vectors_0.30.0 BiocGenerics_0.38.0 MatrixGenerics_1.4.3 matrixStats_0.61.0
[21] flowCore_2.4.0

loaded via a namespace (and not attached): [1] rappdirs_0.3.3 scattermore_0.7 flowWorkspace_4.4.0 tidyr_1.1.4 bit64_4.0.5
[6] irlba_2.3.3 multcomp_1.4-17 DelayedArray_0.18.0 data.table_1.14.2 KEGGREST_1.32.0
[11] RCurl_1.98-1.5 doParallel_1.0.16 generics_0.1.1 ScaledMatrix_1.0.0 TH.data_1.1-0
[16] RSQLite_2.2.8 ggpointdensity_0.1.0 bit_4.0.4 xml2_1.3.2 httpuv_1.6.3
[21] assertthat_0.2.1 viridis_0.6.2 hms_1.1.1 promises_1.2.0.1 fansi_0.5.0
[26] Rgraphviz_2.36.0 igraph_1.2.7 DBI_1.1.1 purrr_0.3.4 ellipsis_0.3.2
[31] RSpectra_0.16-0 dplyr_1.0.7 ggcyto_1.20.0 ggnewscale_0.4.5 ggpubr_0.4.0
[36] backports_1.3.0 cytolib_2.4.0 RcppParallel_5.1.4 sparseMatrixStats_1.4.2 vctrs_0.3.8
[41] Cairo_1.5-12.2 abind_1.4-5 cachem_1.0.6 withr_2.4.2 ggforce_0.3.3
[46] aws.signature_0.6.0 cluster_2.1.2 crayon_1.4.2 drc_3.0-1 edgeR_3.34.1
[51] pkgconfig_2.0.3 labeling_0.4.2 tweenr_1.0.2 nlme_3.1-153 vipor_0.4.5
[56] rlang_0.4.12 lifecycle_1.0.1 sandwich_3.0-1 filelock_1.0.2 rsvd_1.0.5
[61] cellranger_1.1.0 polyclip_1.10-0 graph_1.70.0 Matrix_1.3-4 carData_3.0-4
[66] boot_1.3-28 zoo_1.8-9 base64enc_0.1-3 beeswarm_0.4.0 ggridges_0.5.3
[71] GlobalOptions_0.1.2 pheatmap_1.0.12 png_0.1-7 viridisLite_0.4.0 rjson_0.2.20
[76] bitops_1.0-7 ConsensusClusterPlus_1.56.0 Biostrings_2.60.2 blob_1.2.2 DelayedMatrixStats_1.14.3
[81] shape_1.4.6 stringr_1.4.0 jpeg_0.1-9 rstatix_0.7.0 ggsignif_0.6.3
[86] aws.s3_0.3.21 beachmat_2.8.1 scales_1.1.1 memoise_2.0.0 magrittr_2.0.1
[91] plyr_1.8.6 hexbin_1.28.2 zlibbioc_1.38.0 compiler_4.1.1 RColorBrewer_1.1-2
[96] plotrix_3.8-2 clue_0.3-60 lme4_1.1-27.1 cli_3.1.0 XVector_0.32.0
[101] ncdfFlow_2.38.0 FlowSOM_2.0.0 MASS_7.3-54 tidyselect_1.1.1 stringi_1.7.5
[106] forcats_0.5.1 RProtoBufLib_2.4.0 yaml_2.2.1 BiocSingular_1.8.1 locfit_1.5-9.4
[111] latticeExtra_0.6-29 ggrepel_0.9.1 grid_4.1.1 tools_4.1.1 rio_0.5.27
[116] CytoML_2.4.0 circlize_0.4.13 rstudioapi_0.13 foreach_1.5.1 foreign_0.8-81
[121] gridExtra_2.3 farver_2.1.0 Rtsne_0.15 digest_0.6.28 BiocManager_1.30.16
[126] shiny_1.7.1 Rcpp_1.0.7 car_3.0-11 broom_0.7.10 scuttle_1.2.1
[131] BiocVersion_3.13.1 later_1.3.0 RcppAnnoy_0.0.19 httr_1.4.2 AnnotationDbi_1.54.1
[136] ComplexHeatmap_2.8.0 colorspace_2.0-2 XML_3.99-0.8 splines_4.1.1 uwot_0.1.10
[141] RBGL_1.68.0 scater_1.20.1 xtable_1.8-4 jsonlite_1.7.2 nloptr_1.2.2.2
[146] R6_2.5.1 pillar_1.6.4 htmltools_0.5.2 mime_0.12 nnls_1.4
[151] glue_1.4.2 fastmap_1.1.0 minqa_1.2.4 BiocParallel_1.26.2 BiocNeighbors_1.10.0
[156] interactiveDisplayBase_1.30.0 codetools_0.2-18 mvtnorm_1.1-3 utf8_1.2.2 lattice_0.20-45
[161] tibble_3.1.5 curl_4.3.2 ggbeeswarm_0.6.0 colorRamps_2.3 gtools_3.9.2
[166] zip_2.2.0 openxlsx_4.2.4 survival_3.2-13 limma_3.48.3 munsell_0.5.0
[171] GetoptLong_1.0.5 GenomeInfoDbData_1.2.6 iterators_1.0.13 haven_2.4.3 reshape2_1.4.4
[176] gtable_0.3.0

ramziabb commented 2 years ago

I worked on this some more. Maybe closer, maybe not. Hoping for someone to help debug. Once again, I have an sce with 3 conditions. I want to split it into sub sce object to compare the different groups 1v2 and 1v3 separately and also plot heatmaps. Group names are: Itacitinib, Control, Hold. And here is the code:

`

subset groups Itacitinib and Control

sub <- sce[, sce$condition == "Itacitinib" | sce$condition == "Control"]

ei <- metadata(sub)$experiment_info ei <- ei[ei$condition == "Itacitinib" | ei$condition == "Control", ]

contrast <- createContrast(c(0, 1))

(da_formula1 <- createFormula(ei, cols_fixed = "condition", cols_random = "patient_id"))

da_res1 <- diffcyt(sub, formula = da_formula1, contrast = contrast, analysis_type = "DA", method_DA = "diffcyt-DA-GLMM", clustering_to_use = "merging1", verbose = FALSE)

names(da_res1) [1] "res" "d_counts" rowData(da_res1$res) DataFrame with 17 rows and 3 columns cluster_id p_val p_adj

Basophils Basophils NA NA Nonclass Monos Nonclass Monos NA NA Class Monos Class Monos NA NA CD4 CM T Cells CD4 CM T Cells NA NA CD56hiCD16- NK Cells CD56hiCD16- NK Cells NA NA ... ... ... ... Plasmacytoid DCs Plasmacytoid DCs NA NA mDCs mDCs NA NA B Cells B Cells NA NA IgD- B Cells IgD- B Cells NA NA Other Other NA NA

`

As you can see, I am getting all NAs, se definitely a problem here. If I skip the part where I cut down the ei object, it works (and I think the comparisons are valid, hard to be sure) but then the figure includes the third group which was removed.

`

subset groups Itacitinib and Control

sub <- sce[, sce$condition == "Itacitinib" | sce$condition == "Control"]

ei <- metadata(sub)$experiment_info

contrast <- createContrast(c(0, 1))

(da_formula1 <- createFormula(ei, cols_fixed = "condition", cols_random = "patient_id"))

da_res1 <- diffcyt(sub, formula = da_formula1, contrast = contrast, analysis_type = "DA", method_DA = "diffcyt-DA-GLMM", clustering_to_use = "merging1", verbose = FALSE)

names(da_res1) [1] "res" "d_counts"

rowData(da_res1$res) DataFrame with 17 rows and 3 columns cluster_id p_val p_adj

Basophils Basophils 0.66263953 0.9123684 Nonclass Monos Nonclass Monos 0.00751449 0.0319366 Class Monos Class Monos 0.63698651 0.9123684 CD4 CM T Cells CD4 CM T Cells 0.74656609 0.9123684 CD56hiCD16- NK Cells CD56hiCD16- NK Cells 0.60506387 0.9123684 ... ... ... ... Plasmacytoid DCs Plasmacytoid DCs 0.00277717 0.0236059 mDCs mDCs 0.00448343 0.0254061 B Cells B Cells 0.80503090 0.9123684 IgD- B Cells IgD- B Cells 0.76212833 0.9123684 Other Other 0.99672041 0.9967204 ` ![image](https://user-images.githubusercontent.com/93796927/154712157-8b83c5c1-67f2-4e47-bc02-4a6382343e03.png)
ramziabb commented 2 years ago

This is what the ei object looks like before I subset it:

ei sample_id condition patient_id n_cells 1 Control 1 Day 100 Set 1 Control Ctr 1 89859 2 Control 2 Day 100 Set 1 Control Ctr 2 117165 3 Patient 002 Day 100 Set 1 Itacitinib 002 37070 4 Patient 003 Day 100 Set 1 Control 003 63517 5 Patient 004 Day 100 Set 1 Hold 004 81002 6 Patient 005 Day 100 Set 1 Itacitinib 005 102485 7 Patient 008 Day 100 Set 1 Hold 008 48524 8 Patient 009 Day 100 Set 1 Hold 009 54212 9 Patient 010 Day 100 Set 1 Itacitinib 010 133828 10 Patient 011 Day 100 Set 1 Itacitinib 011 124524 11 Patient 012 Day 100 Set 1 Itacitinib 012 123579 12 Patient 015 Day 100 Set 1 Itacitinib 015 84454 13 Patient 016 Day 100 Set 1 Hold 016 63997 14 Patient 017 Day 100 Set 1 Hold 017 90596 15 Patient 018 Day 100 Set 1 Hold 018 84145 16 Patient 019 Day 100 Set 1 Hold 019 59999 17 Patient 022 Day 100 Set 1 Itacitinib 022 72832 18 Patient 024 Day 100 Set 1 Itacitinib 024 69063 19 Patient 026 Day 100 Set 1 Itacitinib 026 183995 20 Patient 027 Day 100 Set 1 Itacitinib 027 100721

And then after I subset it:

ei <- ei[ei$condition == "Itacitinib" | ei$condition == "Control", ] ei sample_id condition patient_id n_cells 1 Control 1 Day 100 Set 1 Control Ctr 1 89859 2 Control 2 Day 100 Set 1 Control Ctr 2 117165 3 Patient 002 Day 100 Set 1 Itacitinib 002 37070 4 Patient 003 Day 100 Set 1 Control 003 63517 6 Patient 005 Day 100 Set 1 Itacitinib 005 102485 9 Patient 010 Day 100 Set 1 Itacitinib 010 133828 10 Patient 011 Day 100 Set 1 Itacitinib 011 124524 11 Patient 012 Day 100 Set 1 Itacitinib 012 123579 12 Patient 015 Day 100 Set 1 Itacitinib 015 84454 17 Patient 022 Day 100 Set 1 Itacitinib 022 72832 18 Patient 024 Day 100 Set 1 Itacitinib 024 69063 19 Patient 026 Day 100 Set 1 Itacitinib 026 183995 20 Patient 027 Day 100 Set 1 Itacitinib 027 100721

See those numbers in the first column? Not sure what those are or where they come from. But is the problem that they are no longer sequential after the subsetting? There are skipped where we removed rows?

HelenaLC commented 2 years ago

Could you try assuring that there are no missing levels in the sample_ids after filtering? I.e., add something like ei$sample_id <- droplevels(ei$sample_id)? I believe the non-sequential row names don't matter, but superfluous levels might. Also: There's the filerSCE() function that assures filtering won't cause any issues. Specifically, when filtering for clusters, the ei, cluster_codes and colData might be affected. So I provided filterSCE() to spare the user from having to deal with this. In you case, you could try sub <- filterSCE(sce, condition %in% c("Control", "Itacitinib") (syntax is dplyr-style).

ramziabb commented 2 years ago

OK That seems to work. I don't fully understand the code but I'm OK with that.

Just to wrap a bow on this - if you have an sce with more than 2 groups and you want to compare any 2 of the groups, the following code works:

Use this contrast matrix to compare A vs. B

sub <- filterSCE(sce, condition %in% c("A", "B"))

ei <- metadata(sce)$experiment_info ei <- ei[ei$condition == "A" | ei$condition == "B", ]

contrast <- createContrast(c(0, 1))

Thanks again Helena.

HelenaLC commented 2 years ago

Yes, then again, you don't need to create a subset of the conditions of interest. Simply passing the contrast you'd like to test is sufficient, as the differential testing part would ignore any samples from conditions not included in the model.

ramziabb commented 2 years ago

You need to subset it to make your figures just have the two groups you are comparing, if that is something the user needs.

Also, while the results are similar between: contrast (0,1,0) versus doing the subset and then contrast(0,1) the p-values aren't exactly the same. I wasn't sure why that would be and did not drive down into it very much. But it does seem to make a difference in the results. Not sure if using negative one in the contrast matrix may do it differently.