HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

runDR following filterSCE... am I missing something? #323

Closed: jeffsteimle closed this issue 1 year ago

jeffsteimle commented 1 year ago

I checked out #266 and #268, but neither followed up sub-clustering with a second round of runDR. I have a SingleCellExperiment object, sce, that contains the full set of data. I can use runDR to generate TSNE, UMAP, or whatever, and plot the results with plotDR.

I then want to subset the data, re-cluster, and generate new UMAPs. Filtering and clustering run without issues, but runDR fails:

```r
# No issues filtering
Tcells <- filterSCE(sce, cluster_id %in% c("CD4", "CD8"), k = "merge")

# No issues clustering
Tcells <- cluster(Tcells, features = MarkersOfInterest, xdim = 10, ydim = 10, maxK = 24, seed = 42)

# Fails
Tcells <- runDR(Tcells, dr = "UMAP", cells = 5000, features = MarkersOfInterest)
#> Error in uwot(x = X, n_neighbors, n_components = n_components, : 'pca' must be a positive integer
```

Am I missing something? I tried removing the inherited UMAP from reducedDims, but that didn't seem to help. I'm stumped.

Thank you, Jeff

HelenaLC commented 1 year ago

If you say there are no issues up to that point and everything looks reasonable, I can't directly see a problem either. Could you perhaps post some relevant outputs, e.g., the SCE itself (Tcells) before and after runDR? Also, I'm assuming runDR also fails after filtering, before running cluster? Maybe something like table(Tcells$sample_id, cluster_ids(Tcells, "merge")) before and after filtering? I really can't tell what's going on, and the error is curious indeed. It's not very CATALYST-related, but maybe the filtering makes the PCA fail or something?
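Along the lines of the table() check suggested above, a minimal base-R sketch can summarise cells per sample and make missing sample IDs visible, since NA or empty samples are one thing worth ruling out after filtering. `count_cells_per_sample()` is a hypothetical helper, not a CATALYST function; you would call it on `sce$sample_id` and on `Tcells$sample_id` and compare.

```r
# Hypothetical helper (not part of CATALYST): summarise cells per sample,
# keeping NA as its own category so missing sample IDs are not silently dropped.
count_cells_per_sample <- function(sample_id) {
  counts <- table(sample_id, useNA = "ifany")
  list(
    counts        = counts,                       # cells per sample (plus NA, if any)
    n_missing     = sum(is.na(sample_id)),        # cells without a sample ID
    empty_samples = names(counts)[counts == 0]    # factor levels with zero cells
  )
}

# Usage sketch on the objects from this thread:
# count_cells_per_sample(sce$sample_id)
# count_cells_per_sample(Tcells$sample_id)
# table(Tcells$sample_id, cluster_ids(Tcells, "merge"))
```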

jeffsteimle commented 1 year ago

Hi @HelenaLC

Thank you for your quick response. I should have provided more details in my initial post.

I have an sce object that looks like this:

```
class: SingleCellExperiment
dim: 58 13497169
metadata(5): experiment_info chs_by_fcs cluster_codes SOM_codes delta_area
assays(2): counts exprs
rownames(58): Ar BCKG ... HLA_DR CD33
rowData names(4): channel_name marker_name marker_class used_for_clustering
colnames: NULL
colData names(26): sample_id condition ... CD8Pre_Leading_Edge cluster_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
```

I then created sce_filter with:

```r
sce_filter <- filterSCE(sce, cluster_id %in% c(1:2), k = "meta20")
```

which gives:

```
class: SingleCellExperiment
dim: 58 1181746
metadata(5): experiment_info chs_by_fcs cluster_codes SOM_codes delta_area
assays(2): counts exprs
rownames(58): Ar BCKG ... HLA_DR CD33
rowData names(4): channel_name marker_name marker_class used_for_clustering
colnames: NULL
colData names(26): sample_id condition ... CD8Pre_Leading_Edge cluster_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
```

```r
sce_filter <- cluster(sce_filter, features = c(rownames(sce)[12:26]), xdim = 10, ydim = 10, maxK = 12, seed = 42)

sce_filter <- runDR(sce_filter, dr = "UMAP", cells = 5000, features = c(rownames(sce)[12:26]))
#> Error in uwot(X = X, n_neighbors = n_neighbors, n_components = n_components, : 'pca' must be a positive integer
```

However, I have figured out the issue. I had added columns to sce@metadata$experiment_info so that batch correction and testing across time points would be easy when performing differential abundance testing with edgeR. When I run filterSCE(), it completely wipes sce_filter@metadata$experiment_info. I think this also causes some of the columns in sce_filter@colData to become NA.
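To confirm which colData columns ended up as NA after filtering, a small base-R sketch can count missing values per column. `na_columns()` is a hypothetical helper, not a CATALYST function; the idea is to call it on `as.data.frame(colData(sce_filter))` and compare against the same call on the unfiltered `sce`.

```r
# Hypothetical helper (not part of CATALYST): return the number of NAs
# for every column that contains at least one missing value.
na_columns <- function(df) {
  n_na <- colSums(is.na(df))   # is.na() on a data.frame gives a logical matrix
  n_na[n_na > 0]
}

# Usage sketch:
# na_columns(as.data.frame(colData(sce)))
# na_columns(as.data.frame(colData(sce_filter)))
```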

When I loaded an older saved object from before I added those columns to experiment_info, the commands above ran as expected. My workaround is to remove the additional columns from sce@metadata$experiment_info before calling filterSCE(), and everything now works as expected. I thought I was being clever storing batch information in that slot (it does work really well for plotting, batch correction, etc.), but I hadn't considered that other steps might require the slot to contain only specific columns.
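The workaround above can be made less destructive by detaching the extra columns before filtering and re-attaching them by sample_id afterwards. This is a minimal sketch under several assumptions: `split_ei()` and `restore_ei()` are hypothetical helpers, `"batch"` and `"timepoint"` are placeholder column names, and it assumes the filtered object's experiment_info still contains a `sample_id` column. Whether runDR and other downstream steps tolerate the re-attached columns has not been verified here, so keeping such annotations in colData may be the safer option.

```r
# Hypothetical helpers (not part of CATALYST).
# split_ei(): separate user-added columns from the rest of experiment_info.
split_ei <- function(ei, extra_cols) {
  stopifnot(all(extra_cols %in% colnames(ei)))
  list(
    core  = ei[, setdiff(colnames(ei), extra_cols), drop = FALSE],
    extra = ei[, c("sample_id", extra_cols), drop = FALSE]
  )
}

# restore_ei(): re-attach the saved columns, matching rows by sample_id.
# Samples missing from `extra` get NA in the restored columns.
restore_ei <- function(ei, extra) {
  m    <- match(as.character(ei$sample_id), as.character(extra$sample_id))
  cols <- setdiff(colnames(extra), "sample_id")
  ei[cols] <- extra[m, cols, drop = FALSE]
  ei
}

# Usage sketch (column names are placeholders):
# parts <- split_ei(metadata(sce)$experiment_info, c("batch", "timepoint"))
# metadata(sce)$experiment_info <- parts$core
# sce_filter <- filterSCE(sce, cluster_id %in% c(1:2), k = "meta20")
# metadata(sce_filter)$experiment_info <-
#   restore_ei(metadata(sce_filter)$experiment_info, parts$extra)
```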

Thank you, Jeff

sessionInfo()

```
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
 [1] batchelor_1.14.1 scran_1.26.1 miloR_1.6.0 edgeR_3.40.0
 [5] limma_3.54.0 readxl_1.4.1 reshape2_1.4.4 patchwork_1.1.2
 [9] openCyto_2.2.0 mvtnorm_1.1-3 ggcyto_1.26.0 ncdfFlow_2.44.0
[13] BH_1.78.0-0 flowWorkspace_4.10.1 dplyr_1.0.10 scater_1.26.1
[17] scuttle_1.8.0 diffcyt_1.18.0 ggplot2_3.4.0 flowCore_2.10.0
[21] cowplot_1.1.1 CATALYST_1.22.0 SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
[25] Biobase_2.58.0 GenomicRanges_1.50.1 GenomeInfoDb_1.34.3 IRanges_2.32.0
[29] S4Vectors_0.36.1 BiocGenerics_0.44.0 MatrixGenerics_1.10.0 matrixStats_0.62.0

loaded via a namespace (and not attached):
  [1] rtracklayer_1.58.0 R.methodsS3_1.8.2 coda_0.19-4
  [4] tidyr_1.2.1 irlba_2.3.5.1 multcomp_1.4-20
  [7] DelayedArray_0.24.0 R.utils_2.12.2 data.table_1.14.4
 [10] RCurl_1.98-1.9 doParallel_1.0.17 generics_0.1.3
 [13] ScaledMatrix_1.6.0 TH.data_1.1-1 assertthat_0.2.1
 [16] viridis_0.6.2 DEoptimR_1.0-11 fansi_1.0.3
 [19] restfulr_0.0.15 Rgraphviz_2.42.0 igraph_1.3.5
 [22] DBI_1.1.3 purrr_0.3.5 ks_1.13.5
 [25] ggnewscale_0.4.8 ggpubr_0.4.0 backports_1.4.1
 [28] cytolib_2.10.0 deldir_1.0-6 sparseMatrixStats_1.10.0
 [31] vctrs_0.5.0 abind_1.4-5 withr_2.5.0
 [34] ggforce_0.4.1 robustbase_0.95-0 bdsmatrix_1.3-6
 [37] GenomicAlignments_1.34.0 mclust_6.0.0 mnormt_2.1.1
 [40] cluster_2.1.4 crayon_1.5.2 drc_3.0-1
 [43] methylKit_1.24.0 labeling_0.4.2 pkgconfig_2.0.3
 [46] tweenr_2.0.2 nlme_3.1-160 vipor_0.4.5
 [49] rlang_1.0.6 lifecycle_1.0.3 sandwich_3.0-2
 [52] rsvd_1.0.5 cellranger_1.1.0 polyclip_1.10-4
 [55] graph_1.76.0 flowClust_3.36.0 Matrix_1.5-3
 [58] carData_3.0-5 boot_1.3-28 zoo_1.8-11
 [61] beeswarm_0.4.0 ggridges_0.5.4 GlobalOptions_0.1.2
 [64] png_0.1-7 viridisLite_0.4.1 rjson_0.2.21
 [67] bitops_1.0-7 R.oo_1.25.0 ConsensusClusterPlus_1.62.0
 [70] KernSmooth_2.23-20 Biostrings_2.66.0 DelayedMatrixStats_1.20.0
 [73] shape_1.4.6 stringr_1.4.1 qvalue_2.30.0
 [76] jpeg_0.1-9 rstatix_0.7.1 ggsignif_0.6.4
 [79] beachmat_2.14.0 scales_1.2.1 magrittr_2.0.3
 [82] plyr_1.8.8 hexbin_1.28.2 zlibbioc_1.44.0
 [85] compiler_4.2.2 hdrcde_3.4 dqrng_0.3.0
 [88] BiocIO_1.8.0 bbmle_1.0.25 RColorBrewer_1.1-3
 [91] plotrix_3.8-2 clue_0.3-62 lme4_1.1-31
 [94] rrcov_1.7-2 fastseg_1.44.0 Rsamtools_2.14.0
 [97] cli_3.4.1 XVector_0.38.0 FlowSOM_2.6.0
[100] MASS_7.3-58.1 tidyselect_1.2.0 stringi_1.7.8
[103] RProtoBufLib_2.10.0 emdbook_1.3.12 yaml_2.3.6
[106] BiocSingular_1.14.0 locfit_1.5-9.6 latticeExtra_0.6-30
[109] ggrepel_0.9.2 grid_4.2.2 tools_4.2.2
[112] parallel_4.2.2 circlize_0.4.15 rstudioapi_0.14
[115] bluster_1.8.0 foreach_1.5.2 metapod_1.6.0
[118] gridExtra_2.3 farver_2.1.1 Rtsne_0.16
[121] ggraph_2.1.0 digest_0.6.30 pracma_2.4.2
[124] Rcpp_1.0.9 car_3.1-1 broom_1.0.1
[127] RcppAnnoy_0.0.20 fda_6.0.5 IDPmisc_1.1.20
[130] ComplexHeatmap_2.14.0 flowStats_4.10.0 colorspace_2.0-3
[133] XML_3.99-0.12 rainbow_3.7 splines_4.2.2
[136] uwot_0.1.14 RBGL_1.74.0 statmod_1.4.37
[139] graphlayouts_0.8.3 nloptr_2.0.3 fds_1.8
[142] tidygraph_1.2.2 corpcor_1.6.10 R6_2.5.1
[145] pillar_1.8.1 nnls_1.4 glue_1.6.2
[148] minqa_1.2.5 BiocParallel_1.32.1 BiocNeighbors_1.16.0
[151] deSolve_1.34 codetools_0.2-18 pcaPP_2.0-3
[154] utf8_1.2.2 ResidualMatrix_1.8.0 lattice_0.20-45
[157] tibble_3.1.8 numDeriv_2016.8-1.1 flowViz_1.62.0
[160] ggbeeswarm_0.6.0 colorRamps_2.3.1 gtools_3.9.3
[163] interp_1.1-3 survival_3.4-0 munsell_0.5.0
[166] GetoptLong_1.0.5 GenomeInfoDbData_1.2.9 iterators_1.0.14
[169] gtable_0.3.1
```