HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

filterSCE() in loop bug? #305

Closed 83years closed 1 year ago

83years commented 1 year ago

Hi Helena,

I think I have spotted a bug in the filterSCE() code.

i<-1
sce_PID <- filterSCE(sce_anno, external_id == pid[i])

generates this error:

Error in vapply(value, vdimfun, 0L) : values must be length 1,
 but FUN(X[[1]]) result is length 0

However, sce_PID <- filterSCE(sce_anno, external_id == pid[1]) works.

So does sce_PID <- filterSCE(sce_anno, external_id == pid["Healthy Donor"]).

Having filterSCE() in a loop would be very handy for some of the more detailed/customised figures I am being asked to produce.

sessionInfo()


R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_3.3.6 cytofWorkflow_1.18.0 cowplot_1.1.1 uwot_0.1.11 Matrix_1.4-0
[6] HDCytoData_1.14.0 flowCore_2.6.0 ExperimentHub_2.2.1 AnnotationHub_3.2.2 BiocFileCache_2.2.1
[11] dbplyr_2.1.1 diffcyt_1.14.0 CATALYST_1.18.1 SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
[16] Biobase_2.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0 S4Vectors_0.32.4
[21] BiocGenerics_0.40.0 MatrixGenerics_1.6.0 matrixStats_0.62.0 readxl_1.4.0 knitr_1.39
[26] BiocStyle_2.22.0

HelenaLC commented 1 year ago

Hey there, this is a tricky one, indeed! It's not so much a bug as a consequence of how filterSCE is implemented. Basically, there's dplyr-style filtering on the colData under the hood. This means anything passed to the dots ... is captured as a symbolic expression (or whatever you might call it) rather than evaluated right away, hence the unexpected behavior here.

One way to work around this is the !! operator, which essentially declares that a variable should be replaced by its value (rather than kept as a symbol). Here's a working example where I am subsetting one cluster at a time:

> library(CATALYST)
> data(PBMC_fs, PBMC_panel, PBMC_md)
> sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
> sce <- cluster(sce, verbose = FALSE)
> 
> ids <- levels(cluster_ids(sce, k <- "meta5"))
> sub <- lapply(seq_along(ids), \(i)
+   filterSCE(sce, k = k, cluster_id == ids[!!i]))
> sapply(sub, \(sce) table(cluster_ids(sce, k)))
   1    2    3    4    5 
2073  980 1784  286  305
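
To make the effect of !! more concrete, here is a minimal sketch using rlang directly, outside of CATALYST. It assumes the filtering expression is captured tidy-eval style, as described above. The names ids, cells, and isolated are hypothetical and only serve the illustration, and the isolated environment is an assumed stand-in for a scope in which the loop variable is not visible; it is not a trace of what filterSCE actually does internally.

```r
library(rlang)

ids <- c("1", "2", "3")  # hypothetical cluster labels
i <- 2L                  # loop index

# Without !!, the captured expression still contains the symbol `i`;
# whoever evaluates it later must be able to find `i`.
e_sym <- expr(cluster_id == ids[i])

# With !!, the current value of `i` is inlined at capture time,
# so the expression no longer depends on `i` being in scope.
e_val <- expr(cluster_id == ids[!!i])

# Hypothetical per-cell metadata, standing in for colData.
cells <- data.frame(cluster_id = c("1", "2", "2", "3"))

# Assumed stand-in for a scope that knows `ids` but not `i`.
isolated <- new.env(parent = baseenv())
assign("ids", ids, envir = isolated)

# The unquoted version evaluates fine in that scope ...
keep <- eval_tidy(e_val, data = cells, env = isolated)

# ... while the symbolic version fails because `i` cannot be found.
res_sym <- tryCatch(
  eval_tidy(e_sym, data = cells, env = isolated),
  error = function(e) e
)

print(e_sym)
print(e_val)
print(keep)
```

Printing the two expressions side by side shows the difference: one still refers to i, the other carries the value itself.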