drisso / SingleCellExperiment

Clone of the Bioconductor repository for the SingleCellExperiment package, see https://bioconductor.org/packages/devel/bioc/html/SingleCellExperiment.html for the official development version.
66 stars 18 forks source link

Internal error in constructor when sample/feature name is NA #66

Open TuomasBorman opened 2 years ago

TuomasBorman commented 2 years ago

Hi,

when there is NA in sample or feature names, the constructor throws an error which is hard to interpret.

Error in if (!ok) { : missing value where TRUE/FALSE needed

library(SingleCellExperiment)

assay_data <- rbind(rep(0, 4), matrix(1:20, nrow = 5))
colnames(assay_data) <- paste0("sample", 1:4)
rownames(assay_data) <- paste0("entity", seq_len(6))

row_data <- data.frame(Kingdom = "A",
                       Phylum = rep(c("B1", "B2"), c(2, 4)),
                       Class = rep(c("C1", "C2", "C3"), each = 2),
                       OTU = paste0("D", 1:6),
                       row.names = rownames(assay_data),
                       stringsAsFactors = FALSE)

col_data <- data.frame(gg = c(1, 2, 3, 3),
                       group = rep(LETTERS[1:2], each = 2),
                       row.names = colnames(assay_data),
                       stringsAsFactors = FALSE)

# Works!
sce <- SingleCellExperiment(assays = SimpleList(counts = assay_data), 
                            rowData = row_data,
                            colData = col_data)

# Manipulate sample names in assay
assay_data_mod <- assay_data
colnames(assay_data_mod)[1] <- NA

# Intenal error (This happens if one ore more sample or feature names are NA)
sce <- SingleCellExperiment(assays = SimpleList(counts = assay_data_mod), 
                            rowData = row_data,
                            colData = col_data)
Session info ``` R version 4.1.2 (2021-11-01) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.3 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=fi_FI.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=fi_FI.UTF-8 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=fi_FI.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] mia_1.3.27 dplyr_1.0.8 testthat_3.1.1 MultiAssayExperiment_1.20.0 [5] TreeSummarizedExperiment_2.2.0 Biostrings_2.62.0 XVector_0.34.0 SingleCellExperiment_1.16.0 [9] SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 [13] IRanges_2.28.0 S4Vectors_0.32.4 BiocGenerics_0.40.0 MatrixGenerics_1.6.0 [17] matrixStats_0.61.0 loaded via a namespace (and not attached): [1] ggbeeswarm_0.6.0 colorspace_2.0-3 ellipsis_0.3.2 rprojroot_2.0.2 [5] scuttle_1.4.0 BiocNeighbors_1.12.0 fs_1.5.2 rstudioapi_0.13 [9] remotes_2.4.2 ggrepel_0.9.1 bit64_4.0.5 fansi_1.0.3 [13] decontam_1.14.0 codetools_0.2-18 splines_4.1.2 sparseMatrixStats_1.6.0 [17] cachem_1.0.6 scater_1.22.0 ade4_1.7-18 pkgload_1.2.4 [21] jsonlite_1.8.0 phyloseq_1.38.0 cluster_2.1.2 compiler_4.1.2 [25] assertthat_0.2.1 Matrix_1.4-0 fastmap_1.1.0 lazyeval_0.2.2 [29] cli_3.2.0 BiocSingular_1.10.0 prettyunits_1.1.1 tools_4.1.2 [33] igraph_1.2.11 rsvd_1.0.5 gtable_0.3.0 glue_1.6.2 [37] GenomeInfoDbData_1.2.7 reshape2_1.4.4 Rcpp_1.0.8.3 rhdf5filters_1.6.0 [41] vctrs_0.3.8 multtest_2.50.0 ape_5.6-2 nlme_3.1-157 [45] DECIPHER_2.22.0 iterators_1.0.14 DelayedMatrixStats_1.16.0 stringr_1.4.0 [49] ps_1.6.0 beachmat_2.10.0 lifecycle_1.0.1 irlba_2.3.5 [53] devtools_2.4.3 zlibbioc_1.40.0 MASS_7.3-56 scales_1.1.1 [57] biomformat_1.22.0 parallel_4.1.2 rhdf5_2.38.1 memoise_2.0.1 [61] gridExtra_2.3 ggplot2_3.3.5 yulab.utils_0.0.4 stringi_1.7.6 [65] RSQLite_2.2.11 desc_1.4.1 foreach_1.5.2 ScaledMatrix_1.2.0 [69] tidytree_0.3.7 permute_0.9-7 pkgbuild_1.3.1 BiocParallel_1.28.3 [73] rlang_1.0.2 pkgconfig_2.0.3 bitops_1.0-7 lattice_0.20-45 [77] Rhdf5lib_1.16.0 purrr_0.3.4 treeio_1.18.1 bit_4.0.4 [81] tidyselect_1.1.2 processx_3.5.3 plyr_1.8.7 magrittr_2.0.2 [85] R6_2.5.1 generics_0.1.2 DelayedArray_0.20.0 DBI_1.1.2 [89] pillar_1.7.0 withr_2.5.0 mgcv_1.8-38 survival_3.2-13 [93] RCurl_1.98-1.6 tibble_3.1.6 crayon_1.5.1 utf8_1.2.2 [97] viridis_0.6.2 usethis_2.1.5 grid_4.1.2 data.table_1.14.2 [101] blob_1.2.2 callr_3.7.0 vegan_2.6-2 tidyr_1.2.0 [105] munsell_0.5.0 DirichletMultinomial_1.36.0 beeswarm_0.4.0 viridisLite_0.4.0 [109] vipor_0.4.5 sessioninfo_1.2.2 ```
LTLA commented 2 years ago

This error occurs at a lower level, inside the SummarizedExperiment constructor that we delegate to:

se <- SummarizedExperiment(assays = SimpleList(counts = assay_data_mod),
                           rowData = row_data,
                           colData = col_data)
## Error in if (!ok) { : missing value where TRUE/FALSE needed

You might say that this is a bug, but IMO the real bug is having NAs in your dimnames in the first place. I find it odd that you'd have names for some samples but not for others. The only situation I can imagine is if you have one dataset that contains names, and another dataset that doesn't, and you want to combine them. Let's see how R handles this:

a <- matrix(runif(10), 5, 2)
colnames(a) <- LETTERS[1:2]
b <- matrix(runif(15), 5, 3)
cc <- cbind(a, b)
colnames(cc)
## [1] "A" "B" ""  ""  ""

So, R itself uses empty strings rather than NAs if it doesn't know the name. Indeed, R doesn't like NA names:

x <- data.frame(X=1:5)
rownames(x) <- c(NA,2,3,4,5)
## Error in `.rowNamesDF<-`(x, value = value) :
##   missing values in 'row.names' are not allowed

Best not to fight it, I think.

TuomasBorman commented 2 years ago

Yep, that's true

I got this error by making a mistake when I combined assays and constructed a new SCE (assay looked good, but some colnames were missing). In real life scenario it should be checked that combining is done correctly. This is something that user should know and check

I little bit wondered if this is good to report, but better safe than sorry, I guess...

TuomasBorman commented 2 years ago

Thanks!