LieberInstitute / spatialLIBD

Code for the spatialLIBD R/Bioconductor package and shiny app
http://LieberInstitute.github.io/spatialLIBD/
78 stars 16 forks source link

[BUG] Error during extract and reformat enrichment results when using `registration_wrapper` #72

Open boyiguo1 opened 8 months ago

boyiguo1 commented 8 months ago

Describe the bug

Thanks for creating and maintaining the awesome spatialLIBD package. Recently, I was trying to conduct spatial registration by running registration_wrapper function on a SpatialExperiment object. I encountered the error message Error in combn(colnames(registration_model)[regis_cols], 2) : n < m (please find the reproducible example below), which I have trouble interpreting the error message and fix it.

The spatialLIBD is at v1.14.1 installed via Bioconductor version '3.18'

Provide a minimally reproducible example (reprex)

# Load packages -----------------------------------------------------------
library(here)
#> here() starts at /private/var/folders/ff/m8yvwsq51sjg43xscd3x1tmr0000gn/T/RtmpWA9JNL/reprex-774f49f557da-stony-moth
library(SpatialExperiment)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#>     table, tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
library(spatialLIBD)
library(SingleR)
# library(tidyverse)

# Load spatialDLPFC data ---------------------------------------------------

spe <- fetch_data(type = "spatialDLPFC_Visium")
#> 2024-01-02 13:33:34.255759 loading file /Users/bguo6/Library/Caches/org.R-project.R/R/BiocFileCache/5449288a26e9_spe_filtered_final_with_clusters_and_deconvolution_results.rds%3Fdl%3D1

set.seed(1)
scr_idx <- sample(unique(spe$sample_id), 15)
tar_idx <- setdiff(unique(spe$sample_id), scr_idx)

src_spe <- spe[,spe$sample_id %in% scr_idx]
tar_spe <- spe[,spe$sample_id %in% tar_idx]

# Spatial Registration ----------------------------------------------------

## Get enrichment statistics from source ---------------------------------
sce_modeling_results <- registration_wrapper(
  sce = src_spe,
  # sce = src_spe,
  var_registration = "BayesSpace_harmony_09",
  var_sample_id = "sample_id",
  gene_ensembl = "gene_id",
  gene_name = "gene_name"
)
#> 2024-01-02 13:34:59.526861 make pseudobulk object
#> 2024-01-02 13:35:07.619735 dropping 0 pseudo-bulked samples that are below 'min_ncells'.
#> 2024-01-02 13:35:07.650823 drop lowly expressed genes
#> 2024-01-02 13:35:07.781492 normalize expression
#> 2024-01-02 13:35:08.545616 create model matrix
#> 2024-01-02 13:35:08.696194 run duplicateCorrelation()
#> 2024-01-02 13:35:26.808509 The estimated correlation is: 0.960849320333061
#> 2024-01-02 13:35:26.811491 computing enrichment statistics
#> 2024-01-02 13:35:28.62727 extract and reformat enrichment results
#> Error in combn(colnames(registration_model)[regis_cols], 2): n < m

Created on 2024-01-02 with reprex v2.0.2

Expected behavior

I would expect that registration_wrapper would run successfully.

R Session Information

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.2 (2023-10-31) #> os macOS Sonoma 14.2.1 #> system aarch64, darwin20 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz America/New_York #> date 2024-01-02 #> pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0) #> AnnotationDbi 1.64.1 2023-11-02 [1] Bioconductor #> AnnotationHub 3.10.0 2023-10-26 [1] Bioconductor #> attempt 0.3.1 2020-05-03 [1] CRAN (R 4.3.0) #> beachmat 2.18.0 2023-10-24 [1] Bioconductor #> beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.0) #> benchmarkme 1.0.8 2022-06-12 [1] CRAN (R 4.3.0) #> benchmarkmeData 1.0.4 2020-04-23 [1] CRAN (R 4.3.0) #> Biobase * 2.62.0 2023-10-26 [1] Bioconductor #> BiocFileCache 2.10.1 2023-10-26 [1] Bioconductor #> BiocGenerics * 0.48.1 2023-11-02 [1] Bioconductor #> BiocIO 1.12.0 2023-10-26 [1] Bioconductor #> BiocManager 1.30.22 2023-08-08 [1] CRAN (R 4.3.0) #> BiocNeighbors 1.20.1 2023-12-18 [1] Bioconductor 3.18 (R 4.3.0) #> BiocParallel 1.36.0 2023-10-26 [1] Bioconductor #> BiocSingular 1.18.0 2023-10-24 [1] Bioconductor #> BiocVersion 3.18.1 2023-11-18 [1] Bioconductor 3.18 (R 4.3.2) #> Biostrings 2.70.1 2023-10-26 [1] Bioconductor #> bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0) #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.0) #> bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.0) #> blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.0) #> bslib 0.6.1 2023-11-28 [1] CRAN (R 4.3.1) #> cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0) #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.1) #> codetools 0.2-19 2023-02-01 [1] CRAN (R 4.3.2) #> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0) #> config 0.3.2 2023-08-30 [1] CRAN (R 4.3.0) #> cowplot 1.1.2 2023-12-15 [1] CRAN (R 4.3.1) #> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0) #> curl 5.2.0 2023-12-08 [1] CRAN (R 4.3.1) #> data.table 1.14.10 2023-12-08 [1] CRAN (R 4.3.1) #> DBI 1.2.0 2023-12-21 [1] CRAN (R 4.3.1) #> dbplyr 2.4.0 2023-10-26 [1] CRAN (R 4.3.1) #> DelayedArray 0.28.0 2023-10-24 [1] Bioconductor #> DelayedMatrixStats 1.24.0 2023-10-24 [1] Bioconductor #> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.0) #> doParallel 1.0.17 2022-02-07 [1] CRAN (R 4.3.0) #> dotCall64 1.1-1 2023-11-28 [1] CRAN (R 4.3.1) #> dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.3.1) #> DT 0.31 2023-12-09 [1] CRAN (R 4.3.1) #> edgeR 4.0.3 2023-12-09 [1] Bioconductor 3.18 (R 4.3.2) #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.1) #> ExperimentHub 2.10.0 2023-10-26 [1] Bioconductor #> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.1) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0) #> fields 15.2 2023-08-17 [1] CRAN (R 4.3.0) #> filelock 1.0.3 2023-12-11 [1] CRAN (R 4.3.1) #> foreach 1.5.2 2022-02-02 [1] CRAN (R 4.3.0) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.0) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0) #> GenomeInfoDb * 1.38.1 2023-11-11 [1] Bioconductor #> GenomeInfoDbData 1.2.11 2023-11-08 [1] Bioconductor #> GenomicAlignments 1.38.0 2023-10-26 [1] Bioconductor #> GenomicRanges * 1.54.1 2023-10-30 [1] Bioconductor #> ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.0) #> ggplot2 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) #> ggrepel 0.9.4 2023-10-13 [1] CRAN (R 4.3.1) #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0) #> golem 0.4.1 2023-06-05 [1] CRAN (R 4.3.0) #> gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.0) #> gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.0) #> here * 1.0.1 2020-12-13 [1] CRAN (R 4.3.0) #> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.1) #> htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1) #> httpuv 1.6.13 2023-12-06 [1] CRAN (R 4.3.1) #> httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.0) #> interactiveDisplayBase 1.40.0 2023-10-26 [1] Bioconductor #> IRanges * 2.36.0 2023-10-26 [1] Bioconductor #> irlba 2.3.5.1 2022-10-03 [1] CRAN (R 4.3.0) #> iterators 1.0.14 2022-02-05 [1] CRAN (R 4.3.0) #> jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.3.0) #> jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.1) #> KEGGREST 1.42.0 2023-10-26 [1] Bioconductor #> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) #> later 1.3.2 2023-12-06 [1] CRAN (R 4.3.1) #> lattice 0.21-9 2023-10-01 [1] CRAN (R 4.3.2) #> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.3.0) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.0) #> limma 3.58.1 2023-11-02 [1] Bioconductor #> locfit 1.5-9.8 2023-06-11 [1] CRAN (R 4.3.0) #> magick 2.8.2 2023-12-20 [1] CRAN (R 4.3.1) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0) #> maps 3.4.2 2023-12-15 [1] CRAN (R 4.3.1) #> Matrix 1.6-1.1 2023-09-18 [1] CRAN (R 4.3.2) #> MatrixGenerics * 1.14.0 2023-10-26 [1] Bioconductor #> matrixStats * 1.2.0 2023-12-11 [1] CRAN (R 4.3.1) #> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0) #> mime 0.12 2021-09-28 [1] CRAN (R 4.3.0) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0) #> paletteer 1.5.0 2022-10-19 [1] CRAN (R 4.3.0) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0) #> plotly 4.10.3 2023-10-21 [1] CRAN (R 4.3.1) #> png 0.1-8 2022-11-29 [1] CRAN (R 4.3.0) #> promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.0) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.0) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.0) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.0) #> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.0) #> R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.3.1) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0) #> rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.3.0) #> RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.0) #> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.0) #> RCurl 1.98-1.13 2023-11-02 [1] CRAN (R 4.3.1) #> rematch2 2.1.2 2020-05-01 [1] CRAN (R 4.3.0) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.0) #> restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.0) #> rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.0) #> rlang 1.1.2 2023-11-04 [1] CRAN (R 4.3.0) #> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) #> rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.0) #> Rsamtools 2.18.0 2023-10-26 [1] Bioconductor #> RSQLite 2.3.4 2023-12-08 [1] CRAN (R 4.3.1) #> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0) #> rsvd 1.0.5 2021-04-16 [1] CRAN (R 4.3.0) #> rtracklayer 1.62.0 2023-10-26 [1] Bioconductor #> S4Arrays 1.2.0 2023-10-26 [1] Bioconductor #> S4Vectors * 0.40.2 2023-11-25 [1] Bioconductor 3.18 (R 4.3.2) #> sass 0.4.8 2023-12-06 [1] CRAN (R 4.3.1) #> ScaledMatrix 1.10.0 2023-10-24 [1] Bioconductor #> scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.1) #> scater 1.30.1 2023-11-16 [1] Bioconductor #> scuttle 1.12.0 2023-10-24 [1] Bioconductor #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0) #> shiny 1.8.0 2023-11-17 [1] CRAN (R 4.3.1) #> shinyWidgets 0.8.0 2023-08-30 [1] CRAN (R 4.3.0) #> SingleCellExperiment * 1.24.0 2023-10-24 [1] Bioconductor #> SingleR * 2.4.0 2023-10-24 [1] Bioconductor #> spam 2.10-0 2023-10-23 [1] CRAN (R 4.3.1) #> SparseArray 1.2.2 2023-11-08 [1] Bioconductor #> sparseMatrixStats 1.14.0 2023-10-26 [1] Bioconductor #> SpatialExperiment * 1.12.0 2023-10-26 [1] Bioconductor #> spatialLIBD * 1.14.1 2023-11-30 [1] Bioconductor 3.18 (R 4.3.0) #> statmod 1.5.0 2023-01-06 [1] CRAN (R 4.3.0) #> stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.1) #> stringr 1.5.1 2023-11-14 [1] CRAN (R 4.3.1) #> styler 1.10.2 2023-08-29 [1] CRAN (R 4.3.0) #> SummarizedExperiment * 1.32.0 2023-10-24 [1] Bioconductor #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0) #> tidyr 1.3.0 2023-01-24 [1] CRAN (R 4.3.0) #> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0) #> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1) #> vipor 0.4.7 2023-12-18 [1] CRAN (R 4.3.1) #> viridis 0.6.4 2023-07-22 [1] CRAN (R 4.3.0) #> viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.0) #> withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) #> xfun 0.41 2023-11-01 [1] CRAN (R 4.3.1) #> XML 3.99-0.16 2023-11-29 [1] CRAN (R 4.3.1) #> xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0) #> XVector 0.42.0 2023-10-26 [1] Bioconductor #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.1) #> zlibbioc 1.48.0 2023-10-26 [1] Bioconductor #> #> [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library #> #> ────────────────────────────────────────────────────────────────────────────── ```
lahuuki commented 7 months ago

Hi Boyi, This error was the result of src_spe$BayesSpace_harmony_09 being an integer, the root of the error is here where the pairwise comparison is set up. However this probably also cause unexpected behaviour with registration_stats_anova and registration_stats_enrichment as well since the model is not categorical but continuous.

This can be solved by replacing the numeric categories with syntactically valid character instead like this: src_spe$BayesSpace_harmony_09 <- paste0("k09d",src_spe$BayesSpace_harmony_09)

Non-syntatic characters names also cause issues like: Error in limma::makeContrasts(contrasts = z, levels = registration_model) : The levels must by syntactically valid names in R, see help(make.names).

We should include a check that var_registration is categorical and syntactically valid.

Thanks for pointing out this Bug!

boyiguo1 commented 7 months ago

I think @lahuuki's explanation is perfect. I just want to add another corner case if other's not paying attention with categorical and syntactically valid.

The variable has to be a string or factor whose labels follows the syntactically valid.

The error will remain if someone uses the following line of code

> spe$BayesSpace_harmony_09 <- factor(spe$BayesSpace_harmony_09)
> levels(spe$BayesSpace_harmony_09)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9"

There's no error with the following code

> spe$BayesSpace_harmony_09 <- factor(
+   spe$BayesSpace_harmony_09,
+   labels = paste0("k09d", levels(spe$BayesSpace_harmony_09))
+ )
> levels(spe$BayesSpace_harmony_09 )
[1] "k09d1" "k09d2" "k09d3" "k09d4" "k09d5" "k09d6" "k09d7"
[8] "k09d8" "k09d9"
lcolladotor commented 5 months ago

@lahuuki can you use Boyi's two reproducible examples to build a unit tests for these functions?

Thanks!