SingleR-inc / SingleR

Clone of the Bioconductor repository for the SingleR package.
https://bioconductor.org/packages/devel/bioc/html/SingleR.html
GNU General Public License v3.0

Error about missing scran when run with single-cell reference #170

Open heathergeiger opened 3 years ago

heathergeiger commented 3 years ago

I am currently trying to run SingleR vs. the counts and labels available here:

http://geschwindlab.dgsom.ucla.edu/pages/codexviewer

Here is my code to get a log-normalized expression matrix and labels in the appropriate format for SingleR.

library(Seurat)

load("raw_counts_mat.rdata")

metadata <- read.csv("cell_metadata.csv",header=TRUE,row.names=1)
metadata <- metadata[,1:2]
no_metadata_cells <- setdiff(colnames(raw_counts_mat),rownames(metadata))
no_metadata_n <- length(no_metadata_cells)
dummy_metadata_for_no_metadata_cells <- data.frame(Cluster = rep("None",times=no_metadata_n),
    Subcluster = rep("None",times=no_metadata_n),
    row.names=no_metadata_cells)
metadata <- rbind(metadata,dummy_metadata_for_no_metadata_cells)
metadata <- metadata[colnames(raw_counts_mat),]

seurat.obj <- CreateSeuratObject(counts=raw_counts_mat,min.cells=3)
seurat.obj <- NormalizeData(seurat.obj)
seurat.obj$Cluster <- metadata$Cluster
seurat.obj <- subset(seurat.obj,Cluster != "None")

ref_norm_counts <- GetAssayData(seurat.obj,assay="RNA",slot="data")
ref_labels <- as.vector(seurat.obj$Cluster)
rm(seurat.obj)

I then ran SingleR like so, where "norm_counts" is the result of running GetAssayData with slot="data" on the Seurat object containing the test data.

predictions <- SingleR(test = norm_counts,
    ref = ref_norm_counts, labels = ref_labels,
    de.method = "wilcox")

But I am getting the following error: "Error in loadNamespace(name) : there is no package called ‘scran’".

Any idea what is going on here? My sessionInfo() result is below. SingleR worked fine with a bulk reference, so the issue appears to be specific to when I use a single-cell reference with the appropriate change to "de.method".

R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /nfs/sw/R/R-4.0.0/lib64/R/lib/libRblas.so
LAPACK: /nfs/sw/R/R-4.0.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] Seurat_3.2.1                SingleR_1.2.4              
 [3] SummarizedExperiment_1.18.2 DelayedArray_0.14.1        
 [5] matrixStats_0.57.0          Biobase_2.48.0             
 [7] GenomicRanges_1.40.0        GenomeInfoDb_1.24.2        
 [9] IRanges_2.22.2              S4Vectors_0.26.1           
[11] BiocGenerics_0.34.0         GeneBook_1.0               

loaded via a namespace (and not attached):
  [1] Rtsne_0.15                    colorspace_1.4-1             
  [3] deldir_0.1-28                 ellipsis_0.3.1               
  [5] ggridges_0.5.2                XVector_0.28.0               
  [7] BiocNeighbors_1.7.0           spatstat.data_1.4-3          
  [9] leiden_0.3.3                  listenv_0.8.0                
 [11] ggrepel_0.8.2                 bit64_4.0.5                  
 [13] interactiveDisplayBase_1.27.5 AnnotationDbi_1.51.3         
 [15] codetools_0.2-16              splines_4.0.0                
 [17] polyclip_1.10-0               jsonlite_1.7.1               
 [19] ica_1.0-2                     cluster_2.1.0                
 [21] dbplyr_1.4.4                  png_0.1-7                    
 [23] uwot_0.1.8                    shiny_1.5.0                  
 [25] sctransform_0.2.1             BiocManager_1.30.10          
 [27] compiler_4.0.0                httr_1.4.2                   
 [29] assertthat_0.2.1              Matrix_1.2-18                
 [31] fastmap_1.0.1                 lazyeval_0.2.2               
 [33] later_1.1.0.1                 BiocSingular_1.5.0           
 [35] htmltools_0.5.0               tools_4.0.0                  
 [37] rsvd_1.0.3                    igraph_1.2.5                 
 [39] gtable_0.3.0                  glue_1.4.2                   
 [41] GenomeInfoDbData_1.2.3        RANN_2.6.1                   
 [43] reshape2_1.4.4                dplyr_1.0.2                  
 [45] rappdirs_0.3.1                spatstat_1.64-1              
 [47] Rcpp_1.0.5                    vctrs_0.3.4                  
 [49] nlme_3.1-149                  ExperimentHub_1.14.2         
 [51] DelayedMatrixStats_1.11.1     lmtest_0.9-37                
 [53] stringr_1.4.0                 globals_0.12.5               
 [55] mime_0.9                      miniUI_0.1.1.1               
 [57] lifecycle_0.2.0               irlba_2.3.3                  
 [59] goftest_1.2-2                 future_1.18.0                
 [61] AnnotationHub_2.21.5          zlibbioc_1.34.0              
 [63] MASS_7.3-52                   zoo_1.8-8                    
 [65] scales_1.1.1                  spatstat.utils_1.17-0        
 [67] promises_1.1.1                RColorBrewer_1.1-2           
 [69] yaml_2.2.1                    curl_4.3                     
 [71] gridExtra_2.3                 memoise_1.1.0                
 [73] reticulate_1.16               pbapply_1.4-3                
 [75] ggplot2_3.3.2                 rpart_4.1-15                 
 [77] stringi_1.4.6                 RSQLite_2.2.0                
 [79] BiocVersion_3.11.1            BiocParallel_1.22.0          
 [81] rlang_0.4.7                   pkgconfig_2.0.3              
 [83] bitops_1.0-6                  lattice_0.20-41              
 [85] tensor_1.5                    ROCR_1.0-11                  
 [87] purrr_0.3.4                   patchwork_1.0.1              
 [89] htmlwidgets_1.5.1             cowplot_1.1.0                
 [91] bit_4.0.4                     tidyselect_1.1.0             
 [93] RcppAnnoy_0.0.16              plyr_1.8.6                   
 [95] magrittr_1.5                  R6_2.4.1                     
 [97] generics_0.0.2                DBI_1.1.0                    
 [99] mgcv_1.8-33                   pillar_1.4.6                 
[101] fitdistrplus_1.1-1            abind_1.4-5                  
[103] survival_3.2-3                RCurl_1.98-1.2               
[105] tibble_3.0.3                  future.apply_1.6.0           
[107] crayon_1.3.4                  KernSmooth_2.23-17           
[109] BiocFileCache_1.13.1          plotly_4.9.2.1               
[111] grid_4.0.0                    data.table_1.13.0            
[113] blob_1.2.1                    digest_0.6.25                
[115] xtable_1.8-4                  tidyr_1.1.2                  
[117] httpuv_1.5.4                  munsell_0.5.0                
[119] viridisLite_0.3.0            

LTLA commented 3 years ago

Nothing too complicated. When de.method="wilcox" or "t", the package uses scran's functions to perform the pairwise t-tests or Wilcoxon tests in an efficient manner; so to use that functionality, you'll need scran installed, as the error message suggests. It's not installed by default to keep SingleR's dependencies low, given that the default method on the default bulk references doesn't require scran.

So just BiocManager::install('scran') and you'll be good to go.
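
For anyone scripting this step, the install can be made conditional. The following is only a minimal sketch and not part of SingleR: `install_if_missing` is a hypothetical helper name, and it assumes the session can reach CRAN and Bioconductor.

```r
# Hypothetical helper (not part of SingleR or BiocManager): install a
# package via BiocManager only when its namespace cannot be loaded.
install_if_missing <- function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
        return(invisible(FALSE))  # already available, nothing to do
    }
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
        install.packages("BiocManager")
    }
    BiocManager::install(pkg)
    invisible(TRUE)
}

# Intended use before re-running SingleR(..., de.method = "wilcox"):
# install_if_missing("scran")
```

The call is left commented out so that nothing is installed just by sourcing the snippet.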

dtm2451 commented 3 years ago

Perhaps we should add an if (!require("scran")) { stop('scran package is required for de.method="wilcox" or "t"') }? I believe this require() conditional is what Bioconductor's developer guidelines recommend, but I'm curious about your thoughts, @LTLA, as there is the downside of then loading that entire package in all de.method="wilcox" or "t" cases!

LTLA commented 3 years ago

Hm. Traditionally I have always considered the error message out of :: to be satisfactory. Also it was a pain to have to write these protective clauses every time I used a Suggested package.

The best of both worlds would be to write a little getter function along the lines of:

checkForPackage <- function(pkg) {
    if (!requireNamespace(pkg, quietly=TRUE)) {
         # Perhaps have some smarter checks about whether something is
         # a Bioconductor package, but we could also just trust the developer here.
         stop(pkg, " is not installed, run BiocManager::install('", pkg, "')")
    }
}

which avoids the need to write all this crap every time we use :: for a Suggested package. This also avoids attaching packages on the search path, only loading their namespaces instead.

Would be nice if we can get it to live in some core package, then I could use it for all my packages.
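
As a rough illustration of how such a getter might sit at a call site, here is a sketch under a few assumptions: `callSuggested` is a hypothetical wrapper, and "notARealPackage123" is a placeholder name assumed not to be installed. The `stats` package stands in for a Suggested package so the example runs anywhere.

```r
# checkForPackage() as proposed in the comment above.
checkForPackage <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        stop(pkg, " is not installed, run BiocManager::install('", pkg, "')")
    }
}

# Hypothetical call site: guard once, then call the exported function.
# The function is looked up by name so the sketch works for any package;
# real package code would simply write pkg::fun(...) after the check.
callSuggested <- function(pkg, fun, ...) {
    checkForPackage(pkg)
    getExportedValue(pkg, fun)(...)
}

# An available package: the call goes through.
callSuggested("stats", "median", c(1, 5, 9))

# A missing package: capture the friendlier message instead of failing.
msg <- tryCatch(
    callSuggested("notARealPackage123", "f"),
    error = function(e) conditionMessage(e)
)
msg
```

Because only `requireNamespace()` is used, nothing is attached to the search path, which matches the point made above.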

dtm2451 commented 3 years ago

Such a base function sounds pretty good to me! It would allow me to remove the 5 similar functions from dittoSeq, each of which was manually made more specific.

Such a function could potentially also take in multiple pkgs for cases when 2 or more are actually needed for the specific action.

dtm2451 commented 3 years ago

Also, yes, I forgot about it but totally meant requireNamespace()!

mtmorgan commented 3 years ago

Probably a useful utility, although it sort of seems like one is patching an imperfect error message, with a better solution being a better error message?

One thing about the above is that it doesn't distinguish between types of errors (e.g., when a package fails to load because the installation has become corrupted somehow). One could be more clever, since the error is actually classed

> x = tryCatch(foo::bar(), error = identity)
> x
<packageNotFoundError in loadNamespace(x): there is no package called 'foo'>

So something like

tryCatch({
    foo::bar()
}, packageNotFoundError = function(e) {
    pkg <- e$package
    stop(
        "package '", pkg, "' not found; ",
        'install with `BiocManager::install("', pkg, '")`',
        call. = FALSE
    )
})

which also works for loadNamespace("foo") but not requireNamespace("foo").

Candidate locations are in BiocManager or maybe BiocGenerics; it's currently unusual for a package to Depend: or Import: BiocManager.
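
The difference between loadNamespace() and requireNamespace() noted above can be checked directly. This is a small sketch assuming R >= 3.6.0 (where the classed packageNotFoundError is available) and that the placeholder name "notARealPackage123" is not installed.

```r
pkg <- "notARealPackage123"  # placeholder, assumed not to be installed

# loadNamespace() signals the classed condition, so the handler runs.
from_load <- tryCatch(
    loadNamespace(pkg),
    packageNotFoundError = function(e) paste("handler saw:", e$package)
)

# requireNamespace() catches the error internally and returns FALSE,
# so the same handler never runs.
from_require <- tryCatch(
    requireNamespace(pkg, quietly = TRUE),
    packageNotFoundError = function(e) paste("handler saw:", e$package)
)

from_load
from_require
```

A utility built on the classed error would therefore need to wrap `::` or loadNamespace() calls, while one built on requireNamespace() has to test the returned logical instead.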

LTLA commented 3 years ago

BiocManager seems like the best place for this to live. The package has minimal dependencies and it must be installed by default before SingleR anyway, so I wouldn't consider it a real +1 to my dependency count.

kasperdanielhansen commented 3 years ago

But BiocManager is really for managing installations. Personally, I don't have BiocManager loaded when I do analysis, but I would want this fix to be available in that case.

Having a set of utility functions for dealing with Suggested packages seems worthwhile. I know this is suggesting something slightly different from what is being suggested here.

hpages commented 3 years ago

So basically the proposal is to replace the "there is no package called ‘scran’" error message with the more user-friendly "you don't have package 'scran'; install it with blah blah".

Personally I think that the specific error message suggested by @dtm2451 (scran package is required for de.method="wilcox" or "t") still has more value because it explains why the package is suddenly needed. It's always a little bit of an annoyance to discover that you are missing a package in the middle of an analysis, so it's nice to understand why this happens.

mtmorgan commented 3 years ago

Any thoughts @hpages on a home for this? I'm not sure, as Kasper notes, that BiocManager is the right place for it.

dtm2451 commented 3 years ago

I sometimes have tasks requiring multiple suggested packages, so would definitely vote for something which can check a set of packages. Perhaps the algorithm framework could be something like this:

suggested_pkgs_check <- function(pkgs, fxnality_message = "this functionality") {

    # TRUE for each package that cannot be loaded, FALSE if it is available.
    # This uses a `requireNamespace` check; Martin's `tryCatch` suggestion,
    # modified to allow multiple packages, would be the alternative.
    pkgs_missing <- vapply(
        pkgs, function(pkg) {
            !requireNamespace(pkg, quietly = TRUE)
        }, FUN.VALUE = logical(1)
    )

    if (any(pkgs_missing)) {
        stop(
            "Package(s) ", paste0(pkgs[pkgs_missing], collapse = ", "),
            " unavailable, but required for ", fxnality_message,
            ". Install with `BiocManager::install(c('",
            paste0(pkgs[pkgs_missing], collapse = "', '"),
            "'))`.",
            call. = FALSE
        )
    }
}

Then, 1) multiple packages could be checked (so users don't install a single package and start rerunning their pipeline, only to get an error at the same point due to a different package needed for the same step!), and 2) my specific message suggestion can be accommodated (yet even if a developer doesn't bother to add their own custom fxnality_message here, the idea that new packages are needed for the specific, currently requested functionality is still conveyed!).

I wonder if we need to distinguish between reasons that a function may be inaccessible? The path forward if a package has become corrupted is still to reinstall, no?

Re the home for this function: I don't have anything to add that hasn't already been said.
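
On the question of distinguishing reasons: if that distinction were ever wanted, one possible approach is to separate "not installed" from "installed but fails to load". The sketch below is an assumption-laden illustration, not something proposed in this thread; `package_status` is a hypothetical name, and "notARealPackage123" is a placeholder assumed not to be installed.

```r
# Hypothetical helper: classify why a package is (un)usable.
package_status <- function(pkg) {
    # system.file() returns "" when no installed package of that name is found.
    installed <- nzchar(system.file(package = pkg))
    if (!installed) {
        return("missing")
    }
    # Installed, but the namespace may still fail to load
    # (e.g. a corrupted installation or a broken dependency).
    if (!requireNamespace(pkg, quietly = TRUE)) {
        return("broken")
    }
    "ok"
}

package_status("stats")
package_status("notARealPackage123")
```

In both failure cases the advice to the user would still be to (re)install, so a single message may well be enough in practice.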

hpages commented 3 years ago

It's about installing missing packages (and the error message explicitly instructs the user to use BiocManager to do so), which makes BiocManager kind of a natural place for it.