MarioniLab / DropletUtils

Clone of the Bioconductor repository for the DropletUtils package.
https://bioconductor.org/packages/devel/bioc/html/DropletUtils.html

hashedDrops() throwing "Extraneous illegal arguments" error #82

Closed: jdrnevich closed this issue 2 years ago

jdrnevich commented 2 years ago

I am trying to use hashedDrops() on a 10X CellPlex data set, following the steps in the OSCA book here: http://bioconductor.org/books/3.14/OSCA.advanced/doublet-detection.html#doublet-detection-in-multiplexed-experiments. The example data set runs fine. However, when I run it on my CellPlex CMO counts, I get an "Extraneous illegal arguments" error. Here is a reproducible example using a public 10X data set:

# Get 10X data set
raw.path <- "Downloads/SC3_v3_NextGem_DI_CellPlex_CRISPR_A549_30K_Multiplex_count_raw_feature_bc_matrix.tar.gz"
dir.create(dirname(raw.path), showWarnings = FALSE)
options(timeout = 3600)  # large archive: allow more than R's default 60-second timeout
download.file("https://cf.10xgenomics.com/samples/cell-exp/6.0.0/SC3_v3_NextGem_DI_CellPlex_CRISPR_A549_30K_Multiplex/SC3_v3_NextGem_DI_CellPlex_CRISPR_A549_30K_Multiplex_count_raw_feature_bc_matrix.tar.gz",
              destfile = raw.path, mode = "wb")  # binary mode matters on Windows
out.path <- file.path(tempdir(), "10X")
untar(raw.path, exdir=out.path)

library(DropletUtils)
fname <- file.path(out.path, "raw_feature_bc_matrix/")
sce.pbmc <- read10xCounts(fname, col.names=TRUE)
sce.pbmc
# class: SingleCellExperiment 
# dim: 36706 4296940 
# metadata(1): Samples
# assays(1): counts
# rownames(36706): ENSG00000243485 ENSG00000237613 ... CMO311 CMO312
# rowData names(3): ID Symbol Type
# colnames(4296940): AAACCCAAGAAACACT-1 AAACCCAAGAAACCAT-1 ... TTTGTTGTCTTTGCTG-1 TTTGTTGTCTTTGGAG-1
# colData names(2): Sample Barcode
# reducedDimNames(0):
#   mainExpName: NULL
# altExpNames(0):

# Keep only the CMO counts
sce.cmos <- sce.pbmc[grep("CMO", rownames(sce.pbmc)),]
sce.cmos
# class: SingleCellExperiment 
# dim: 12 4296940 
# metadata(1): Samples
# assays(1): counts
# rownames(12): CMO301 CMO302 ... CMO311 CMO312
# rowData names(3): ID Symbol Type
# colnames(4296940): AAACCCAAGAAACACT-1 AAACCCAAGAAACCAT-1 ... TTTGTTGTCTTTGCTG-1 TTTGTTGTCTTTGGAG-1
# colData names(2): Sample Barcode
# reducedDimNames(0):
#   mainExpName: NULL
# altExpNames(0):

# Call cells using the CMO counts ----
set.seed(101)
hash.calls <- emptyDrops(counts(sce.cmos), by.rank=20000)
is.cell <- which(hash.calls$FDR <= 0.001)
length(is.cell)
# [1] 15532

hash.stats <- hashedDrops(counts(sce.cmos)[,is.cell],
                          ambient=metadata(hash.calls)$ambient)
# Error in x[!discard, , drop = FALSE] : 
#  nargs() = 4.  Extraneous illegal arguments inside '[ .. ]' (i.2col)?

traceback()
# 6: stop(domain = NA, gettextf("nargs() = %d.  Extraneous illegal arguments inside '[ .. ]' (i.2col)?", nA))
# 5: x[!discard, , drop = FALSE]
# 4: x[!discard, , drop = FALSE]
# 3: .local(x, ...)
# 2: hashedDrops(counts(sce.cmos)[, is.cell], ambient = metadata(hash.calls)$ambient)
# 1: hashedDrops(counts(sce.cmos)[, is.cell], ambient = metadata(hash.calls)$ambient)

# R version 4.1.2 (2021-11-01)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 19042)
# 
# Matrix products: default
# 
# locale:
#   [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
# [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
# [5] LC_TIME=English_United States.1252    
# 
# attached base packages:
#   [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] DropletUtils_1.14.1         SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
# [4] Biobase_2.54.0              GenomicRanges_1.46.1        GenomeInfoDb_1.30.0        
# [7] IRanges_2.28.0              S4Vectors_0.32.3            BiocGenerics_0.40.0        
# [10] MatrixGenerics_1.6.0        matrixStats_0.61.0         
# 
# loaded via a namespace (and not attached):
#   [1] Rcpp_1.0.7                edgeR_3.36.0              XVector_0.34.0            zlibbioc_1.40.0          
# [5] BiocParallel_1.28.3       lattice_0.20-45           tools_4.1.2               DelayedMatrixStats_1.16.0
# [9] sparseMatrixStats_1.6.0   parallel_4.1.2            grid_4.1.2                scuttle_1.4.0            
# [13] rhdf5_2.38.0              dqrng_0.3.0               R.oo_1.24.0               HDF5Array_1.22.1         
# [17] Matrix_1.4-0              GenomeInfoDbData_1.2.7    Rhdf5lib_1.16.0           R.utils_2.11.0           
# [21] rhdf5filters_1.6.0        bitops_1.0-7              RCurl_1.98-1.5            limma_3.50.0             
# [25] DelayedArray_0.20.0       compiler_4.1.2            R.methodsS3_1.8.1         locfit_1.5-9.4           
# [29] beachmat_2.10.0

I am not sure what is causing this or what else I should look at to find out. Thanks!

LTLA commented 2 years ago

I'm guessing that this is the cause:

library(Matrix)
mat <- rsparsematrix(10, 10, 0.1)
mat[c(1,2,3),,drop=FALSE]     # ok: a plain integer vector as the row index
mat[rbind(1,2,3),,drop=FALSE] # error: a 3x1 matrix as the row index

For some reason, ambient is an array rather than a plain vector, so the simple workaround is to coerce it into a numeric vector with as.numeric() until 1.5.3 is released.
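The sketch below illustrates why the coercion helps. It assumes, as the traceback suggests but does not prove, that the discard index used inside hashedDrops() is derived from ambient. A comparison on a one-column matrix keeps the matrix shape, whereas as.numeric() drops the dim attribute and yields a plain logical vector that works as an ordinary row index. The toy ambient values and object names here are hypothetical.

```r
library(Matrix)

# Hypothetical stand-in for an ambient profile stored as a one-column matrix.
ambient <- matrix(c(0.5, 0, 0.5), ncol = 1)

# Comparisons on a matrix keep its dimensions, so this is a logical matrix.
discard <- ambient == 0
is.matrix(discard)
# [1] TRUE

# as.numeric() drops the dim attribute, so this is a plain logical vector.
discard.vec <- as.numeric(ambient) == 0
is.matrix(discard.vec)
# [1] FALSE

# A plain logical vector is an ordinary row index for a sparse matrix.
set.seed(1)
mat <- rsparsematrix(3, 5, 0.5)
dim(mat[!discard.vec, , drop = FALSE])
# [1] 2 5
```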

@jonathangriffiths I pushed the changes to BioC, but I don't have the latest BioC packages installed on my current machine. I assume it'll all be fine, but it may be worth setting up GitHub Actions to run a build and check before each push. There should be a pretty nice Action template from Leo that caches the packages after the initial install.

jonathangriffiths commented 2 years ago

Thanks for the tweak Aaron.

I think I've found Leo's docs, so I'll look into getting it set up here.

jdrnevich commented 2 years ago

This worked, Aaron. Thanks for the workaround.

hash.stats <- hashedDrops(counts(sce.cmos)[,is.cell],
                          ambient=as.numeric(metadata(hash.calls)$ambient))
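One side effect of as.numeric() is that it also discards any row names on the ambient matrix. If you want to keep them (for example, to check them against rownames(sce.cmos)), a slightly more defensive variant could look like the sketch below. The helper name as_ambient_vector() is hypothetical and is not part of DropletUtils; it only assumes ambient is a vector, a 1-d array, or a one-column matrix.

```r
# Hypothetical helper: flatten an ambient profile to a plain named numeric vector.
as_ambient_vector <- function(ambient) {
  d <- dim(ambient)
  # Accept plain vectors, 1-d arrays, or one-column matrices only.
  stopifnot(is.null(d) || length(d) == 1L || (length(d) == 2L && d[2L] == 1L))
  # Take names from the first dimension if present, else from names().
  nm <- if (!is.null(rownames(ambient))) rownames(ambient) else names(ambient)
  out <- as.numeric(ambient)  # drops the dim attribute
  names(out) <- nm
  out
}

# Usage with the objects from this thread:
# amb <- as_ambient_vector(metadata(hash.calls)$ambient)
# hash.stats <- hashedDrops(counts(sce.cmos)[, is.cell], ambient = amb)
```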