GreenleafLab / ArchR

ArchR: Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

addGeneIntegrationMatrix fails with tutorial dataset and Seurat 5.0.0 #2052

chloeacolson opened this issue 11 months ago (status: open)

chloeacolson commented 11 months ago

Attach your log file: ArchR-addGeneIntegrationMatrix-12d1d6685d430-Date-2023-11-11_Time-17-32-30.524157.log

Describe the bug: In section 8.1 of the ArchR tutorial, running addGeneIntegrationMatrix fails with the error:

Error in slot(object = object, name = "features")[[layer]] <- features : 
  more elements supplied than there are to replace

Running traceback() shows the following function stack:

13: `LayerData<-.Assay5`(object = `*tmp*`, layer = layer, features = features[[layer]], 
        cells = cells[[layer]], transpose = transpose, value = counts[[layer]])
12: `LayerData<-`(object = `*tmp*`, layer = layer, features = features[[layer]], 
        cells = cells[[layer]], transpose = transpose, value = counts[[layer]])
11: .CreateStdAssay.list(counts = counts, min.cells = min.cells, 
        min.features = min.features, transpose = transpose, type = type, 
        csum = csum, fsum = fsum, ...)
10: .CreateStdAssay(counts = counts, min.cells = min.cells, min.features = min.features, 
        transpose = transpose, type = type, csum = csum, fsum = fsum, 
        ...)
9: CreateAssay5Object(counts = counts, min.cells = min.cells, min.features = min.features, 
       ...)
8: CreateSeuratObject.default(counts = mat[head(seq_len(nrow(mat)), 
       5), , drop = FALSE])
7: Seurat::CreateSeuratObject(counts = mat[head(seq_len(nrow(mat)), 
       5), , drop = FALSE])
6: FUN(X[[i]], ...)
5: lapply(...)
4: .safelapply(seq_along(blockList), function(i) {
       prefix <- sprintf("Block (%s of %s) :", i, length(blockList))
       .logDiffTime(sprintf("%s Computing Integration", prefix), 
           tstart, verbose = verbose, logFile = logFile)
       blocki <- blockList[[i]]
       subProj@cellColData <- subProj@cellColData[blocki$ATAC, ]
       subProj@sampleColData <- subProj@sampleColData[unique(subProj$Sample), 
           , drop = FALSE]
       subRNA <- seuratRNA[, blocki$RNA]
       subRNA <- subRNA[rownames(subRNA) %in% geneDF$name, ]
       .logDiffTime(sprintf("%s Identifying Variable Genes", prefix), 
           tstart, verbose = verbose, logFile = logFile)
       subRNA <- FindVariableFeatures(object = subRNA, nfeatures = nGenes, 
           verbose = FALSE)
       subRNA <- ScaleData(object = subRNA, verbose = FALSE)
       if (is.null(genesUse)) {
           genesUse <- VariableFeatures(object = subRNA)
       }
       .logDiffTime(sprintf("%s Getting GeneScoreMatrix", prefix), 
           tstart, verbose = verbose, logFile = logFile)
    ...
3: Reduce("rbind", .)
2: .safelapply(seq_along(blockList), function(i) {
       prefix <- sprintf("Block (%s of %s) :", i, length(blockList))
       .logDiffTime(sprintf("%s Computing Integration", prefix), 
           tstart, verbose = verbose, logFile = logFile)
       blocki <- blockList[[i]]
       subProj@cellColData <- subProj@cellColData[blocki$ATAC, ]
       subProj@sampleColData <- subProj@sampleColData[unique(subProj$Sample), 
           , drop = FALSE]
       subRNA <- seuratRNA[, blocki$RNA]
       subRNA <- subRNA[rownames(subRNA) %in% geneDF$name, ]
       .logDiffTime(sprintf("%s Identifying Variable Genes", prefix), 
           tstart, verbose = verbose, logFile = logFile)
       subRNA <- FindVariableFeatures(object = subRNA, nfeatures = nGenes, 
           verbose = FALSE)
       subRNA <- ScaleData(object = subRNA, verbose = FALSE)
       if (is.null(genesUse)) {
           genesUse <- VariableFeatures(object = subRNA)
       }
       .logDiffTime(sprintf("%s Getting GeneScoreMatrix", prefix), 
           tstart, verbose = verbose, logFile = logFile)
    ...
1: addGeneIntegrationMatrix(ArchRProj = projHeme2, useMatrix = "GeneScoreMatrix", 
       matrixName = "GeneIntegrationMatrix", reducedDims = "IterativeLSI", 
       seRNA = seRNA, addToArrow = FALSE, groupRNA = "BioClassification", 
       nameCell = "predictedCell_Un", nameGroup = "predictedGroup_Un", 
       nameScore = "predictedScore_Un")

To Reproduce: Run the ArchR tutorial steps up to Section 8.1, download the seRNA dataset as described there, and then run:

projHeme2 <- addGeneIntegrationMatrix(
    ArchRProj = projHeme2, 
    useMatrix = "GeneScoreMatrix",
    matrixName = "GeneIntegrationMatrix",
    reducedDims = "IterativeLSI",
    seRNA = seRNA,
    addToArrow = FALSE,
    groupRNA = "BioClassification",
    nameCell = "predictedCell_Un",
    nameGroup = "predictedGroup_Un",
    nameScore = "predictedScore_Un"
)

which causes the error.
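Frames 7 and 8 of the traceback show that the failure happens inside a single call, `Seurat::CreateSeuratObject(counts = mat[head(seq_len(nrow(mat)), 5), , drop = FALSE])`, which ArchR makes on the first five rows of the RNA matrix. One way to narrow this down is to run that same call outside ArchR. The sketch below assumes only the Matrix package and, optionally, Seurat; the toy matrix `toy` and its gene/cell names are hypothetical stand-ins, not the tutorial's seRNA data. If the call succeeds here but fails on `SummarizedExperiment::assay(seRNA)`, that would point toward something about the tutorial matrix rather than the call in general.

```r
library(Matrix)

# Hypothetical toy counts: 20 genes x 10 cells, stored as a sparse dgCMatrix
set.seed(1)
toy <- Matrix::Matrix(matrix(rpois(200, lambda = 1), nrow = 20, ncol = 10), sparse = TRUE)
dimnames(toy) <- list(paste0("gene", 1:20), paste0("cell", 1:10))

# Re-run the call from traceback frame 7 on the first five rows only,
# capturing the error message instead of stopping
result <- if (requireNamespace("Seurat", quietly = TRUE)) {
  tryCatch({
    Seurat::CreateSeuratObject(counts = toy[head(seq_len(nrow(toy)), 5), , drop = FALSE])
    "CreateSeuratObject succeeded"
  }, error = function(e) conditionMessage(e))
} else {
  NA_character_  # Seurat is not installed in this session
}
print(result)
```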

Expected behavior: addGeneIntegrationMatrix completes as described in the tutorial.

Session Info

R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] presto_1.0.0                      harmony_1.1.0                     uwot_0.1.16                      
 [4] nabor_0.5.0                       Rsamtools_2.18.0                  BSgenome.Hsapiens.UCSC.hg19_1.4.3
 [7] BSgenome_1.70.1                   rtracklayer_1.62.0                BiocIO_1.12.0                    
[10] Biostrings_2.70.1                 XVector_0.42.0                    magick_2.8.1                     
[13] rhdf5_2.46.0                      SummarizedExperiment_1.32.0       Biobase_2.62.0                   
[16] MatrixGenerics_1.14.0             Rcpp_1.0.11                       Matrix_1.6-1.1                   
[19] GenomicRanges_1.54.1              GenomeInfoDb_1.38.0               IRanges_2.36.0                   
[22] S4Vectors_0.40.1                  BiocGenerics_0.48.1               matrixStats_1.1.0                
[25] data.table_1.14.8                 stringr_1.5.0                     plyr_1.8.9                       
[28] magrittr_2.0.3                    ggplot2_3.4.4                     gtable_0.3.4                     
[31] gtools_3.9.4                      gridExtra_2.3                     ArchR_1.0.2                      
[34] Seurat_5.0.0                      SeuratObject_5.0.0                sp_2.1-1                         

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.21         splines_4.3.2            later_1.3.1              bitops_1.0-7            
  [5] tibble_3.2.1             polyclip_1.10-6          XML_3.99-0.15            fastDummies_1.7.3       
  [9] lifecycle_1.0.4          globals_0.16.2           lattice_0.22-5           MASS_7.3-60             
 [13] plotly_4.10.3            yaml_2.3.7               httpuv_1.6.12            sctransform_0.4.1       
 [17] spam_2.10-0              spatstat.sparse_3.0-3    reticulate_1.34.0        cowplot_1.1.1           
 [21] pbapply_1.7-2            RColorBrewer_1.1-3       abind_1.4-5              zlibbioc_1.48.0         
 [25] Rtsne_0.16               purrr_1.0.2              RCurl_1.98-1.13          GenomeInfoDbData_1.2.11 
 [29] ggrepel_0.9.4            irlba_2.3.5.1            listenv_0.9.0            spatstat.utils_3.0-4    
 [33] pheatmap_1.0.12          goftest_1.2-3            RSpectra_0.16-1          spatstat.random_3.2-1   
 [37] fitdistrplus_1.1-11      parallelly_1.36.0        leiden_0.4.3             codetools_0.2-19        
 [41] DelayedArray_0.28.0      tidyselect_1.2.0         farver_2.1.1             spatstat.explore_3.2-5  
 [45] GenomicAlignments_1.38.0 jsonlite_1.8.7           ellipsis_0.3.2           progressr_0.14.0        
 [49] ggridges_0.5.4           survival_3.5-7           tools_4.3.2              ica_1.0-3               
 [53] glue_1.6.2               SparseArray_1.2.2        dplyr_1.1.3              withr_2.5.2             
 [57] fastmap_1.1.1            rhdf5filters_1.14.1      fansi_1.0.5              digest_0.6.33           
 [61] R6_2.5.1                 mime_0.12                colorspace_2.1-0         scattermore_1.2         
 [65] Cairo_1.6-1              tensor_1.5               spatstat.data_3.0-3      RhpcBLASctl_0.23-42     
 [69] utf8_1.2.4               tidyr_1.3.0              generics_0.1.3           httr_1.4.7              
 [73] htmlwidgets_1.6.2        S4Arrays_1.2.0           pkgconfig_2.0.3          lmtest_0.9-40           
 [77] htmltools_0.5.7          dotCall64_1.1-0          scales_1.2.1             png_0.1-8               
 [81] rstudioapi_0.15.0        reshape2_1.4.4           rjson_0.2.21             nlme_3.1-163            
 [85] zoo_1.8-12               KernSmooth_2.23-22       parallel_4.3.2           miniUI_0.1.1.1          
 [89] restfulr_0.0.15          pillar_1.9.0             vctrs_0.6.4              RANN_2.6.1              
 [93] promises_1.2.1           xtable_1.8-4             cluster_2.1.4            cli_3.6.1               
 [97] compiler_4.3.2           rlang_1.1.2              crayon_1.5.2             future.apply_1.11.0     
[101] labeling_0.4.3           stringi_1.7.12           viridisLite_0.4.2        deldir_1.0-9            
[105] BiocParallel_1.36.0      munsell_0.5.0            lazyeval_0.2.2           spatstat.geom_3.2-7     
[109] RcppHNSW_0.5.0           patchwork_1.1.3          future_1.33.0            Rhdf5lib_1.24.0         
[113] shiny_1.7.5.1            ROCR_1.0-11              igraph_1.5.1  

Additional context: I'm running this on an M1 Mac (aarch64-apple-darwin20, as shown in the session info above). Any suggestions for further debugging steps would be much appreciated.
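Because the title attributes the failure to Seurat 5.0.0, a quick first debugging step is to confirm which versions of the packages involved are visible to the R session. A minimal sketch using only base R; the package list is an assumption chosen from the traceback and session info above. The final check relies on frame 9 of the traceback, where `CreateSeuratObject.default` calls `CreateAssay5Object`.

```r
# Packages that appear in the traceback or the session info above
pkgs <- c("ArchR", "Seurat", "SeuratObject", "Matrix")

# Version of each package visible to this R session, or NA if it is not installed
versions <- vapply(pkgs, function(p) {
  tryCatch(as.character(utils::packageVersion(p)), error = function(e) NA_character_)
}, character(1))
print(versions)

# Flag SeuratObject >= 5.0.0, where CreateSeuratObject() goes through CreateAssay5Object()
so <- versions[["SeuratObject"]]
if (!is.na(so) && package_version(so) >= "5.0.0") {
  message("SeuratObject ", so, " is installed; CreateSeuratObject() builds Assay5 objects")
}
```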

rcorces commented 11 months ago

Hi @chloeacolson! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
It is worth noting that there are very few actual bugs in ArchR. If you are getting an error, it is probably something specific to your dataset, usage, or computational environment, all of which are extremely challenging to troubleshoot. As such, we require reproducible examples (preferably using the tutorial dataset) from users who want assistance. If you cannot reproduce your error, we will not be able to help. Before going through the work of making a reproducible example, search the previous Issues, Discussions, function definitions, or the ArchR manual and you will likely find the answers you are looking for. If your post does not contain a reproducible example, it is unlikely to receive a response.
In addition to a reproducible example, you must do the following things before we help you, unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Did you post your log file? If not, add it now.
3. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

chloeacolson commented 11 months ago

Seems to be related to this ArchR issue and this Seurat issue.