GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

Error in addIterativeLSI() #383

Closed ajwilk closed 3 years ago

ajwilk commented 3 years ago

Thanks for a great package. I'm running into an issue in addIterativeLSI(). I first saw it on ArchR_0.9.5, and it persists on the new release, ArchR_1.0.0.

sub <- addIterativeLSI(
    ArchRProj = sub,
    useMatrix = "TileMatrix", 
    name = "IterativeLSI", 
    iterations = 3, 
    clusterParams = list( #See Seurat::FindClusters
        resolution = c(0.2), 
        sampleCells = 1000, 
        n.start = 10
    ), 
    varFeatures = 25000, 
    dimsToUse = 1:30
)

Checking Inputs...
ArchR logging to : ArchRLogs/ArchR-addIterativeLSI-6f677b1d3c5f-Date-2020-10-31_Time-10-13-16.log
If there is an issue, please report to github with logFile!
2020-10-31 10:13:19 : Computing Total Across All Features, 0.005 mins elapsed.
2020-10-31 10:13:23 : Computing Top Features, 0.084 mins elapsed.
###########
2020-10-31 10:13:27 : Running LSI (1 of 3) on Top Features, 0.145 mins elapsed.
###########
2020-10-31 10:13:27 : Sampling Cells (N = 10001) for Estimated LSI, 0.146 mins elapsed.
2020-10-31 10:13:27 : Creating Sampled Partial Matrix, 0.147 mins elapsed.

************************************************************
2020-10-31 10:13:47 : ERROR Found in .LSIPartialMatrix for  
LogFile = ArchRLogs/ArchR-addIterativeLSI-6f677b1d3c5f-Date-2020-10-31_Time-10-13-16.log

<simpleError in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE): invalid character indexing>

************************************************************

Error in .logError(e, fn = ".LSIPartialMatrix", info = "", errorList = errorList, : Exiting See Error Above
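
For readers landing here: the `intI(j, ...)` call in the message belongs to the Matrix package's column-index lookup, and in general this kind of error appears when a sparse matrix is indexed by character names that are not all present in its column names. Whether that is what happened inside .LSIPartialMatrix here was not established in this thread. A minimal sketch, using made-up cell names and a toy matrix (nothing here is ArchR code), of how such an error can arise and how to check names first:

```r
library(Matrix)

# Toy sparse matrix whose columns stand in for cell barcodes (hypothetical names).
m <- sparseMatrix(
  i = c(1, 2, 3), j = c(1, 2, 3), x = c(1, 1, 1),
  dims = c(3, 3),
  dimnames = list(NULL, c("S1#cellA", "S1#cellB", "S1#cellC"))
)

# Requested cells; "S1#cellZ" is deliberately not a column of m.
wanted <- c("S1#cellA", "S1#cellZ")

# Check which requested names are absent before indexing.
missing_names <- setdiff(wanted, colnames(m))
print(missing_names)

# Indexing by a name that is not in colnames(m) raises an error;
# the exact wording depends on the installed Matrix version.
res <- tryCatch(m[, wanted, drop = FALSE], error = function(e) e)
print(conditionMessage(res))

# Indexing only by names that exist succeeds.
ok <- m[, intersect(wanted, colnames(m)), drop = FALSE]
print(dim(ok))
```

If the cell names held by the project and the cell names stored in the Arrow files disagree, a lookup like this would fail in the same way, but that is an assumption about this report, not a confirmed diagnosis.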

To Reproduce

Interestingly, this error does NOT reproduce with the hematopoiesis dataset. Luckily, you can find the Arrow files that caused it on your Sherlock space:

/scratch/users/bparks/covid/04_data/arrow_files/filtered/ATAC-D1-56.arrow
/scratch/users/bparks/covid/04_data/arrow_files/filtered/ATAC-D1-57.arrow
/scratch/users/bparks/covid/04_data/arrow_files/filtered/ATAC-D2-58.arrow

Session Info

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[10] base     

other attached packages:
 [1] gtable_0.3.0                      harmony_1.0                      
 [3] Rcpp_1.0.2                        gridExtra_2.3                    
 [5] uwot_0.1.8                        nabor_0.5.0                      
 [7] Seurat_3.2.2                      ggridges_0.5.1                   
 [9] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.54.0                  
[11] rtracklayer_1.46.0                Biostrings_2.54.0                
[13] XVector_0.25.0                    ArchR_1.0.0                      
[15] magrittr_1.5                      rhdf5_2.29.6                     
[17] Matrix_1.2-17                     data.table_1.12.6                
[19] SummarizedExperiment_1.15.9       DelayedArray_0.11.8              
[21] BiocParallel_1.19.4               matrixStats_0.55.0               
[23] Biobase_2.45.1                    GenomicRanges_1.37.17            
[25] GenomeInfoDb_1.21.2               IRanges_2.19.17                  
[27] ggplot2_3.3.2                     S4Vectors_0.23.25                
[29] BiocGenerics_0.31.6              

loaded via a namespace (and not attached):
  [1] backports_1.1.5          plyr_1.8.4               igraph_1.2.4.1          
  [4] lazyeval_0.2.2           splines_3.6.1            listenv_0.7.0           
  [7] usethis_1.5.1            digest_0.6.22            htmltools_0.4.0         
 [10] gdata_2.18.0             memoise_1.1.0            tensor_1.5              
 [13] cluster_2.1.0            ROCR_1.0-7               remotes_2.1.0           
 [16] globals_0.12.4           prettyunits_1.0.2        colorspace_1.4-1        
 [19] ggrepel_0.8.1            xfun_0.10                dplyr_1.0.2             
 [22] callr_3.3.2              crayon_1.3.4             RCurl_1.95-4.12         
 [25] jsonlite_1.6             spatstat_1.64-1          spatstat.data_1.4-3     
 [28] survival_2.44-1.1        zoo_1.8-8                glue_1.4.2              
 [31] polyclip_1.10-0          zlibbioc_1.31.0          leiden_0.3.1            
 [34] pkgbuild_1.0.6           Rhdf5lib_1.7.6           future.apply_1.3.0      
 [37] abind_1.4-5              scales_1.0.0             miniUI_0.1.1.1          
 [40] viridisLite_0.3.0        xtable_1.8-4             reticulate_1.14         
 [43] rsvd_1.0.2               htmlwidgets_1.5.1        httr_1.4.1              
 [46] gplots_3.0.1.1           RColorBrewer_1.1-2       ellipsis_0.3.0          
 [49] ica_1.0-2                pkgconfig_2.0.3          XML_3.98-1.20           
 [52] deldir_0.1-29            tidyselect_1.1.0         labeling_0.3            
 [55] rlang_0.4.7              reshape2_1.4.3           later_1.0.0             
 [58] munsell_0.5.0            tools_3.6.1              cli_1.1.0               
 [61] generics_0.0.2           devtools_2.2.1           stringr_1.4.0           
 [64] fastmap_1.0.1            goftest_1.2-2            npsurv_0.4-0            
 [67] fs_1.3.1                 processx_3.4.1           knitr_1.25              
 [70] fitdistrplus_1.0-14      caTools_1.17.1.2         purrr_0.3.3             
 [73] RANN_2.6.1               pbapply_1.4-2            future_1.14.0           
 [76] nlme_3.1-141             mime_0.7                 compiler_3.6.1          
 [79] rstudioapi_0.10          curl_4.2                 plotly_4.9.0            
 [82] png_0.1-7                testthat_2.2.1           lsei_1.2-0              
 [85] spatstat.utils_1.17-0    tibble_3.0.3             stringi_1.4.3           
 [88] ps_1.3.0                 desc_1.2.0               RSpectra_0.15-0         
 [91] lattice_0.20-38          vctrs_0.3.2              pillar_1.4.6            
 [94] lifecycle_0.2.0          lmtest_0.9-37            RcppAnnoy_0.0.13        
 [97] cowplot_1.0.0            bitops_1.0-6             irlba_2.3.3             
[100] httpuv_1.5.2             patchwork_1.0.1          R6_2.4.0                
[103] promises_1.1.0           KernSmooth_2.23-16       sessioninfo_1.1.1       
[106] codetools_0.2-16         pkgload_1.0.2            MASS_7.3-51.4           
[109] gtools_3.8.1             assertthat_0.2.1         rprojroot_1.3-2         
[112] withr_2.1.2              GenomicAlignments_1.22.1 sctransform_0.3.1       
[115] Rsamtools_2.2.3          GenomeInfoDbData_1.2.1   mgcv_1.8-29             
[118] rpart_4.1-15             tidyr_1.0.0              Cairo_1.5-12.2          
[121] Rtsne_0.15               shiny_1.4.0  

Additional context

The ArchR log is attached and the traceback is below.

ArchR-addIterativeLSI-6f677b1d3c5f-Date-2020-10-31_Time-10-13-16.log

traceback()
8: stop("Exiting See Error Above")
7: .logError(e, fn = ".LSIPartialMatrix", info = "", errorList = errorList, 
       logFile = logFile)
6: value[[3L]](cond)
5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
4: tryCatchList(expr, classes, parentenv, handlers)
3: tryCatch({
       if (is.null(sampleCells)) {
           .logDiffTime("Creating Partial Matrix", tstart, addHeader = FALSE, 
               verbose = verbose, logFile = logFile)
           mat <- .getPartialMatrix(ArrowFiles = ArrowFiles, featureDF = featureDF, 
               useMatrix = useMatrix, cellNames = cellNames, doSampleCells = FALSE, 
               threads = threads, verbose = FALSE)
           .logDiffTime("Computing LSI", tstart, addHeader = FALSE, 
               verbose = verbose, logFile = logFile)
           outLSI <- .computeLSI(mat = mat, LSIMethod = LSIMethod, 
               scaleTo = scaleTo, nDimensions = max(dimsToUse), 
               binarize = binarize, outlierQuantiles = outlierQuantiles, 
               verbose = FALSE, seed = seed, tstart = tstart, logFile = logFile)
           outLSI$LSIFeatures <- featureDF
           outLSI$corToDepth <- list(scaled = abs(cor(.scaleDims(outLSI[[1]]), 
               cellDepth[rownames(outLSI[[1]])]))[, 1], none = abs(cor(outLSI[[1]], 
               cellDepth[rownames(outLSI[[1]])]))[, 1])
       }
       else {
           sampledCellNames <- .sampleBySample(cellNames = cellNames, 
               sampleNames = sampleNames, cellDepth = cellDepth, 
               sampleCells = sampleCells, outlierQuantiles = outlierQuantiles, 
               factor = 2)
           .logDiffTime(sprintf("Sampling Cells (N = %s) for Estimated LSI", 
               length(sampledCellNames)), tstart, addHeader = FALSE, 
               verbose = verbose, logFile = logFile)
           .logDiffTime("Creating Sampled Partial Matrix", tstart, 
               addHeader = FALSE, verbose = verbose, logFile = logFile)
           o <- h5closeAll()
           if (!projectAll) {
               mat <- .getPartialMatrix(ArrowFiles = ArrowFiles, 
                   featureDF = featureDF, useMatrix = useMatrix, 
                   cellNames = sampledCellNames, doSampleCells = FALSE, 
                   threads = threads, verbose = FALSE)
               .logDiffTime("Computing Estimated LSI (projectAll = FALSE)", 
                   tstart, addHeader = FALSE, verbose = verbose, 
                   logFile = logFile)
               outLSI <- .computeLSI(mat = mat, LSIMethod = LSIMethod, 
                   scaleTo = scaleTo, nDimensions = max(dimsToUse), 
                   binarize = binarize, outlierQuantiles = outlierQuantiles, 
                   seed = seed, tstart = tstart, logFile = logFile)
               outLSI$LSIFeatures <- featureDF
               outLSI$corToDepth <- list(scaled = abs(cor(.scaleDims(outLSI[[1]]), 
                   cellDepth[rownames(outLSI[[1]])]))[, 1], none = abs(cor(outLSI[[1]], 
                   cellDepth[rownames(outLSI[[1]])]))[, 1])
           }
           else {
               tmpPath <- .tempfile(pattern = "tmp-LSI-PM")
               .logDiffTime(sprintf("Sampling Cells (N = %s) for Estimated LSI", 
                   length(sampledCellNames)), tstart, addHeader = FALSE, 
                   verbose = verbose, logFile = logFile)
               out <- .getPartialMatrix(ArrowFiles = ArrowFiles, 
                   featureDF = featureDF, useMatrix = useMatrix, 
                   cellNames = cellNames, doSampleCells = TRUE, 
                   sampledCellNames = sampledCellNames, tmpPath = tmpPath, 
                   useIndex = useIndex, threads = threads, verbose = FALSE)
               gc()
               .logDiffTime("Computing Estimated LSI (projectAll = TRUE)", 
                   tstart, addHeader = FALSE, verbose = verbose, 
                   logFile = logFile)
               outLSI <- .computeLSI(mat = out$mat, LSIMethod = LSIMethod, 
                   scaleTo = scaleTo, nDimensions = max(dimsToUse), 
                   binarize = binarize, outlierQuantiles = outlierQuantiles, 
                   seed = seed, tstart = tstart, logFile = logFile)
               tmpMatFiles <- out[[2]]
               rm(out)
               gc()
               threads2 <- 1
               .logDiffTime("Projecting Matrices with LSI-Projection (Granja* et al 2019)", 
                   tstart, addHeader = FALSE, verbose = verbose, 
                   logFile = logFile)
               pLSI <- .safelapply(seq_along(tmpMatFiles), function(x) {
                   .logDiffTime(sprintf("Projecting Matrix (%s of %s) with LSI-Projection", 
                     x, length(tmpMatFiles)), tstart, addHeader = FALSE, 
                     verbose = FALSE, logFile = logFile)
                   .projectLSI(mat = readRDS(tmpMatFiles[x]), LSI = outLSI, 
                     verbose = FALSE, tstart = tstart, logFile = logFile)
               }, threads = threads2) %>% Reduce("rbind", .)
               rmf <- file.remove(tmpMatFiles)
               outLSI$exlcude <- cellNames[which(cellNames %ni% 
                   rownames(pLSI))]
               outLSI$matSVD <- as.matrix(pLSI[cellNames[which(cellNames %in% 
                   rownames(pLSI))], ])
           }
           outLSI$LSIFeatures <- featureDF
           outLSI$corToDepth <- list(scaled = abs(cor(.scaleDims(outLSI[[1]]), 
               cellDepth[rownames(outLSI[[1]])]))[, 1], none = abs(cor(outLSI[[1]], 
               cellDepth[rownames(outLSI[[1]])]))[, 1])
       }
       outLSI
   }, error = function(e) {
       errorList$outLSI <- if (exists("outLSI", inherits = FALSE)) 
           outLSI
       else "Error with outLSI!"
       errorList$matSVD <- if (exists("outLSI", inherits = FALSE)) 
           outLSI[[1]]
       else "Error with matSVD!"
       .logError(e, fn = ".LSIPartialMatrix", info = "", errorList = errorList, 
           logFile = logFile)
   })
2: .LSIPartialMatrix(ArrowFiles = ArrowFiles, featureDF = topFeatures, 
       cellNames = cellNames, cellDepth = cellDepth, useMatrix = useMatrix, 
       sampleNames = getCellColData(ArchRProj)$Sample, LSIMethod = LSIMethod, 
       scaleTo = scaleTo, dimsToUse = dimsToUse, binarize = binarize, 
       outlierQuantiles = outlierQuantiles, sampleCells = if (j != 
           iterations) sampleCellsPre else sampleCellsFinal, projectAll = j == 
           iterations | projectCellsPre | sampleJ > sampleCellsPre, 
       threads = threads, useIndex = FALSE, seed = seed, tstart = tstart, 
       verbose = verbose, logFile = logFile)
1: addIterativeLSI(ArchRProj = sub, useMatrix = "TileMatrix", name = "IterativeLSI", 
       iterations = 3, clusterParams = list(resolution = c(0.2), 
           sampleCells = 1000, n.start = 10), varFeatures = 25000, 
       dimsToUse = 1:30)

rcorces commented 3 years ago

@ajwilk - Sorry for not addressing this sooner. Does this error still persist? If you have not figured this out, let us know and we will try to address it. Otherwise, feel free to close this issue.

lsundaram commented 3 years ago

I get the same error as well.

rcorces commented 3 years ago

@lsundaram - Since this is a bit stale, can you provide an update on what you think is happening, or a reproducible example?

lsundaram commented 3 years ago

Confirming that this goes away when the Arrow files are all reprocessed from scratch in v1.
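
For anyone who needs to do the same, here is a rough sketch of what reprocessing from scratch could look like. The helper name `rebuild_arrows`, its arguments, and the output directory are hypothetical; the ArchR calls (`addArchRGenome`, `createArrowFiles`, `ArchRProject`) are the standard ones, but the settings shown are assumptions rather than what was actually run in this thread.

```r
# Hypothetical helper: rebuild Arrow files from the original fragment files
# with the currently installed ArchR, instead of reusing Arrow files written
# by an older release.
rebuild_arrows <- function(fragment_files, sample_names,
                           genome = "hg19", out_dir = "ArchROutput") {
  library(ArchR)          # attached only when the helper is actually called
  addArchRGenome(genome)  # hg19 is assumed from the BSgenome in the session info

  arrow_files <- createArrowFiles(
    inputFiles = fragment_files,
    sampleNames = sample_names,
    addTileMat = TRUE,      # addIterativeLSI() in the report uses "TileMatrix"
    addGeneScoreMat = TRUE,
    force = TRUE            # overwrite existing Arrow files with the same names
  )

  # Build a fresh project on top of the newly written Arrow files.
  ArchRProject(
    ArrowFiles = arrow_files,
    outputDirectory = out_dir,
    copyArrows = TRUE
  )
}
```

The helper would be called with the paths to the original fragment files and the matching sample names, and addIterativeLSI() would then be rerun on the returned project.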