I am analysing my own scATAC-seq dataset with ArchR. The initial steps of Arrow file creation run, but the job then sits at "Creating ArrowFile From Temporary File, 9.113 mins elapsed" for over two hours. When it finally finishes, ArrowFiles is empty: character(0).

> inputFiles = getInputFiles("/home/newfolder/outs/")
> inputFiles
> addArchRGenome("hg38")
Setting default genome to Hg38.
> addArchRThreads(threads = 16) 
Setting default number of Parallel threads to 16.
>  ArrowFiles <- createArrowFiles(inputFiles = inputFiles,sampleNames = names(inputFiles),minTSS = 4,minFrags = 1000,addTileMat = TRUE,addGeneScoreMat = TRUE)
Using GeneAnnotation set by addArchRGenome(Hg38)!
Using GeneAnnotation set by addArchRGenome(Hg38)!
ArchR logging to : ArchRLogs/ArchR-createArrows-3e077720032e-Date-2022-09-28_Time-11-16-36.log
If there is an issue, please report to github with logFile!
2022-09-28 11:16:36 : Batch Execution w/ safelapply!, 0 mins elapsed.
2022-09-28 11:16:36 : (possorted : 1 of 1) Arrow Exists! Overriding since not completed!, 0.001 mins elapsed.
2022-09-28 11:16:36 : (possorted : 1 of 1) Reading In Fragments from inputFiles (readMethod = bam), 0.001 mins elapsed.
2022-09-28 11:16:36 : (possorted : 1 of 1) Tabix Bam To Temporary File, 0.002 mins elapsed.
2022-09-28 11:19:31 : (possorted : 1 of 1) Reading BamFile 28 Percent, 2.92 mins elapsed.
2022-09-28 11:22:19 : (possorted : 1 of 1) Reading BamFile 55 Percent, 5.722 mins elapsed.
2022-09-28 11:24:07 : (possorted : 1 of 1) Reading BamFile 82 Percent, 7.523 mins elapsed.
2022-09-28 11:25:43 : (possorted : 1 of 1) Successful creation of Temporary File, 9.113 mins elapsed.
2022-09-28 11:25:43 : (possorted : 1 of 1) Creating ArrowFile From Temporary File, 9.113 mins elapsed.
2022-09-28 12:52:52 : ERROR Found in .tmpToArrow for (possorted : 1 of 1) 
LogFile = ArchRLogs/ArchR-createArrows-3e077720032e-Date-2022-09-28_Time-11-16-36.log

<simpleError: Detected 2 or less cells (0 barcodes have greater than 50 fragments) in file!
       Check inputs such as 'minFrags' or 'maxFrags' to keep cells!
       Also check that you are using the correct reference genome.

2022-09-28 12:52:52 : createArrowFiles has encountered an error, checking if any ArrowFiles completed..
ArchR logging successful to : ArchRLogs/ArchR-createArrows-3e077720032e-Date-2022-09-28_Time-11-16-36.log
> ArrowFiles
> traceback()
3: gc()
2: closeAllConnections()
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /ncbs_gs/nlsas_data/usershares/praghu/yojetsharma/.conda/envs/scenic/lib/

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] Rsamtools_2.10.0                  BSgenome.Hsapiens.UCSC.hg38_1.4.4
 [3] BSgenome_1.62.0                   rtracklayer_1.54.0               
 [5] Biostrings_2.62.0                 XVector_0.34.0                   
 [7] rhdf5_2.38.1                      SummarizedExperiment_1.24.0      
 [9] Biobase_2.54.0                    MatrixGenerics_1.6.0             
[11] Rcpp_1.0.9                        Matrix_1.5-1                     
[13] GenomicRanges_1.46.1              GenomeInfoDb_1.30.1              
[15] IRanges_2.28.0                    S4Vectors_0.32.4                 
[17] BiocGenerics_0.40.0               matrixStats_0.62.0               
[19] data.table_1.14.2                 stringr_1.4.1                    
[21] plyr_1.8.7                        magrittr_2.0.3                   
[23] ggplot2_3.3.6                     gtable_0.3.1                     
[25] gtools_3.9.3                      gridExtra_2.3                    
[27] ArchR_1.0.2                      

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.2         purrr_0.3.4              lattice_0.20-45         
 [4] colorspace_2.0-3         vctrs_0.4.1              generics_0.1.3          
 [7] yaml_2.3.5               XML_3.99-0.10            utf8_1.2.2              
[10] rlang_1.0.6              pillar_1.8.1             glue_1.6.2              
[13] withr_2.5.0              DBI_1.1.3                BiocParallel_1.28.3     
[16] GenomeInfoDbData_1.2.7   lifecycle_1.0.2          zlibbioc_1.40.0         
[19] munsell_0.5.0            restfulr_0.0.15          Cairo_1.6-0             
[22] fansi_1.0.3              scales_1.2.1             DelayedArray_0.20.0     
[25] rjson_0.2.21             stringi_1.7.8            dplyr_1.0.10            
[28] BiocIO_1.4.0             cli_3.4.1                tools_4.1.2             
[31] bitops_1.0-7             rhdf5filters_1.6.0       RCurl_1.98-1.8          
[34] tibble_3.1.8             crayon_1.5.1             pkgconfig_2.0.3         
[37] assertthat_0.2.1         Rhdf5lib_1.16.0          R6_2.5.1                
[40] GenomicAlignments_1.30.0 compiler_4.1.2


rcorces commented 2 years ago

You are using BAM files as input, which requires additional parameters. Your error indicates that ArchR cannot find the cell barcode information in the BAM records. Please see the documentation of the BAM-related parameters of createArrowFiles().

<simpleError: Detected 2 or less cells (0 barcodes have greater than 50 fragments) in file!
       Check inputs such as 'minFrags' or 'maxFrags' to keep cells!
       Also check that you are using the correct reference genome.
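
For readers who hit the same error, here is a minimal sketch of what passing the BAM-related parameters might look like. It assumes the BAM comes from a 10x Cell Ranger ATAC run, where the corrected cell barcode is usually stored in the "CB" tag; the file name possorted_bam.bam and the bcTag value are assumptions, not details confirmed in this thread. By default createArrowFiles() uses bcTag = "qname", and a gsubExpression can be supplied when the barcode has to be extracted from the read name instead.

```r
# Hypothetical path: point this at the actual BAM file in your outs/ folder.
bamFile <- "/home/newfolder/outs/possorted_bam.bam"
inputFiles <- c(possorted = bamFile)

# Arguments for createArrowFiles(). bcTag = "CB" is an assumption for
# 10x-style BAM files; check your own BAM (e.g. with samtools view) to see
# which tag actually holds the cell barcode.
arrowArgs <- list(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  minTSS = 4,
  minFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE,
  bcTag = "CB"
)

# Only run when ArchR is installed and the BAM file exists.
if (requireNamespace("ArchR", quietly = TRUE) && file.exists(bamFile)) {
  library(ArchR)
  addArchRGenome("hg38")
  ArrowFiles <- do.call(createArrowFiles, arrowArgs)
}
```

If the barcodes are read correctly, the "0 barcodes have greater than 50 fragments" message should no longer appear and ArrowFiles should contain the path of the created Arrow file.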
yojetsharma commented 2 years ago

Worked! Thank you!