GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

How can I fix an error in the creation of Arrow Files from my fragment files? #2163

Open Axbxh opened 6 months ago

Axbxh commented 6 months ago

ArchR log file

ArchR-createArrows-73c43a8d3823-Date-2024-05-20_Time-18-57-29.805596.log

Description of the bug

While creating Arrow files from fragment_file_name.tsv.gz, I get a ggplot error at the Fragment Size Distribution step. The log reads:

2024-05-20 19:16:46.377978 : (D865 : 1 of 2) Successful creation of Arrow File, 19.246 mins elapsed.
2024-05-20 19:16:47.42894 : (D865 : 1 of 2) Adding Fragment Summary, 19.267 mins elapsed.
2024-05-20 19:17:08.62645 : (D865 : 1 of 2) Plotting Fragment Size Distribution, 19.621 mins elapsed.
2024-05-20 19:17:10.105093 : Continuing through after error ggplot for Fragment Size Distribution, 19.645 mins elapsed.
2024-05-20 19:17:11.227721 : (D865 : 1 of 2) Computing TSS Enrichment Scores, 19.664 mins elapsed.
2024-05-20 19:18:25.869288 : (D865 : 1 of 2) Computed TSS Scores!, 1.244 mins elapsed.

2024-05-20 19:18:25.885971 : Detected 2 or less cells pass filter (Non-Zero median TSS = 0.94, median Frags = 39590) in file! Check inputs such as 'filterFrags' or 'filterTSS' to keep cells! Exiting!

2024-05-20 19:18:25.893817 : createArrowFiles has encountered an error, checking if any ArrowFiles completed..

------- Completed

End Time : 2024-05-20 19:18:26.010781
Elapsed Time Minutes = 20.9341327190399
Elapsed Time Hours = 0.348902928100692

Although the log reports "Successful creation of Arrow File", I cannot find any Arrow files in my home directory. The output consists of three folders:

  1. ArchRLogs
  2. QualityControl (containing SampleNames, which contains Fragment Size Distribution.pdf)
  3. tmp, which is empty
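
The "Detected 2 or less cells pass filter" line in the log above may explain the missing Arrow file: it reports a non-zero median TSS of 0.94, while the call in this report passes minTSS = 2. As a minimal sketch in base R (the variable names are illustrative, and the log line is shortened from the one quoted above), the two values can be compared directly:

```r
# Log line copied (shortened) from the createArrowFiles log above.
log_line <- paste0(
  "Detected 2 or less cells pass filter ",
  "(Non-Zero median TSS = 0.94, median Frags = 39590) in file!"
)

# Pull the two medians out of the message with regular expressions.
median_tss   <- as.numeric(sub(".*median TSS = ([0-9.]+).*", "\\1", log_line))
median_frags <- as.numeric(sub(".*median Frags = ([0-9]+).*", "\\1", log_line))

# Thresholds used in the createArrowFiles() call in this report.
min_tss   <- 2
min_frags <- 0

# TRUE here means the typical cell falls below the corresponding cutoff.
tss_below_cutoff   <- median_tss < min_tss
frags_below_cutoff <- median_frags < min_frags
```

For these numbers the median cell sits below the TSS cutoff but not the fragment cutoff, so minTSS alone would likely remove most cells. Under that assumption, rerunning with a lower minTSS (for example 0) would show whether the exit comes from the filter rather than from the plotting error.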

Code: To Reproduce

Code I ran in RStudio:

library(ArchR)

fragmentFilePath <- '~/fragment_file_name.tsv.gz'

inputFiles <- c(fragmentFile = fragmentFilePath)
inputFiles

addArchRGenome("mm10")

work_dir <- "~/"
setwd(work_dir)

addArchRThreads(threads = 16) 

ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  minTSS = 2,
  minFrags = 0,
  maxFrags = 1e+07,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE,
  offsetPlus = 0,
  offsetMinus = 0,
  force = TRUE, # overwrite the Arrow file if one already exists
  TileMatParams = list(tileSize = 5000)
)

ArrowFiles

Expected behavior

Creation of Arrow File: fragment_file_name.arrow, in the ArchR directory.

ArchR Tutorial Code Link: https://www.archrproject.com/bookdown/creating-arrow-files.html

library(ArchR)

inputFiles <- getTutorialData("Hematopoiesis")
inputFiles

scATAC_BMMC_R1 "HemeFragments/scATAC_BMMC_R1.fragments.tsv.gz"
scATAC_CD34_BMMC_R1 "HemeFragments/scATAC_CD34_BMMC_R1.fragments.tsv.gz"
scATAC_PBMC_R1 "HemeFragments/scATAC_PBMC_R1.fragments.tsv.gz"

addArchRGenome("hg19")
addArchRThreads(threads = 16) 

Setting default genome to Hg19.
Setting default number of Parallel threads to 16.

ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  filterTSS = 4, #Dont set this too high because you can always increase later
  filterFrags = 1000, 
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
ArchR logging to : ArchRLogs/ArchR-createArrows-dfa159ddbf6e-Date-2020-04-15_Time-09-21-27.log
If there is an issue, please report to github with logFile!
Cleaning Temporary Files
2020-04-15 09:21:28 : Batch Execution w/ safelapply!, 0 mins elapsed.
ArchR logging successful to : ArchRLogs/ArchR-createArrows-dfa159ddbf6e-Date-2020-04-15_Time-09-21-27.log

ArrowFiles

"scATAC_BMMC_R1.arrow" "scATAC_CD34_BMMC_R1.arrow" "scATAC_PBMC_R1.arrow"

Additional context

Windows specifications of my device:
Edition: Windows 10 Home
Version: 22H2
Installed on: 1/22/2021
OS build: 19045.4412
Experience: Windows Feature Experience Pack 1000.19056.1000.0
R version: R 4.3.3

rcorces commented 6 months ago

Hi @Axbxh! Thanks for using ArchR! Lately, it has been very challenging for me to keep up with maintenance of this package and all of my other responsibilities as a PI. I have not been responding to issue posts and I have not been pushing updates to the software. We are actively searching to hire a computational biologist to continue to develop and maintain ArchR and related tools. If you know someone who might be a good fit, please let us know! In the meantime, your issue will likely go without a reply. Most issues with ArchR right now relate to compatibility. Try reverting to R 4.1 and Bioconductor 3.15. Newer versions of Seurat and Matrix are also causing issues. Sorry for not being able to provide active support for this package at this time.

Axbxh commented 6 months ago

Hi, rcorces! Thank you for your response. I have installed R 4.1, but BiocManager 1.5 and the ArchR package are unavailable for this version.

if (!requireNamespace("devtools", quietly = TRUE)) install.packages("ArchR")
Installing package into ‘C:/Users/Abhira/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘ArchR’ is not available for this version of R
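
The warning above is what install.packages() prints when a package cannot be found in the configured repositories for the running R version; ArchR is installed from GitHub rather than from CRAN. A sketch of that route, assuming network access and using devtools::install_github() with BiocManager::repositories() (the `archr_repo` variable is only illustrative):

```r
# GitHub repository that hosts ArchR (it is not a CRAN package).
archr_repo <- "GreenleafLab/ArchR"

# Only run the installation in an interactive session with network access.
if (interactive()) {
  if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
  if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

  # Resolve Bioconductor dependencies through BiocManager's repository list.
  devtools::install_github(archr_repo, ref = "master",
                           repos = BiocManager::repositories())
}
```

Whether the dependencies resolve cleanly on a given R and Bioconductor combination is a separate question that this sketch does not answer.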

nigiord commented 4 months ago

Hi @Axbxh, this could be related to https://github.com/GreenleafLab/ArchR/issues/2150. Even though createArrowFiles is supposed to continue after plotting fails, I think the error handling is implemented incorrectly and the output is not set properly.

https://github.com/GreenleafLab/ArchR/blob/d9e741c980c7c64e5348c97a74d146cc95f8ba76/R/CreateArrow.R#L213-L237

Currently all plots in ArchR break with newer versions of ggplot2 because some functions, such as .fixPlotSize, convert the ggplot object to... something else for some reason.

If you downgrade ggplot2 to 3.4.2, it should work. Alternatively, modify the ArchR code yourself to remove the plotting.
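
A minimal sketch of the downgrade, assuming the remotes package is an acceptable way to install an archived CRAN release (`is_newer_than` is a hypothetical helper, not part of ArchR or ggplot2):

```r
# Returns TRUE when the installed version is newer than the target version.
is_newer_than <- function(installed, target = "3.4.2") {
  package_version(installed) > package_version(target)
}

# Only attempt the downgrade interactively, and only if ggplot2 is too new.
if (interactive() && requireNamespace("ggplot2", quietly = TRUE) &&
    is_newer_than(as.character(utils::packageVersion("ggplot2")))) {
  if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
  # install_version() fetches a specific release from the CRAN archive.
  remotes::install_version("ggplot2", version = "3.4.2")
}
```

Restart R after the install so that the older ggplot2 is the one loaded alongside ArchR.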

It’s also possible that another error occurred (for instance, a wrong type for a parameter that is not checked, such as the parameters passed through TSSParams). You can’t know for sure because error reporting was disabled in several places, like here or here. A strange choice, but if you manage to uncomment all those message/print calls in every tryCatch, you might be able to understand what’s happening.
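
To illustrate why a silenced handler hides the cause, here is a small base R sketch. It is not ArchR's actual code: `risky_plot` is a hypothetical stand-in for a plotting step that fails, and the error text is made up.

```r
# Hypothetical stand-in for a plotting call that throws an error.
risky_plot <- function() stop("object is not a ggplot")

# Handler that discards the condition: the failure leaves no trace.
silent <- tryCatch(risky_plot(), error = function(e) NULL)

# Handler that reports the condition before continuing.
reported <- tryCatch(risky_plot(), error = function(e) {
  message("Plotting failed: ", conditionMessage(e))
  conditionMessage(e)
})
```

With the second form the run still continues past the failure, but the underlying message is printed, which is the effect of restoring the commented-out message/print calls.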

ArchR’s code is very convoluted, with multiple intermediate functions spread across many files, so unfortunately I haven’t found a way to fix the ggplot issue and send a pull request. On my side, when I really need a specific version of ggplot (because of interactions with other single-cell analysis software, for instance), I just remove all plotting in ArchR or run it in its own outdated environment.

Cheers, −Nils