GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

Rstudio crashing with addGeneIntegrationMatrix() and addToArrow = TRUE when accessing GeneIntegrationMatrix from temp files #2096

Open dgagler opened 9 months ago

dgagler commented 9 months ago

Attach your log file

ArchR-addGeneIntegrationMatrix-309a7ba7d4af-Date-2024-01-11_Time-01-44-22.log

Describe the bug

RStudio aborts with a fatal error when running addGeneIntegrationMatrix() while ArchR is writing the GeneIntegrationMatrix to the Arrow files. The integration runs successfully when addToArrow = FALSE. I'm working with large datasets: the ArchR object has 407k cells and the Seurat object I am integrating has 284k cells. As far as I know, fatal errors like this are typically related to memory. I'm running on a 2020 M1 MacBook Pro with 16 GB RAM.

Things I've tried: making sure I have space on my laptop (I have about 120 GB free), setting threads = 1, increasing the max global size in RStudio, lowering sampleCellsATAC and sampleCellsRNA to decrease batch sizes, and downsampling the Seurat object to 125k cells.

Looking at the log file, the integration itself seems OK (although it takes 5+ hours), but the crash occurs as soon as ArchR attempts to access the GeneIntegrationMatrix from the temp files, which are admittedly very large (about 45 GB total; each one is ~800 MB). The obvious solution here is to use my institution's HPC, but it looks like others have tried that and had issues writing Arrow files (#1218). Are there still plans to implement HPC compatibility with ArchR? Any ideas about how to get this integration through locally on my end?
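As a rough illustration of why memory is a plausible culprit on a 16 GB machine, the size of a sparse matrix covering all 407k cells can be estimated with simple arithmetic. This is only a sketch. The gene count and the fraction of non-zero entries are assumptions chosen for illustration, not values from this dataset, and the helper `sparse_matrix_gb` is hypothetical rather than part of ArchR. It also assumes column-compressed storage (as in the Matrix package's dgCMatrix), where each non-zero entry costs 8 bytes for the value plus 4 bytes for its row index, and it says nothing about how much of the matrix ArchR actually holds in memory at once.

```r
# Rough in-memory size (in GB) of a column-compressed sparse matrix.
# Each non-zero entry stores an 8-byte double and a 4-byte row index;
# there is also one 4-byte column pointer per column, plus one.
sparse_matrix_gb <- function(n_rows, n_cols, density) {
  nnz <- n_rows * n_cols * density
  bytes <- nnz * (8 + 4) + (n_cols + 1) * 4
  bytes / 1024^3
}

n_cells <- 407000  # cell count reported in the post
n_genes <- 20000   # ASSUMPTION: illustrative gene count
density <- 0.10    # ASSUMPTION: illustrative fraction of non-zero entries

# Genes as rows, cells as columns
est_gb <- sparse_matrix_gb(n_genes, n_cells, density)
cat(sprintf("Estimated sparse matrix size: %.1f GB\n", est_gb))
```

Under these assumed values the estimate is already more than half of the available 16 GB before any working copies are made, and a denser matrix would scale the figure up proportionally.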

To Reproduce

# Downsample the Seurat object to 100,000 randomly chosen cells
seurat.down <- seurat[, sample(colnames(seurat), size = 100000, replace = FALSE)]

archR <- addGeneIntegrationMatrix(
  ArchRProj = archR,
  useMatrix = "GeneScoreMatrix",
  matrixName = "GeneIntegrationMatrix",
  reducedDims = "Harmony_Batch",
  seRNA = seurat.down,
  addToArrow = TRUE,
  sampleCellsATAC = 7500,
  sampleCellsRNA = 7500,
  threads = 1,
  groupRNA = "majority_voting",
  nameCell = "predictedCell_Un",
  nameGroup = "predictedGroup_Un",
  nameScore = "predictedScore_Un",
  verbose = TRUE
)

Screenshots

[Screenshot 2024-01-11 at 7 41 43 AM]

rcorces commented 9 months ago

Hi @dgagler! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
It is worth noting that there are very few actual bugs in ArchR. If you are getting an error, it is probably something specific to your dataset, usage, or computational environment, all of which are extremely challenging to troubleshoot. As such, we require reproducible examples (preferably using the tutorial dataset) from users who want assistance. If you cannot reproduce your error, we will not be able to help. Before going through the work of making a reproducible example, search the previous Issues, Discussions, function definitions, or the ArchR manual and you will likely find the answers you are looking for. If your post does not contain a reproducible example, it is unlikely to receive a response.
In addition to a reproducible example, you must do the following things before we help you, unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Did you post your log file? If not, add it now.
3. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.