GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

Saving subset of ArchRProject fails with all(file.exists(zfiles)) is not TRUE #1504

Closed boconnell89 closed 2 years ago

boconnell89 commented 2 years ago

Describe the bug: Copying a subset of an ArchR Project with multiple samples fails at the end stages of saveArchRProject.

Note: I don't get this error if I set dropCells = TRUE, but then I get the error described in #663, despite updating to the most recent version of ArchR (1.0.2).

To Reproduce: No. I can subset the tutorial dataset, and also drop cells, just fine.

Expected behavior: I expected to be able to save the subset. I was able to do this for each of the libraries when they were in individual ArchR projects, as long as dropCells = FALSE.

Screenshots

> ctype<-c("GABAergic")
> idxSample <- BiocGenerics::which(CombinedData@cellColData$predictedGroup_Un %in% ctype)
> cellsSample <- CombinedData$cellNames[idxSample]
> subSet<-CombinedData[cellsSample, ]
> subSet<-saveArchRProject(subSet, dropCells=F, outputDirectory="../ArchRProjectsForPublication/CombinedData.GABAergic.subset")
Copying ArchRProject to new outputDirectory : /ArchRProjectsForPublication/CombinedData.GABAergic.subset
Copying Arrow Files...
Copying Arrow Files (1 of 3)
Copying Arrow Files (2 of 3)
Copying Arrow Files (3 of 3)
Getting ImputeWeights
No imputeWeights found, returning NULL
Copying Other Files...
Copying Other Files (1 of 15): ArchRLogs
Copying Other Files (2 of 15): CombinedData_additional_analysis.R
Copying Other Files (3 of 15): Embeddings
Copying Other Files (4 of 15): GroupCoverages
Copying Other Files (5 of 15): IterativeLSI
Copying Other Files (6 of 15): IterativeLSI_2M
Copying Other Files (7 of 15): IterativeLSI_v3
Copying Other Files (8 of 15): IterativeLSI_v4
Copying Other Files (9 of 15): PeakCalls
Copying Other Files (10 of 15): Plots
Copying Other Files (11 of 15): QualityControl
Copying Other Files (12 of 15): RNAIntegration
Copying Other Files (13 of 15): Rplots.pdf
Copying Other Files (14 of 15): saveArchRPRoject.log
Copying Other Files (15 of 15): tmp
Error in saveArchRProject(subSet, dropCells = F, logFile = "./saveArchRPRoject.log",  :
  all(file.exists(zfiles)) is not TRUE

Session Info

R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /home/groups/oroaklab/src/R/R-4.0.4/lib/libRblas.so
LAPACK: /home/groups/oroaklab/src/R/R-4.0.4/lib/libRlapack.so

Random number generation:
 RNG:     L'Ecuyer-CMRG
 Normal:  Inversion
 Sample:  Rejection

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils
 [8] datasets  methods   base

other attached packages:
 [1] rhdf5_2.34.0                SummarizedExperiment_1.20.0
 [3] Biobase_2.50.0              MatrixGenerics_1.2.1
 [5] Rcpp_1.0.8.3                Matrix_1.4-0
 [7] GenomicRanges_1.42.0        GenomeInfoDb_1.26.7
 [9] IRanges_2.24.1              S4Vectors_0.28.1
[11] BiocGenerics_0.36.1         matrixStats_0.62.0
[13] data.table_1.14.2           stringr_1.4.0
[15] plyr_1.8.7                  magrittr_2.0.3
[17] ggplot2_3.3.6               gtable_0.3.0
[19] gtools_3.9.2.2              gridExtra_2.3
[21] ArchR_1.0.2

loaded via a namespace (and not attached):
 [1] compiler_4.0.4         pillar_1.7.0           XVector_0.30.0
 [4] rhdf5filters_1.2.1     bitops_1.0-7           tools_4.0.4
 [7] zlibbioc_1.36.0        lattice_0.20-45        lifecycle_1.0.1
[10] tibble_3.1.7           pkgconfig_2.0.3        rlang_1.0.3
[13] DelayedArray_0.16.3    DBI_1.1.3              cli_3.3.0
[16] GenomeInfoDbData_1.2.4 withr_2.5.0            dplyr_1.0.9
[19] generics_0.1.3         vctrs_0.4.1            tidyselect_1.1.2
[22] glue_1.6.2             R6_2.5.1               fansi_1.0.3
[25] Rhdf5lib_1.12.1        purrr_0.3.4            scales_1.2.0
[28] ellipsis_0.3.2         assertthat_0.2.1       colorspace_2.0-3
[31] utf8_1.2.2             stringi_1.7.6          RCurl_1.98-1.7
[34] munsell_0.5.0          crayon_1.5.1           Cairo_1.6-0


rcorces commented 2 years ago

Hi @boconnell89! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
3. Did you post your log file? If not, add it now.
4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

rcorces commented 2 years ago

@boconnell89 - thanks for posting. If you think this is the same issue as #663, then let's keep the discussion over there rather than opening a new issue.

Also, that patch has not yet been incorporated into release_1.0.2 so you would need to specifically install the dev_emptyChr branch to see if that fix helps you. I'm going to close this issue for now. I would recommend using the command below to install the dev_emptyChr branch and following up at #663 if you think you are encountering the same issue. If you think you are encountering a different issue, just post here and I will re-open.

devtools::install_github("GreenleafLab/ArchR", ref="dev_emptyChr", repos = BiocManager::repositories())

rcorces commented 2 years ago

Oh, and make sure that after running that install command you detach and reload ArchR:

detach("package:ArchR", unload=TRUE)
library(ArchR)

boconnell89 commented 2 years ago

This is a different issue than #663. I will try installing the dev branch.

rcorces commented 2 years ago

Closing due to inactivity. Feel free to comment again here if your issue persists.

markphillippebworth commented 2 years ago

FYI, I'm hitting this same issue with 1.0.2 and the dev branch. It may be related to subsetting the ArchR project in a way that excludes complete arrow files? I don't know. I'm trying again with dropCells = FALSE to get around the issue.

traceback()
4: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
3: stopifnot(all(file.exists(zfiles)))
2: saveArchRProject(ArchRProj = ArchRProj[cells, ], outputDirectory = outputDirectory, load = TRUE, dropCells = dropCells, logFile = logFile, threads = threads)
1: subsetArchRProject(Monos, outputDirectory = "CD16s_EarlyInfectionComp", cells = cellNames, force = TRUE)
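One way to test the hypothesis that the subset leaves some Arrow files with no cells at all is to compare the sample prefixes of the retained cell names against the full list of samples. The following is a minimal base-R sketch, assuming cell names follow ArchR's `Sample#Barcode` convention. The helper `samples_without_cells` and the toy sample names are hypothetical and not part of ArchR.

```r
# Hypothetical helper (not part of ArchR): list samples that lose every cell
# in a subset. Assumes cell names look like "Sample#Barcode".
samples_without_cells <- function(all_samples, subset_cells) {
  # Strip everything from the first "#" onward to recover the sample name
  kept_samples <- unique(sub("#.*$", "", subset_cells))
  setdiff(all_samples, kept_samples)
}

# Toy data. In a real session these would presumably come from the project,
# e.g. names(getArrowFiles(proj)) and the vector of cell names being kept.
all_samples  <- c("LibA", "LibB", "LibC")
subset_cells <- c("LibA#AAAC", "LibA#GGTT", "LibC#TTCA")

dropped <- samples_without_cells(all_samples, subset_cells)
print(dropped)
```

If the returned vector is non-empty, the subset excludes at least one sample entirely, which would be worth mentioning when reporting the error.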

markphillippebworth commented 2 years ago

And I'm having it fail with the same error, even with dropCells = FALSE on the dev branch.

I installed the 'dev_emptyChr', and ran it with dropCells = FALSE, and it worked.

rcorces commented 2 years ago

@markphillippebworth - I'm glad the dev_emptyChr branch worked, though I would have expected all of those changes to have been transferred to dev. If you're feeling generous, could you install the dev_saveSubset branch and run it again to see if it tells you which files aren't getting successfully copied? That might help me pin down the error.

markphillippebworth commented 2 years ago

Hey @rcorces , thanks for working on this. I installed dev_saveSubset and re-ran my code. I hit errors though.

Copying Other Files...
Copying Other Files (1 of 9): Annotations
Copying Other Files (2 of 9): Background-Peaks.rds
Copying Other Files (3 of 9): Embeddings
Copying Other Files (4 of 9): GroupCoverages
Copying Other Files (5 of 9): IterativeLSI
Copying Other Files (6 of 9): PeakCalls
Copying Other Files (7 of 9): Plots
Copying Other Files (8 of 9): QuerySeurat.RDS
Copying Other Files (9 of 9): RNAIntegration
Error in saveArchRProject(ArchRProj = ArchRProj[cells, ], outputDirectory = outputDirectory, :
  Coverage files missing from new project's directory. Missing files:
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3002Negative..Rep1.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3002Negative..Rep2.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3003Negative..Rep1.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3003Negative..Rep2.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3007Negative..Rep1.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3007Negative..Rep2.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3008_Neg

Traceback()
3: stop("Coverage files missing from new project's directory. Missing files:\n", paste(zfiles[which(!file.exists(zfiles))], collapse = "\n"))
2: saveArchRProject(ArchRProj = ArchRProj[cells, ], outputDirectory = outputDirectory, load = TRUE, dropCells = dropCells, logFile = logFile, threads = threads)
1: subsetArchRProject(Monos, outputDirectory = "CD16s_EarlyInfectionComp2", cells = cellNames, dropCells = FALSE, force = TRUE)

markphillippebworth commented 2 years ago

This is an ArchR project that I've been working on for almost two years, so I'm guessing I've run and re-run peak calling. Perhaps there's a referencing issue?

I was able to confirm that those "missing files" are still present in the new ArchR project, and they're generally around 11-63Mb in size, so they are not actually missing, nor are they empty.
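A check like this can also be made from inside R, using the same `file.exists()` call that appears in the traceback, so that the result reflects exactly what the R session sees. The following is a minimal sketch in base R. The helper `check_files` and the demo file names are hypothetical, and the temporary directory stands in for the project's GroupCoverages folder.

```r
# Hypothetical helper: report existence and size for each expected path.
check_files <- function(paths) {
  data.frame(
    path    = paths,
    exists  = file.exists(paths),
    size_mb = round(file.info(paths)$size / 1024^2, 2),  # NA when missing
    stringsAsFactors = FALSE
  )
}

# Self-contained demo with toy names; replace `expected` with the paths
# printed in the error message to run this against a real project.
demo_dir <- file.path(tempdir(), "GroupCoverages_demo")
dir.create(demo_dir, showWarnings = FALSE)
present <- file.path(demo_dir, "GroupA.Rep1.coverage.h5")
writeLines("placeholder", present)

expected <- c(present, file.path(demo_dir, "GroupA..Rep1.coverage.h5"))
report <- check_files(expected)
print(report[, c("exists", "size_mb")])

# For anything reported missing, list what is actually in that directory,
# which makes small differences in file names easy to spot.
missing_paths <- report$path[!report$exists]
on_disk <- list.files(unique(dirname(missing_paths)))
print(on_disk)
```

Comparing the expected names with the directory listing side by side is a quick way to tell a genuinely absent file from a path that differs only slightly from the one on disk.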

markphillippebworth commented 2 years ago

FYI - this is after I cleaned up the original project, because some of the added matrices had been corrupted/were missing from a few arrow files. If you're interested, I can share my function for that - I spliced together a dietArchR function from your internal code base to correct the issue.

Given that there were other reference issues, I'm guessing there are referencing issues for the misc data as well? I'm not familiar with how ArchR handles that side of subsetting projects.

rcorces commented 2 years ago

@markphillippebworth - I haven't quite been able to figure this one out. In particular, I'm not sure why the dev_emptyChr branch would work for you.

In your above post, you said that the files that ArchR claims are missing actually do exist at these paths?

/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3002_Negative..Rep1.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3002_Negative..Rep2.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3003_Negative..Rep1.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3003_Negative..Rep2.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3007_Negative..Rep1.insertions.coverage.h5
/home/markphillip_pebworth/otherDrive2/CovidAll/MonoDC/GroupCoverages/LongGroups/CD14_Mono_FH3007_Negative..Rep2.insertions.coverage.h5

I'm not sure why that would be the case. Could it be a permissions issue?

markphillippebworth commented 2 years ago

That's correct - but the crazy thing is that they are present both in the original project, and the new project folder, so it's not a permissions issue. ArchR has no problem copying them over, but still flags them as missing for some reason.