GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

Variables in createArrowFiles (and other functions) e.g. minTSS not being used #1598

Closed mfrenkel16 closed 2 years ago

mfrenkel16 commented 2 years ago

Attach your log file ArchR-createArrows-13213a80cdf8-Date-2022-08-30_Time-17-18-48.log ArchR-createArrows-1321264c9a45-Date-2022-08-30_Time-17-57-54.log

Describe the bug The variables minTSS and minFrags within createArrowFiles do not seem to filter the data based on the values specified. Regardless of what values are given to this function, the output is always the same (total) number of cells without any filtering (i.e. 10600 cells in the Hematopoiesis tutorial).

To Reproduce

library(ArchR)
set.seed(1)
addArchRThreads(threads = 1)
addArchRGenome("hg19")
inputFiles <- getTutorialData("Hematopoiesis")

# First filtration case
ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  filterTSS = 4, 
  filterFrags = 1000, 
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

doubScores <- addDoubletScores(
  input = ArrowFiles,
  k = 10, 
  knnMethod = "UMAP", 
  LSIMethod = 1
)

proj1 <- ArchRProject(
  ArrowFiles = ArrowFiles, 
  outputDirectory = "HemeTutorial",
  copyArrows = TRUE 
)

#Second filtration case

ArrowFiles2 <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  filterTSS = 15, 
  filterFrags = 5000, 
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

doubScores2 <- addDoubletScores(
  input = ArrowFiles2,
  k = 10, 
  knnMethod = "UMAP", 
  LSIMethod = 1
)

proj2 <- ArchRProject(
  ArrowFiles = ArrowFiles2, 
  outputDirectory = "HemeTutorial",
  copyArrows = TRUE 
)

Expected behavior In the first case, with minTSS = 4 and minFrags = 1000, the total number of cells passing filter should have been 10660 (which it is). In the second case, with minTSS = 15 and minFrags = 5000, the total number of cells passing filter should have been 1141 [based on length(which(proj1$TSSEnrichment >= 4 & proj1$nFrags >=2000))]. However, it still produced the original number of cells, 10660, without any filtering.
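One way to sanity-check an expected cell count is to apply both thresholds to the per-cell QC values directly. The sketch below uses base R and made-up toy vectors; `count_passing` is a hypothetical helper, not an ArchR function. With a real project, the inputs would be `proj1$TSSEnrichment` and `proj1$nFrags`, and the thresholds should match the values passed to createArrowFiles.

```r
# Hypothetical helper (not part of ArchR): count cells that meet both
# a minimum TSS enrichment and a minimum fragment count.
count_passing <- function(tss, n_frags, min_tss, min_frags) {
  sum(tss >= min_tss & n_frags >= min_frags)
}

# Toy per-cell QC values, invented for illustration only.
tss     <- c(3.5, 8.2, 16.0, 20.1, 15.0)
n_frags <- c(800, 4000, 6000, 4500, 5000)

lenient <- count_passing(tss, n_frags, min_tss = 4,  min_frags = 1000)
strict  <- count_passing(tss, n_frags, min_tss = 15, min_frags = 5000)

print(lenient)  # 4 of the 5 toy cells pass the lenient thresholds
print(strict)   # 2 of the 5 toy cells pass the strict thresholds
```

Because proj1 was itself built with the lenient thresholds, counting within it only gives a valid estimate for thresholds at least as strict as those.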

Additional context A similar problem seems to happen where other ArchR functions do not use the variables I think they should. For example, changing the perplexity value within addTSNE() doesn't change the resulting graph for me.

rcorces commented 2 years ago

Hi @mfrenkel16! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
3. Did you post your log file? If not, add it now.
4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

rcorces commented 2 years ago

Thanks for posting a reproducible example and for following the issue template. If you're running the code exactly as posted above, then the ArrowFiles2 command isn't updating your Arrow Files because you haven't used force = TRUE. Your console output should look something like this:

> ArrowFiles2 <- createArrowFiles(
+   inputFiles = inputFiles,
+   sampleNames = names(inputFiles),
+   filterTSS = 15, 
+   filterFrags = 5000, 
+   addTileMat = TRUE,
+   addGeneScoreMat = TRUE
+ )
filterFrags is no longer a valid input. Please use minFrags! Setting filterFrags value to minFrags!
filterTSS is no longer a valid input. Please use minTSS! Setting filterTSS value to minTSS!
Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
ArchR logging to : ArchRLogs/ArchR-createArrows-ffb152d001969-Date-2022-08-30_Time-21-04-10.log
If there is an issue, please report to github with logFile!
subThreadhing Enabled since ArchRLocking is FALSE see `addArchRLocking`
2022-08-30 21:04:10 : Batch Execution w/ safelapply!, 0 mins elapsed.
2022-08-30 21:04:10 : (scATAC_BMMC_R1 : 1 of 3) Arrow Exists! Marking as completed since force = FALSE!, 0 mins elapsed.
2022-08-30 21:04:10 : (scATAC_CD34_BMMC_R1 : 2 of 3) Arrow Exists! Marking as completed since force = FALSE!, 0 mins elapsed.
2022-08-30 21:04:10 : (scATAC_PBMC_R1 : 3 of 3) Arrow Exists! Marking as completed since force = FALSE!, 0 mins elapsed.
ArchR logging successful to : ArchRLogs/ArchR-createArrows-ffb152d001969-Date-2022-08-30_Time-21-04-10.log

If you set force = TRUE, it should work fine.
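A minimal sketch of that fix follows. Both functions are hypothetical helpers written for illustration, not ArchR functions. `existing_arrows` assumes the default naming, where each Arrow file is written as `<sampleName>.arrow` in the working directory. `rebuild_arrows` wraps createArrowFiles with the current minTSS and minFrags argument names and force = TRUE, and is meant to be called after library(ArchR) and addArchRGenome() have been run.

```r
# Hypothetical helper: report which samples already have an Arrow file on
# disk. Assumes the default output naming "<sampleName>.arrow".
existing_arrows <- function(sample_names, dir = ".") {
  paths <- file.path(dir, paste0(sample_names, ".arrow"))
  sample_names[file.exists(paths)]
}

# Hypothetical wrapper: rebuild Arrow files with new thresholds.
# force = TRUE asks createArrowFiles to overwrite Arrow files that already
# exist instead of marking them as completed.
rebuild_arrows <- function(input_files, min_tss, min_frags) {
  ArchR::createArrowFiles(
    inputFiles = input_files,
    sampleNames = names(input_files),
    minTSS = min_tss,
    minFrags = min_frags,
    addTileMat = TRUE,
    addGeneScoreMat = TRUE,
    force = TRUE
  )
}
```

For the second case above, the call would be `rebuild_arrows(inputFiles, min_tss = 15, min_frags = 5000)`. Checking `existing_arrows(names(inputFiles))` beforehand shows which samples would otherwise be skipped.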

Closing for now, but feel free to post again here if you feel this hasn't addressed your question.