GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

createArrowFiles failed to load fragment files #1595

Closed BokaiZhu closed 2 years ago

BokaiZhu commented 2 years ago

Hi ArchR team, thanks for providing this wonderful tool for ATAC data!

Error encountered during function createArrowFiles

The error is specific to the fragment files deposited under GSM5866073; other files, including the tutorial files, worked fine in my setup.

I suspected this to be an error encountered previously; however, this time I checked the unzipped fragment files and they look intact, so I was not able to pinpoint the problem.

Error message:

> ArrowFiles_retina <- ArchR::createArrowFiles(
+   inputFiles = paths,
+   sampleNames = names(paths),
+   filterTSS = 4, #Dont set this too high because you can always increase later
+   filterFrags = 1000, 
+   addTileMat = TRUE,
+   minFrags = 500,
+   maxFrags = 1e+05,
+   addGeneScoreMat = TRUE,
+   nChunk = 2
+ )
filterFrags is no longer a valid input. Please use minFrags! Setting filterFrags value to minFrags!
filterTSS is no longer a valid input. Please use minTSS! Setting filterTSS value to minTSS!
Using GeneAnnotation set by addArchRGenome(Hg38)!
Using GeneAnnotation set by addArchRGenome(Hg38)!
ArchR logging to : ArchRLogs/ArchR-createArrows-1567e31d5b2-Date-2022-08-27_Time-18-00-23.log
If there is an issue, please report to github with logFile!
Cleaning Temporary Files
2022-08-27 18:00:25 : Batch Execution w/ safelapply!, 0 mins elapsed.
(AIIamacrine : 1 of 1) Determining Arrow Method to use!
Attempting to index /home/bkzhu/super_mario/atac_bench_nrz/retina/data/atac/GSM5866073_AIIamacrine_frags.tsv.gz as tabix..
createArrowFiles has encountered an error, checking if any ArrowFiles completed..
2022-08-27 18:00:25 : 
ArchR logging successful to : ArchRLogs/ArchR-createArrows-1567e31d5b2-Date-2022-08-27_Time-18-00-23.log

ArchR-createArrows-1567e31d5b2-Date-2022-08-27_Time-18-00-23.log

I still suspect it is an input file problem rather than an ArchR bug, but any suggestions would be greatly appreciated.

Best, Bokai

rcorces commented 2 years ago

Hi @BokaiZhu! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
3. Did you post your log file? If not, add it now.
4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

rcorces commented 2 years ago

I would assume that whoever made the fragment files accidentally compressed them using gzip rather than bgzip. You should unzip them and re-compress them with bgzip. Feel free to comment again if this doesn't solve your problem. Closing the issue for now.
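
One way to test the gzip-versus-bgzip hypothesis before recompressing is to inspect the first bytes of the file. The sketch below is not part of ArchR or htslib; `is_bgzf` is a hypothetical helper name. It assumes the BGZF header layout described in the SAM/BAM specification: a gzip member with the extra-field flag set and a `BC` subfield of length 2 as the first extra subfield, which is what bgzip writes in practice. A plain gzip file normally lacks that subfield.

```shell
# Hypothetical helper: exit 0 if the file starts with a BGZF block header.
is_bgzf() {
  # Read the first 16 bytes as single-space-separated hex pairs.
  header=$(od -An -tx1 -N16 "$1" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
  # 16 bytes give 32 hex digits plus 15 spaces; shorter files cannot be BGZF.
  [ "${#header}" -eq 47 ] || return 1
  case "$header" in
    # gzip magic (1f 8b), deflate (08), FEXTRA flag (04),
    # then the 'B' 'C' subfield (42 43) with length 2 (02 00) at bytes 12-15.
    "1f 8b 08 04 "*" 42 43 02 00") return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_bgzf GSM5866073_AIIamacrine_frags.tsv.gz && echo "BGZF" || echo "not BGZF"` would report which kind of compression the downloaded file uses.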

BokaiZhu commented 2 years ago

Thanks! Problem was solved by recompressing and reformatting the fragment files.

For future reference, the code used to solve:

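gzip -d GSM5866073_Astrocyte_frags.tsv.gz &&   # decompress; && waits for each step to finish before the next starts
sort -V -k1,1 -k2,2 GSM5866073_Astrocyte_frags.tsv > GSM5866073_Astrocyte_frags.sort.tsv &&   # sort by chromosome, then start position
bgzip GSM5866073_Astrocyte_frags.sort.tsv   # writes GSM5866073_Astrocyte_frags.sort.tsv.gz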
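
Because the fix also involved re-sorting, it can help to confirm the ordering before handing the file back to createArrowFiles. The following is a minimal sketch, assuming that tabix indexing needs each chromosome's records to be contiguous and ordered by start position (column 2). `is_tabix_sorted` is a hypothetical helper name, and it reads uncompressed, tab-separated fragments from standard input.

```shell
# Hypothetical helper: exit 0 only if records are grouped by chromosome
# and non-decreasing in start position within each chromosome.
is_tabix_sorted() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    /^#/ { next }                        # skip header/comment lines
    $1 != chrom {
      if ($1 in seen) { bad = 1; exit }  # chromosome reappears: not contiguous
      seen[$1] = 1; chrom = $1; last = -1
    }
    $2 + 0 < last { bad = 1; exit }      # start position went backwards
    { last = $2 + 0 }
    END { exit bad + 0 }
  '
}
```

A possible use is `gzip -dc GSM5866073_Astrocyte_frags.sort.tsv.gz | is_tabix_sorted && echo "sorted"`, since bgzip output can still be read by `gzip -dc`.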