GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

createArrowFiles failed to load fragment files #1289

Closed BokaiZhu closed 2 years ago

BokaiZhu commented 2 years ago

Hi ArchR team, thanks for providing this wonderful tool for ATAC data!

I ran through the tutorial and was able to read in the tutorial fragment files with createArrowFiles. However, when I tried to read in other fragment files (the ones from "Integrated single-cell transcriptomics and epigenomics reveals strong germinal center–associated etiology of autoimmune risk loci", deposited under GSE165860), the read failed and nothing was loaded.

The code I used (loading with hg38, as specified in the paper):

library(ArchR)
set.seed(42)
addArchRThreads(threads = 16) 
addArchRGenome("hg38")
a1 = "human_tonsil/ATAC/1a/GSM5051499_Tonsil_1a_scATC_seq_fragments.tsv.gz"
b1 = "human_tonsil/ATAC/1b/GSM5051500_Tonsil_1b_scATC_seq_fragments.tsv.gz"
paths = c(a1, b1)
names(paths) = c("a1","a2")
ArrowFiles1 <- createArrowFiles(
  inputFiles = paths,
  sampleNames = names(paths),
  filterTSS = 6, # Don't set this too high because you can always increase later
  filterFrags = 1000, 
  addTileMat = TRUE,
  minFrags = 500,
  maxFrags = 1e+05,
  addGeneScoreMat = TRUE
)

The message I got:

Screen Shot 2022-02-10 at 11 36 26 PM

and ArrowFiles1 came back empty, unlike when reading the files provided in the tutorial.

I also ran a couple of tests on other files in the same GSE series and could not read any of them, so this is probably not a truncated download.
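One way to back up the "not a truncated download" guess is to download the same file twice and compare checksums. The sketch below uses base R's `tools::md5sum`; the helper name `same_download` is hypothetical and is not part of ArchR.

```r
# Hypothetical helper (not an ArchR function): TRUE when two independent
# downloads of the same file are byte-identical according to their MD5 sums.
same_download <- function(path_a, path_b) {
  # md5sum() returns NA for files that cannot be read.
  sums <- unname(tools::md5sum(c(path_a, path_b)))
  !anyNA(sums) && sums[1] == sums[2]
}
```

Matching checksums only show that both downloads delivered the same bytes. They say nothing about whether the deposited file itself is a valid fragments file.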

I also tried fragment files from the other ArchR paper (Nature Genetics, GSE162690), and they read in fine (using hg19 in that case).

I also checked fragment files aligned to hg38 from another paper (GSE139369). Those read in without any problem.

I tried setting threads to 1 and nChunk to 1, and the problem still persists.

The log file: ArchR-createArrows-94133f6837ed-Date-2022-02-10_Time-23-46-13.log

Please let me know what I got wrong, and thanks in advance for the help! Given these tests, I'm wondering whether the fragment files under GSE165860 need different parameters.

rcorces commented 2 years ago

Hi @BokaiZhu! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:

  1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
  2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
  3. Did you post your log file? If not, add it now.

BokaiZhu commented 2 years ago

For the questions:

  1. Yes, I searched and tried the solutions mentioned in other threads; they did not work.
  2. This problem is specific to the fragment files under GSE165860 (Science Immunology paper from your lab).
  3. Posted the log file.

rcorces commented 2 years ago

The file in question looks corrupted in some way. When I download that file and try to unzip it or use zcat to read the lines, it does not contain a standard text file. I would contact the authors of that paper if you see the same thing and need further assistance with their files.
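A similar check can be done from R before calling createArrowFiles. The sketch below is a hypothetical helper, not part of ArchR. It assumes a fragments file is gzip-compressed, tab-separated text with at least four columns (chromosome, start, end, barcode) and that lines starting with `#` are header comments. Because R's `gzfile()` also reads uncompressed files without complaint, the gzip magic bytes are checked separately.

```r
# Hypothetical helper (not an ArchR function): quick sanity check that a file
# is gzip-compressed and that its first lines look like fragment records.
check_fragment_file <- function(path, n = 5) {
  # gzip (and bgzip) streams start with the magic bytes 0x1f 0x8b.
  magic <- readBin(path, what = "raw", n = 2)
  is_gzip <- identical(magic, as.raw(c(0x1f, 0x8b)))

  # Read the first n lines; a broken archive typically errors or yields garbage.
  read_head <- function(p, k) {
    con <- gzfile(p, open = "rt")
    on.exit(close(con))
    suppressWarnings(readLines(con, n = k))
  }
  lines <- tryCatch(read_head(path, n), error = function(e) character(0))

  # Drop header comments, then require >= 4 tab-separated fields with
  # integer start/end coordinates in columns 2 and 3 (an assumed layout).
  data_lines <- lines[!grepl("^#", lines, useBytes = TRUE)]
  fields <- strsplit(data_lines, "\t", fixed = TRUE, useBytes = TRUE)
  looks_ok <- length(fields) > 0 && all(vapply(fields, function(f) {
    length(f) >= 4 && all(grepl("^[0-9]+$", f[2:3], useBytes = TRUE))
  }, logical(1)))

  list(is_gzip = is_gzip, n_lines = length(lines),
       looks_like_fragments = looks_ok)
}
```

With the `paths` vector from the original post, `lapply(paths, check_fragment_file)` would report each sample separately. A file that fails either check is worth inspecting by hand before suspecting ArchR parameters.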


BokaiZhu commented 2 years ago

Thanks! Contacted the authors.