GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

Error in createArrowFiles() #436

Closed jbjela closed 3 years ago

jbjela commented 3 years ago

Attach your log file

ArchR has a built-in logging functionality for all complex functions. You MUST attach your log file (indicated in the console output) to this issue. Just drag and drop it here.

ArchR-createArrows-1ea85499f3cd-Date-2020-11-21_Time-18-07-44.log

Describe the bug

I am attempting to create an arrow file from a fragment.tsv.gz file from the 10X website, but when I run createArrowFiles() using that fragment file as the input I receive the following error:

<simpleError: No fragments found!

The fragment file has fragments listed in the following format, which may be the cause of the error:

[Screenshot: Screen Shot 2020-11-21 at 6 20 31 PM]

To Reproduce

This issue does not occur with the Hematopoiesis dataset.

Here is the code I ran:

if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
devtools::install_github("GreenleafLab/ArchR", ref = "master", repos = BiocManager::repositories())
library(ArchR)

inputFiles <- c('/atac_v1_hgmm_10k_fragments.tsv.gz')
addArchRThreads(threads = 16)
addArchRGenome("mm10")
inputFiles
ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = "1k PBMC",
  minTSS = 3, # Don't set this too high because you can always increase later
  minFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

Expected behavior

I expected the program to output an arrow file corresponding to my fragment input.

Screenshots

[Screenshot: Screen Shot 2020-11-21 at 6 14 58 PM]

Additional context

I have read the full ArchR manual and used ArchR to analyze several other samples in the past, but I am confused as to why this error is occurring with this sample specifically. I have also read the other GitHub issues that cite this same error in arrow file creation, and none of those solutions have resolved the issue.

I have additionally attempted to run reformatFragmentFiles() on my fragment file, but I receive the following error:

Error in reformatFragmentFiles(inputFiles) : No fragments found after checking for integers and chrPrefix!

Any assistance would be very appreciated!

rcorces commented 3 years ago

Your fragments have chromosomes called "hg19_chr1". This is not an acceptable input because your reference genome has chromosomes called "chr1". Because of this, ArchR finds no fragments that match based on chromosome. If you strip the "hg19_" prefix off, it will work.

Please close this issue if this resolves your problem.
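One way to confirm a mismatch like this is to list the distinct chromosome names in the first column of the fragment file. The following is a minimal sketch and not part of ArchR: the function name count_chromosomes is hypothetical, and it assumes a tab-separated fragment file with the chromosome in column 1 and optional header lines starting with "#". It uses gzip -dc rather than zcat because zcat on some systems (for example macOS) may expect a .Z suffix.

```shell
# Hypothetical helper: count fragments per chromosome name.
# $1: path to a gzip- or bgzip-compressed fragment file
count_chromosomes() {
  gzip -dc "$1" |   # decompress to stdout (bgzip output is gzip-compatible)
    grep -v '^#' |  # skip header/comment lines, if any
    cut -f1 |       # keep only the chromosome column
    sort |
    uniq -c         # one line per distinct name, with its count
}
```

Running count_chromosomes on the fragment file would show prefixed names such as the one described above if they are present, so they can be compared against the chromosome names of the reference genome.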

jbjela commented 3 years ago

@rcorces That makes sense, I had a feeling that was the root of the issue. Is it possible to edit the fragment.tsv.gz file directly as if it were a .tsv file?

Thank you again

rcorces commented 3 years ago

That's not really an ArchR question, and it is better posted to sequencing forums in the future given our limited bandwidth.

But using sed is probably your best option:

zcat fragmentFile.tsv.gz | sed 's/hg19_//g' | gzip > fragmentFileFixed.tsv.gz

Granted, I have no idea what the rest of your fragment file looks like, so it's really up to you to fix it into the correct input format.
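The sed expression above removes "hg19_" wherever it occurs on a line. A slightly more conservative variant strips the prefix only when it starts the chromosome column. This is a sketch under assumptions, not an ArchR utility: the function name strip_chrom_prefix is hypothetical, the input is assumed to be tab-separated with the chromosome in column 1, and lines starting with "#" are passed through unchanged. Rows that do not carry the prefix are left as they are.

```shell
# Hypothetical filter: read fragment rows on stdin, write them to stdout,
# removing a prefix from column 1 only when the column starts with it.
# $1: prefix to remove, e.g. "hg19_"
strip_chrom_prefix() {
  awk -v prefix="$1" '
    BEGIN { FS = OFS = "\t" }               # tab-separated in and out
    /^#/ { print; next }                    # keep header lines untouched
    index($1, prefix) == 1 {                # column 1 begins with the prefix
      $1 = substr($1, length(prefix) + 1)   # drop the prefix
    }
    { print }
  '
}
```

It can be placed in the same pipeline in place of the sed step, for example gzip -dc fragmentFile.tsv.gz | strip_chrom_prefix hg19_ followed by the compression step.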

Closing this issue.

jbjela commented 3 years ago

Definitely, apologies for the inconvenience. Thank you for all the help!

MJDelas commented 3 months ago

For anyone trying to fix their fragment files this way: "Note that ArchR requires bgzipped fragment files which is different from gzip. See samtools bgzip"
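Following that note, a fragment file that was written with plain gzip can be recompressed with bgzip. The sketch below assumes the bgzip command from htslib is installed and on the PATH; the function name recompress_bgzip and the file names in the usage line are hypothetical. It relies only on bgzip -c, which writes compressed data to standard output.

```shell
# Hypothetical helper: recompress a gzip'd fragment file with bgzip.
# $1: input file (gzip- or bgzip-compressed)
# $2: output path for the bgzip-compressed copy
recompress_bgzip() {
  gzip -dc "$1" | bgzip -c > "$2"
}

# Usage (illustrative file names):
# recompress_bgzip fragmentFileFixed.tsv.gz fragmentFileFixed.bgz.tsv.gz
```

The input file is left untouched, so the result can be checked before the original is replaced.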