Open EllieDuan opened 3 years ago
Hi, any ideas how can one fix it, or go around it? I'm obviously only interested in the open areas (i.e. no nucleosomes) of the chromosome I'm analyzing.
Hi, any ideas how can one fix it, or go around it? I'm obviously only interested in the open areas (i.e. no nucleosomes) of the chromosome I'm analyzing.
Not really helpful here regarding HMMRATAC.
I ended up selected open regions using Samtools for insert size less than 100bp: https://accio.github.io/bioinformatics/2020/03/10/filter-bam-by-insert-size.html
Then calling peaks on these open regions bam files using Macs2.
Thanks for the reply, I'll try that out.
Why didn't you call the peaks on the new file using HMMRATAC? Is the output of your workflow incompatible with what HMMRATAC is looking for?
Sorry for the delay. Let me start by asking if this data was size selected? Ie on a gel etc.
Thanks for the reply, I'll try that out.
Why didn't you call the peaks on the new file using HMMRATAC? Is the output of your workflow incompatible with what HMMRATAC is looking for?
I think HMMRATAC works better for input without size selection for their model, based on their paper: https://academic.oup.com/nar/article/47/16/e91/5519166, therefore, I decided to go back the traditional way. The bam file after Samtool size selection should be compatible with HMMRATAC.
Sorry for the delay. Let me start by asking if this data was size selected? Ie on a gel etc.
The bam file I input in HMMRATAC that run into this error message was not being size selected, the library for sequencing is also not selected for insert size. It contains all short and long fragments. Thank you.
The bam file I input in HMMRATAC that run into this error message was not being size selected, the library for sequencing is also not selected for insert size. It contains all short and long fragments. Thank you.
I think there is a misunderstanding about the —trim option. It isn’t a parameter for excluding nucleosomes or isolating open regions. It is meant to allow the processing of libraries that were subjected to some size selection. So if your data wasn’t, I don’t recommend using that option. If you are only interested in open regions, and not the flanking nucleosomes, then use the coordinates at columns 7-8 of the gappedPeak file, which designates the open regions alone. But the algorithm should be run with the whole library and without —trim.
As an aside, regardless of the software, you shouldn’t do any size selection with ATAC-seq, physical or computational. We were able to show, in the HMMRATAC paper, that size selection results in a loss of precision compared to non selected data
The bam file I input in HMMRATAC that run into this error message was not being size selected, the library for sequencing is also not selected for insert size. It contains all short and long fragments. Thank you.
I think there is a misunderstanding about the —trim option. It isn’t a parameter for excluding nucleosomes or isolating open regions. It is meant to allow the processing of libraries that were subjected to some size selection. So if your data wasn’t, I don’t recommend using that option. If you are only interested in open regions, and not the flanking nucleosomes, then use the coordinates at columns 7-8 of the gappedPeak file, which designates the open regions alone. But the algorithm should be run with the whole library and without —trim.
As an aside, regardless of the software, you shouldn’t do any size selection with ATAC-seq, physical or computational. We were able to show, in the HMMRATAC paper, that size selection results in a loss of precision compared to non selected data
Thank you for the clarification, this makes much sense now! Many thanks!
@EvanTarbell Thanks a lot for the info! I'll be sure to look at the gappedPeaks file. BTW, regardless, the trim
option is still buggy and causing the crash @EllieDuan described here.
Yes, i understand there is an issue with —trim
I’m going to reopen this issue so I can address whatever bug caused the original error
Hello again @EvanTarbell,
I just want to make sure I got your previous explanation regarding the open chromatin regions right.
What I did was to run HMMRATAC on an ENCODE ATAC-Seq dataset to see if I can re-create standard fragment length distribution plots as seen in many ATAC-Seq-centered works. The results show that the average fragament size is much larger than expected, and I hardly see any fragments with a length below 150 bp, which should point at the nucleosome-free regions.
I'm reaching out since I'm (obviously) very new to this field and I'm not sure if I'm missing something basic. I'd be happy if you could make sure that the dataset I'm looking at should indeed produce that fragment size histogram I was referring to.
Thanks again and sorry if this is out of scope.
For future reference, I did arrive at some sort of a solution. That same BAM file already contains the correct fragment length distribution, we just have to extract it out. I found this blog post with the following intimidating one-liner:
samtools view ATAC_f2q30_sorted.bam | awk '$9>0' | cut -f 9 | sort | uniq -c | sort -b -k2,2n | sed -e 's/^[ \t]*//' > fragment_length_count.txt
This generates a text file that can be histogrammed, and the resulting histogram does indeed contain the necessary periodicity. As a caveat, I'm not sure that this is best-practice, or for that matter even correct :)
Describe the bug
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1 at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:425)
To Reproduce This error only show up when used --trim 3 option
System (please complete the following information):
Additional context