Open ewallace opened 7 years ago
Perhaps you could something like this to subtract out regions between the splits after using the -pc:
bedtools genomecov -ibam exampleYDR064W.bam -bga -pc -strand + > pc.bedgraph bedtools genomecov -ibam exampleYDR064W.bam -bga -strand + > unspilt.bedgraph bedtools genomecov -ibam exampleYDR064W.bam -bga -strand + -split > split.bedgraph bedtools unionbedg -i pc.bedgraph unspilt.bedgraph split.bedgraph | awk '{print $1,$2,$3,$4-$5+$6}' > pe_SpliceAware.bedgraph
Hope it isn't bad form to bump this -- this feature would really help me out. Thanks, @bjf79, for your hack -- I'll try it out, though I'm hoping for something simpler.
@arq5x:
Is this the recommendet way to get coverage tracks for paired-end spliced data?
bedtools genomecov -ibam exampleYDR064W.bam -bga -pc -strand + > pc.bedgraph
bedtools genomecov -ibam exampleYDR064W.bam -bga -strand + > unspilt.bedgraph
bedtools genomecov -ibam exampleYDR064W.bam -bga -strand + -split > split.bedgraph
bedtools unionbedg -i pc.bedgraph unspilt.bedgraph split.bedgraph | awk '{print $1,$2,$3,$4-$5+$6}' > pe_SpliceAware.bedgraph
If so, could this be added to the documentation or even be implemented as a new flag for genomecov
(as it is using bedtools
only except for the little awk
-part that shouldn't be hard to do internally, too).
I think this is an extremely important feature currently missing in bedtools as
bedtools genomecov
created coverage tracks for these kind of data look 'unstranded' and simply do not reflect biology.But maybe I am missing some corner cases here?
edit: Just for reference, a very old discussion on a similar issue.
Hi, This still an important need, for my group. Is there anything I can do to help move it up the agenda? Sadly this is beyond my knowledge. If you know a software engineer who could fix this, I would happily look in to paying their time from my grant. Edward
I was hoping to use -pc and -split together as well Was this feature ever addressed? Does that 4-step protocol work as intended?
Hi, This is either a bug or a feature request for the (wonderful, much appreciated) bedtools suite.
I have paired-end reads, and I'd like to compute splice-aware genome coverage. However, with genome cov I can only seem to do (a) splice-aware coverage of each mate separately using the -split option, OR (b) paired-end but non-splice-aware coverage using the -pc option. I would like to be able to use both options to get a result like that in IGV's automatic coverage option.
Minimal reproducible example attached:
Example bam file is: exampleYDR064W.bam hisat2-aligned to SacCer3 genome, filtered to reads mapping to YDR064W
-split only aligns each member of a read pair to a separate strand
bedtools genomecov -ibam exampleYDR064W.bam -bga -split -strand + > split_plus.bedgraph bedtools genomecov -ibam exampleYDR064W.bam -bga -split -strand - > split_minus.bedgraph
-pc only aligns both members of a read to the same strand, but fills in gaps/introns
bedtools genomecov -ibam exampleYDR064W.bam -bga -pc -strand + > pc_plus.bedgraph bedtools genomecov -ibam exampleYDR064W.bam -bga -pc -strand - > pc_minus.bedgraph
-split and -pc together is identical to -pc only; I want it to drop gaps inside reads, like the .IGV-computed coverage
bedtools genomecov -ibam exampleYDR064W.bam -bga -split -pc -strand + > splitpc_plus.bedgraph bedtools genomecov -ibam exampleYDR064W.bam -bga -split -pc -strand - > splitpc_minus.bedgraph
It also looks as if -pc switches strand assignment relative to my expectations?
Data in here: paired-end_splice-aware_genomecov.zip
Apologies if this has previously been addressed and my googling failed to find it.
Thanks Edward