deeptools / deepTools

Tools to process and analyze deep sequencing data.

Read regression to 5' most site for CAGE #388

Closed: vivekbhr closed this issue 8 years ago

vivekbhr commented 8 years ago

For generating CAGE coverage files with bamCoverage, it would be good to allow 1 bp-binned coverage files in which only the 5'-most base of each read is counted (taking the RNA strand into account).
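A minimal sketch of the requested behavior, assuming a simplified alignment record whose fields mirror pysam's `AlignedSegment.reference_start`, `reference_end` and `is_reverse` (0-based, half-open coordinates). The `Alignment` type and both helpers are hypothetical, not the deepTools implementation. The 5' end is taken as the leftmost aligned base for plus-strand reads and the rightmost aligned base for minus-strand reads; soft-clipped bases are ignored.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (0-based, half-open, like pysam)."""
    chrom: str
    start: int        # leftmost aligned reference position
    end: int          # one past the rightmost aligned reference position
    is_reverse: bool  # True if the read maps to the minus strand


def five_prime_position(aln: Alignment) -> int:
    """Return the 0-based coordinate of the read's 5'-most aligned base."""
    # Minus-strand reads start (5') at the right edge of the aligned span.
    return aln.end - 1 if aln.is_reverse else aln.start


def five_prime_coverage(alignments: Iterable[Alignment]) -> Counter:
    """Count 5' ends in 1 bp bins, keyed by (chrom, position, strand)."""
    counts = Counter()
    for aln in alignments:
        strand = "-" if aln.is_reverse else "+"
        counts[(aln.chrom, five_prime_position(aln), strand)] += 1
    return counts


reads = [
    Alignment("chr1", 100, 150, False),
    Alignment("chr1", 100, 150, True),
    Alignment("chr1", 100, 130, False),
]
coverage = five_prime_coverage(reads)
```

Here both plus-strand reads land in the same 1 bp bin at position 100, while the minus-strand read is counted at position 149, its 5' end.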

As @dpryan79 said, this is already implemented somewhere in the Ribo-Seq branch, so we can clean it up once that branch is merged.

dpryan79 commented 8 years ago

The current implementation is summarized here: https://github.com/fidelram/deepTools/blob/feature/riboseq_352/deeptools/bamCoverage.py#L227-L264

Do you need PE reads? That's an easy enough extension. Can you think of a context where specifying the strand wouldn't be useful? I suppose the DNase-seq cases that Anäis presented yesterday would be examples of that.

vivekbhr commented 8 years ago

Most CAGE data out there is single-end, though some protocols rely on paired-end sequencing, so it would be good to support both.

Strand will surely be needed for CAGE, but maybe not for DNase-seq/MNase-seq, etc.
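To illustrate the distinction, a hypothetical post-processing step over 5'-end counts keyed by `(chrom, position, strand)`: for CAGE one would keep a separate track per strand, while for DNase-seq/MNase-seq the two strands could simply be pooled. Both function names are illustrative only and are not deepTools options.

```python
from collections import Counter


def split_by_strand(stranded: Counter) -> dict:
    """Return one track per strand (e.g. CAGE, where the RNA strand matters)."""
    tracks = {"+": Counter(), "-": Counter()}
    for (chrom, pos, strand), n in stranded.items():
        tracks[strand][(chrom, pos)] += n
    return tracks


def collapse_strands(stranded: Counter) -> Counter:
    """Merge + and - counts at each position (e.g. DNase-seq/MNase-seq)."""
    pooled = Counter()
    for (chrom, pos, _strand), n in stranded.items():
        pooled[(chrom, pos)] += n
    return pooled


counts = Counter({
    ("chr1", 100, "+"): 2,
    ("chr1", 100, "-"): 1,
    ("chr1", 149, "-"): 3,
})
per_strand = split_by_strand(counts)
pooled = collapse_strands(counts)
```

Keeping the strand in the key until the very end means the same counting pass can serve both use cases.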

dpryan79 commented 8 years ago

I'll work on this a bit tomorrow in the riboseq branch, where it's mostly implemented already, and then merge it into develop.

dpryan79 commented 8 years ago

OK, this is fully implemented now. Both SE and PE reads are supported, as are positive and negative offsets. For example, from here on out, --Offset 1 will use only the first base of each alignment (after accounting for orientation). Note that the --MNase option is different: it centers on the 2 or 3 bases in the middle of each fragment, whereas --Offset works at the read level.
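A rough sketch of how a read-relative offset can be mapped to a genomic coordinate, assuming 0-based half-open alignment coordinates. Positive offsets follow the description above (1 is the first base after accounting for orientation); treating a negative offset as counting back from the 3' end (so -1 is the last base) is an assumption, not something stated here. `offset_position` is a hypothetical helper, not deepTools code, and it works on the aligned reference span only, so splicing and indels are ignored.

```python
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (0-based, half-open coordinates)."""
    start: int
    end: int
    is_reverse: bool


def offset_position(aln: Alignment, offset: int) -> Optional[int]:
    """Map a 1-based, read-relative offset to a 0-based genomic coordinate.

    offset > 0 counts from the read's 5' end (1 = first base).
    offset < 0 counts from the read's 3' end (-1 = last base; assumed).
    Returns None if the offset falls outside the aligned span.
    """
    if offset == 0:
        raise ValueError("offset must be non-zero")
    length = aln.end - aln.start
    # 0-based index measured from the read's 5' end.
    idx = offset - 1 if offset > 0 else length + offset
    if not 0 <= idx < length:
        return None
    # Forward reads run left to right; reverse reads run right to left.
    return aln.end - 1 - idx if aln.is_reverse else aln.start + idx


fwd = Alignment(100, 150, False)
rev = Alignment(100, 150, True)
```

With these records, `offset_position(fwd, 1)` gives 100 and `offset_position(rev, 1)` gives 149, i.e. the 5'-most base in both cases.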

I'll update the Galaxy wrappers and then merge that branch (thereafter closing this issue).

dpryan79 commented 8 years ago

I've just merged this into develop.