deeptools / deepTools

Tools to process and analyze deep sequencing data.

Ribosomal profiling with bamCoverage #352

Closed: dpryan79 closed this issue 8 years ago

dpryan79 commented 8 years ago

It would be nice to be able to use bamCoverage meaningfully for ribosome profiling data. The new metagene options already cover much of that, but we need to add a modified version of the --MNase option. Unlike --MNase, the new option will require single-end reads, and the resulting signal will be the single base at the center of each read.

This would also nicely demonstrate both the metagene summarization and the use of a GTF file with an alternate --exonID (CDS rather than exon). We'll have to experiment with a public dataset to decide how even-length reads should be treated, since their center falls between two bases.
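One way to pin down the center-base convention is a small helper. This is a minimal sketch, not the code on the branch: it assumes 0-based, half-open alignment coordinates and, for even-length reads, arbitrarily picks the central base nearer the read's 5' end, which is only one of the two choices still to be tested on a public dataset. The name `center_base` is hypothetical.

```python
def center_base(start: int, end: int, is_reverse: bool) -> int:
    """Return the 0-based genomic position of a read's central base.

    start/end are 0-based, half-open alignment coordinates. Odd-length
    reads have a single central base. Even-length reads have two; this
    sketch picks the one nearer the read's 5' end, which is on the left
    for forward-strand reads and on the right for reverse-strand reads.
    """
    length = end - start
    if length <= 0:
        raise ValueError("read must have a positive length")
    if length % 2 == 1:
        return start + length // 2
    # Even length: the two central bases are start + length//2 - 1 and start + length//2.
    if is_reverse:
        return start + length // 2
    return start + length // 2 - 1
```

For a 28-base forward read at positions 100 to 127, this returns 113; the same read on the reverse strand returns 114.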

dpryan79 commented 8 years ago

I might also write a small post-processing script that takes the output of computeMatrix and splits each line into 3 entries (one per reading frame). That would allow plots like figure 1A below (from here).

[Figure 1A]
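A post-processing step like this could be sketched in Python as follows. The function name `split_row_by_frame` is hypothetical, and the sketch assumes that each data row of the computeMatrix matrix holds six BED-like columns (chrom, start, end, name, score, strand) followed by the per-bin values, that 1-base bins were used, and that the first value sits on the first base of a codon. Bins written as `nan` are treated as zero here, which is itself a choice worth revisiting.

```python
import math


def split_row_by_frame(line: str) -> dict:
    """Split one computeMatrix data row into three reading-frame entries.

    Assumes six BED-like leading columns, 1-base bins, and that value i
    lies in frame i % 3 relative to the first codon.
    """
    fields = line.rstrip("\n").split("\t")
    name = fields[3]
    values = [float(x) for x in fields[6:]]
    # Missing bins may be written as "nan"; count them as zero coverage.
    values = [0.0 if math.isnan(v) else v for v in values]
    # Every third value starting at offset f belongs to reading frame f.
    return {f"{name}_frame{f}": values[f::3] for f in range(3)}
```

Summing each frame's list then gives the per-frame totals needed for a plot like figure 1A.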

dpryan79 commented 8 years ago

The bamCoverage changes may also need to use strand information to exclude spurious antisense alignments.
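Assuming a library in which reads come from the same strand as the mRNA (the usual case for ribosome profiling), the strand check reduces to comparing each read's orientation with the annotated gene strand. The helper `is_sense` below is hypothetical and only illustrates that comparison.

```python
def is_sense(read_is_reverse: bool, gene_strand: str) -> bool:
    """Return True if a single-end read is sense to the annotated gene.

    Assumes reads derive from the mRNA strand, so a forward-strand read
    is sense to a '+' gene and a reverse-strand read to a '-' gene.
    """
    if gene_strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {gene_strand!r}")
    return read_is_reverse == (gene_strand == "-")


# Example: keep only sense reads, given as (start, end, is_reverse), for a '+' gene.
reads = [(100, 128, False), (140, 168, True), (200, 229, False)]
sense_reads = [r for r in reads if is_sense(r[2], "+")]
```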

dpryan79 commented 8 years ago

A bit of reading indicates that, instead of centering, one uses a fixed offset from the start (5' end) of the read, normally either 12 or 15 bases for the P or A site, respectively. The correct offset yields a clear peak at the AUG (and/or the stop codon). Ideally each read length would have its own offset (the reads need to be adapter-trimmed but not quality-trimmed), but this seems to be largely overkill.
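The fixed-offset idea can be sketched as follows. This is an illustration under stated assumptions, not the --RiboSeq implementation: coordinates are 0-based and half-open, the offset is the number of bases skipped from the 5' end (so 12 lands on the 13th base), and `offsets_by_length` is a hypothetical optional mapping from read length to a length-specific offset.

```python
from typing import Optional


def psite_position(start: int, end: int, is_reverse: bool,
                   offset: int = 12,
                   offsets_by_length: Optional[dict] = None) -> int:
    """Return the 0-based genomic position of the inferred P (or A) site.

    The offset counts bases skipped from the read's 5' end, so for
    reverse-strand reads it is counted leftward from the last aligned base.
    """
    length = end - start
    # A length-specific offset, if supplied, overrides the default.
    if offsets_by_length is not None:
        offset = offsets_by_length.get(length, offset)
    if not 0 <= offset < length:
        raise ValueError("offset falls outside the read")
    if is_reverse:
        return end - 1 - offset
    return start + offset
```

For most data a single default offset should do; the per-length mapping only matters if that extra precision turns out to be needed.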

dpryan79 commented 8 years ago

I just pushed a branch in which bamCoverage has a --RiboSeq option. It still needs testing, of course, and I still need a script to post-process the output of computeMatrix to merge files and split by frame.

steffenheyne commented 8 years ago

Is ribosome profiling performed mainly or only with single-end sequencing? Should we then disallow combining a paired-end BAM file with --RiboSeq?

dpryan79 commented 8 years ago

Correct, only SE sequencing is used (the reads are ~28 bases long after adapter trimming). I'm currently excluding PE alignments for that reason.
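Excluding paired-end alignments can be decided from the SAM FLAG field, where bit 0x1 marks a template with multiple segments. The sketch below additionally drops unmapped (0x4) and secondary (0x100) alignments; that extra filtering is an assumption for illustration, not something stated in this thread, and the helper name is hypothetical.

```python
PAIRED = 0x1       # template has multiple segments (paired-end)
UNMAPPED = 0x4     # segment unmapped
SECONDARY = 0x100  # secondary alignment


def usable_for_riboseq(flag: int) -> bool:
    """Return True only for mapped, primary, single-end alignments."""
    return not (flag & (PAIRED | UNMAPPED | SECONDARY))
```

A reverse-strand single-end read (FLAG 16) passes, while a typical properly paired mate (FLAG 99) is rejected.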

friedue commented 8 years ago

Maybe we should think about implementing certain default settings, e.g., shortcuts such as bamCoverage riboseq, bamCoverage strandedRNA-seq, and bamCoverage genomicDNA for useful combinations of parameter settings.

dpryan79 commented 8 years ago

If we allow negative offset values, we can use the same method to support GRO-seq and PRO-seq. I assume people only pay attention to the last base of each read in such cases, but I'll check, since we're now working on those a bit internally.
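A signed offset could be resolved as below. The convention is an assumption for this sketch: 1 is the read's 5'-most base, -1 its 3'-most base, and 0 is rejected, with positions counted along the read so that strand is respected. The name `signal_position` is hypothetical.

```python
def signal_position(start: int, end: int, is_reverse: bool, offset: int) -> int:
    """Return the 0-based genomic position selected by a signed offset.

    start/end are 0-based, half-open alignment coordinates. Positive
    offsets count from the read's 5' end (1 = first base); negative
    offsets count from its 3' end (-1 = last base).
    """
    length = end - start
    if offset == 0 or abs(offset) > length:
        raise ValueError("offset must be nonzero and within the read")
    # Convert to a 0-based index along the read, measured from its 5' end.
    idx = offset - 1 if offset > 0 else length + offset
    # Reverse-strand reads start (5') at the rightmost aligned base.
    return end - 1 - idx if is_reverse else start + idx
```

Under this convention a last-base signal, as assumed above for GRO-seq and PRO-seq, is `offset=-1`, and skipping 12 bases from the 5' end corresponds to `offset=13`.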

I guess we should rename the option if that's the case.

dpryan79 commented 8 years ago

This has now been merged into develop.