chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com
604 stars 243 forks source link

bam_to_wiggle #59

Closed vineeth-s closed 11 years ago

vineeth-s commented 11 years ago

Hi Brad,

Thanks much for this neat little script. There is no provision for strand information though .... ?, pysam's pileup does not make any provision for strand I presume, so that makes things a tad complicated ?

Vineeth

chapmanb commented 11 years ago

Vineeth; Thanks much. This is a limitation of the wiggle format, which supports only numerical values at chromosome positions:

http://genome.ucsc.edu/goldenPath/help/wiggle.html

UCSC will handle BAM files directly if you'd like to visualize by strand information.

Also, there is an updated version of this script that handled RNA-seq mapping with gapped reads in the Galaxy toolshed:

http://toolshed.g2.bx.psu.edu/repository/browse_categories?sort=name&operation=view_or_manage_repository&f-deleted=False&f-free-text-search=wig&id=2498224736779bdb

Hope this helps, Brad

vineeth-s commented 11 years ago

Hi Brad,

you can make those numerical values negative (indirectly indicating the - strand) and UCSC then displays it using an appropriate 0-centered axis,I have done this for count data previously, and it works fine

The problem with looking at the BAM directly is with increasing sequencing depth, visual inspection doesn't really work

I guess I'd have to use pysam to iterate over the reads, check for strandedness, count, normalise and generate the wiggle file ...

Vineeth

chapmanb commented 11 years ago

Vineeth; Nice one, tricky hack. So if you have a positive and negative value at the same position UCSC will display correctly?

In terms of support, the latest version offloads the work of collapsing to coverage to bedtools genomeCoverageBed:

http://code.google.com/p/bedtools/

It might be worth inquiring with Aaron if this would be something useful to support.

Otherwise, pysam does give you access to the individual reads with the pileups attribute, so you'd have to iterate and count based on this as you suggest:

http://www.cgat.org/~andreas/documentation/pysam/api.html#pysam.PileupColumn

Brad

vineeth-s commented 11 years ago

Brad,

That I glossed over, and you are right there, it does not allow the same position to have 2 values. If you have overlapping values I guess the only way to look at them simultaneously on one track on UCSC would be to have track hubs generated from using genomeCoverageBed with its strand option

Thanks Vineeth