brittneybrinsfield / pysam

Automatically exported from code.google.com/p/pysam
0 stars 0 forks source link

pileup wrong range #60

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I used the tests file from pysam 0.4.1 and the following code from the pysam 
documentation:

import pysam
samfile = pysam.Samfile("../test/ex1.bam", "rb")
for pileupcolumn in samfile.pileup( 'chr1', 100, 101):
    print
    print 'coverage at base %s = %s' % (pileupcolumn.pos , pileupcolumn.n)
    for pileupread in pileupcolumn.pileups:
        print '\tbase in read %s = %s' % (pileupread.alignment.qname, pileupread.alignment.seq[pileupread.qpos])

samfile.close()

Unfortunately, I have got a pileup output between 99 and 133.

coverage at base 99 = 1
    base in read EAS56_57:6:190:289:82 = A

coverage at base 100 = 1
    base in read EAS56_57:6:190:289:82 = G

coverage at base 101 = 1
    base in read EAS56_57:6:190:289:82 = G
...

coverage at base 131 = 1
    base in read EAS56_57:6:190:289:82 = C

coverage at base 132 = 1
    base in read EAS56_57:6:190:289:82 = A

coverage at base 133 = 1
    base in read EAS56_57:6:190:289:82 = C

Michal

Original issue reported on code.google.com by mictadlo on 21 Feb 2011 at 12:13

GoogleCodeExporter commented 9 years ago
The pileup method seems to iterate through all base pairs covered by reads that 
overlap the specified region.

Original comment by DarwinAw...@gmail.com on 23 Feb 2011 at 5:29

GoogleCodeExporter commented 9 years ago
I too have the same problem...hope I can trust pysam results..

Original comment by abhishen...@gmail.com on 23 Mar 2011 at 5:42

GoogleCodeExporter commented 9 years ago
You can trust the results. You just need to manually truncate the ends and fill 
in zero-coverage areas.

Original comment by DarwinAw...@gmail.com on 23 Mar 2011 at 6:28

GoogleCodeExporter commented 9 years ago
Could you please an example how to anually truncate the ends and fill in 
zero-coverage areas?

Original comment by mictadlo on 26 Mar 2011 at 5:48

GoogleCodeExporter commented 9 years ago
You can truncate the ends by simply skipping columns where pileupcolumn.pos is 
out of your desired range. You can "fill in" zero-coverage areas by simply 
reporting a zero for those positions Here is some (untested) example code:

import pysam
samfile = pysam.Samfile("../test/ex1.bam", "rb")
start = 100
end = 150
nextcol = start
for pileupcolumn in samfile.pileup( 'chr1', start, end):
    # Skip columns outside desired range
    if pileupcolumn.pos < start or pileupcolumn.pos > end:
        continue
    # Fill in zero-coverage areas
    while nextcol < pileupcolumn.pos:
        print
        print 'coverage at base %s = %s' % (nextcol , 0)
        nextcol += 1
    print
    print 'coverage at base %s = %s' % (pileupcolumn.pos , pileupcolumn.n)
    for pileupread in pileupcolumn.pileups:
        print '\tbase in read %s = %s' % (pileupread.alignment.qname, pileupread.alignment.seq[pileupread.qpos])
    nextcol += 1

# Fill in zero-coverage area at the end of the region
while nextcol <= end:
    print
    print 'coverage at base %s = %s' % (nextcol , 0)
    nextcol += 1

samfile.close()

Original comment by r...@thompsonclan.org on 26 Mar 2011 at 6:14

GoogleCodeExporter commented 9 years ago
Thanks for raising this issue and your answers.

This behaviour is intended and consistent with the samtools pileup command.
The pileup docstring explains this.

Bw,
Andreas

Original comment by andreas....@gmail.com on 4 Jun 2011 at 8:24

GoogleCodeExporter commented 9 years ago
This seems to be a feature of the c-api but surprising behaviour given the 
command line samtool behaviour.
See this line in bam_plcmd.c:
if (conf->reg && (pos < beg0 || pos >= end0)) continue; // out of the region 
requested

Doing this in python rather than cython code makes things unnecessary slow.

If this is not fixed, can we at least flag this issue prominently and add some 
example code on how to (trivially) skip positions outside the specified 
regions...

Original comment by bunbu...@gmail.com on 3 Aug 2012 at 3:33