Closed GoogleCodeExporter closed 8 years ago
Workaround is of course to use list(samfile.fetch(...)), but that negates the
benefit of having an iterator in the first place.
Please note that I haven't formally checked what happens if you fetch different
regions - but the discrepancies that led me to this bug fetched a lot of
different regions (potentially overlapping) multi threaded, so I except the bug
to be present as well.
Original comment by finkerna...@mathematik.uni-marburg.de
on 23 Jun 2010 at 3:30
Wild guess without actually inspecting the code:
I use multiprocessing and fork after creating the Samfile and see inconsistent
reading of reads.
That leads me to predict that there's a missing seek in the code since I guess
the filehandle's what's actually shared between the processes.
Original comment by finkerna...@mathematik.uni-marburg.de
on 23 Jun 2010 at 5:02
Hi,
this sounds like valid bug, but I can't reproduce it. I added the unit test
below, but all seems fine. Could you help me reproducing the bug?
Thanks,
Andreas
class TestDoubleFetch(unittest.TestCase):
'''check if two iterators on the same bamfile are independent.'''
def testDoubleFetch( self ):
samfile1 = pysam.Samfile('ex1.bam', 'rb')
samfile2 = pysam.Samfile('ex1.bam', 'rb')
for a,b in zip(samfile1.fetch(), samfile2.fetch()):
self.assertEqual( a, b)
def testDoubleFetchWithRegion( self ):
samfile1 = pysam.Samfile('ex1.bam', 'rb')
samfile2 = pysam.Samfile('ex1.bam', 'rb')
for a,b in zip(samfile1.fetch( "chr1", 200, 300), samfile2.fetch( "chr1", 200, 300)):
self.assertEqual( a, b)
Original comment by andreas....@gmail.com
on 18 Jul 2010 at 6:23
Sorry Andreas, I'm afraid I could have worded my problem better.
Concise:
a) Do not create two SamFile objects.
b) Do create two iterators on the same SamFile.
In code (had to extend the region for my example bam file):
class TestDoubleFetch(unittest.TestCase):
'''check if two iterators on the same bamfile are independent.'''
def testDoubleFetch( self ):
samfile1 = pysam.Samfile('ex1.bam', 'rb')
for a,b in zip(samfile1.fetch(), samfile1.fetch()):
self.assertEqual( a, b)
def testDoubleFetchWithRegion( self ):
samfile1 = pysam.Samfile('ex1.bam', 'rb')
chr, start, stop = 'chr1', 200, 3000000
self.assertTrue(len(list(samfile1.fetch ( chr, start, stop))) > 0) #just making sure the test has something to catch
for a,b in zip(samfile1.fetch( chr, start, stop), samfile1.fetch( chr, start, stop)):
self.assertEqual( a, b )
Original comment by finkerna...@mathematik.uni-marburg.de
on 19 Jul 2010 at 7:49
actually you were perfectly clear, I misunderstood.
I will check how to fix this.
Thanks,
Andreas
Original comment by andreas....@gmail.com
on 19 Jul 2010 at 7:53
actually you were perfectly clear, I misunderstood.
I will check how to fix this.
Thanks,
Andreas
Original comment by andreas....@gmail.com
on 19 Jul 2010 at 7:53
Hi,
The iterator currently assumes that the file position between calls
remains unchanged. I see the following solutions:
1. I could do a tell/seek in the iterator. Also, I would need to
implement some locking, because the seek, read, tell sequence has
to be atomic. This will need to be done properly (threading?) and
possibly slow down iteration.
2. When creating an iterator, give it its own file handle. I.e., the file
gets opened multiple times. Also, there might be some overhead in reading
indices,
etc.
3. Do nothing and leave this as a gotcha in the library.
I am tending towards solution 2.
Cheers,
Andreas
Original comment by andreas....@gmail.com
on 23 Jul 2010 at 8:15
I would lean to number 2 as well. Number 1 is complicated to do right (with
multiprocessing and threading...),
number 3 is not an option either, because it silently fails on the api's
contract.
Having a file handle for each iterator doesn't seem to be much overhead - they
are pretty cheap, but then, not knowing the code, I don't see which indices
you're talking about that wouldn't be reparsed already?
Original comment by finkerna...@mathematik.uni-marburg.de
on 23 Jul 2010 at 12:45
Hi,
fixed (fingers crossed) using option 2.
Please get the latest version 0.3-dev. Note that I have also switched
to samtools 0.1.8
Best wishes,
Andreas
Original comment by andreas....@gmail.com
on 23 Jul 2010 at 3:31
Original issue reported on code.google.com by
finkerna...@mathematik.uni-marburg.de
on 23 Jun 2010 at 3:17Attachments: