gmarcais / Jellyfish

A fast multi-threaded k-mer counter

MerDNA has nondeterministic behavior (Python binding) #30

Closed crashfrog closed 9 years ago

crashfrog commented 9 years ago
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import jellyfish
>>> mer, count = jellyfish.ReadMerFile('1.jf').next() #deal me a mer, one off the top
>>> print count
1
>>> mer, count
(<jellyfish.jellyfish.MerDNA; proxy of <Swig Object of type 'MerDNA *' at 0x7fed0dded090> >, 1L)
>>> print mer
AAAAAAAAACTTTTGTCAATCGGTTCCCTTAGA
>>> print mer
TAAAAAAAAAAAAAAAAAAACAAAGCATGTTAA
>>> mer == mer
True
>>> str(mer)
'AAAAAAAAAAAAAAAAAAAACCGTTCTGCTCAA'
>>> print mer
TAAAAAAAAAAAAAAAAAAACCGTTCGAAGGAA

What's going on here?

gmarcais commented 9 years ago

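The mer object returned depends on the ReadMerFile object, which Python is eager to delete right after the call to next(). Once the reader is gone, the mer is left referring to storage it no longer owns, which is why each print shows something different. Keeping a reference to the reader avoids the problem, so this will work: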

Python 2.7.5+ (default, Sep 17 2013, 15:31:50) 
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import jellyfish
>>> mf = jellyfish.ReadMerFile('tests/seq10.jf')
>>> mer, count = mf.next()
>>> mer, count
(<jellyfish.jellyfish.MerDNA; proxy of <Swig Object of type 'MerDNA *' at 0x1ddd7b0> >, 1L)
>>> print mer
AATGGAGAATAGGTTTCAAGTTTTTAAGTATCTCTAAACAATCTGTGGCC
>>> print mer
AATGGAGAATAGGTTTCAAGTTTTTAAGTATCTCTAAACAATCTGTGGCC
>>> mer == mer
True
>>> str(mer)
'AATGGAGAATAGGTTTCAAGTTTTTAAGTATCTCTAAACAATCTGTGGCC'
>>> print mer
AATGGAGAATAGGTTTCAAGTTTTTAAGTATCTCTAAACAATCTGTGGCC

Alternatively, you can copy the mer to get an object that is independent of the ReadMerFile object. For example, the following will work properly:

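import jellyfish

def load(file):
    mf = jellyfish.ReadMerFile(file)
    mer, count = mf.next()
    # Copy the mer while mf is still alive, so the returned object no longer
    # depends on the reader (assumes MerDNA can be constructed from a string).
    return jellyfish.MerDNA(str(mer)), count

mer, count = load('tests/seq10.jf')
print(mer)
print(mer)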
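
The lifetime problem described above can be modeled without Jellyfish installed. What follows is only a toy sketch: `ToyReader`, `ToyMer`, `close()` and `copy()` are hypothetical names made up for illustration, not part of the Jellyfish binding, and the explicit `close()` stands in for Python destroying the reader. It shows why a mer that borrows the reader's storage changes once the reader is gone, while a copy taken earlier stays intact.

```python
class ToyReader(object):
    """Hypothetical stand-in for a reader that owns the memory its items use."""

    def __init__(self, records):
        self._records = iter(records)
        self._buffer = []  # storage owned by the reader

    def next(self):
        # Overwrite the reader-owned buffer in place and hand out a view of it.
        self._buffer[:] = next(self._records)
        return ToyMer(self._buffer)

    def close(self):
        # Stand-in for the reader being destroyed: its memory gets reused.
        self._buffer[:] = "?" * len(self._buffer)


class ToyMer(object):
    """Hypothetical k-mer that borrows storage instead of owning it."""

    def __init__(self, storage):
        self._storage = storage  # borrowed reference, no copy made

    def __str__(self):
        return "".join(self._storage)

    def copy(self):
        # Build an independent mer with its own private storage.
        return ToyMer(list(self._storage))


reader = ToyReader(["ACGT", "TTGA"])
view = reader.next()
owned = view.copy()   # copy taken while the reader is still alive
before = str(view)
reader.close()        # the reader goes away
after = str(view)

print(before)      # ACGT
print(after)       # ????
print(str(owned))  # ACGT
```

In the real binding the stale contents are whatever happens to occupy the freed memory rather than a tidy placeholder, but the remedy is the same as in the thread: either keep the reader referenced for as long as the mer is used, or copy the mer first.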