MaciekAber / pysam

Automatically exported from code.google.com/p/pysam
0 stars 0 forks source link

Tabixfile.fetch() should accept a function as a parser #142

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Tabixfile.fetch() unnecessarily requires the parser parameter to be an instance 
of pysam.ctabix.Parser or one of its subclasses. That class is defined as

cdef class Parser:
    pass

Further, if we take a look at the simplest defined parser, asTuple, we see it 
has a single method, __call__ so that it is callable:

cdef class asTuple(Parser):
    '''converts a :term:`tabix row` into a python tuple.

    Access is by numeric index.
    ''' 
    def __call__(self, char * buffer, int len):
        cdef TabProxies.TupleProxy r
        r = TabProxies.TupleProxy()
        # need to copy - there were some
        # persistence issues with "present"
        r.copy( buffer, len )
        return r

asTuple is a paragon of a class that should just be a function, especially 
given it doesn't even define __init__! http://youtu.be/o9pEzgHorH0?t=2m9s

The parser parameter should just be a callable, and in most circumstances, 
that's simply a function. This would make it easier and more straightforward 
for users of pysam to write and use their own parsers.

Original issue reported on code.google.com by chris.la...@gmail.com on 24 Oct 2013 at 8:36

GoogleCodeExporter commented 8 years ago
Hi Chris,

thank you for your suggestion. The rationale of the classing was
to be able to pass the original char * from the tabix engine
to the parser. The motivation was to try implementing lazy parsing and
avoiding unnecessary data copying for performance reasons.

I have not paid much detail to the implementation to see if that
is indeed the case and have no profiling data to back that up. Currently,
the char * is copied once by the parser.

I will check what can be done to either generalize or to tighten up.

Best wishes,
Andreas

Original comment by andreas....@gmail.com on 4 Nov 2013 at 8:18

GoogleCodeExporter commented 8 years ago
Hi Chris,

I have tightened the parser interface to enforce strict types for performance 
reasons. To use tabix with a generic parser, the plain iterator can simply be 
wrapped in a parser.

Best wishes,
Andreas 

Original comment by andreas....@gmail.com on 26 Nov 2013 at 9:32