exxeleron / qPython

interprocess communication between Python and kdb+
http://www.devnet.de
Apache License 2.0

QSYMBOL as unicode string in Python 3 #35

Closed: audetto closed this issue 8 years ago

audetto commented 8 years ago

Hi,

I am using Python 3, and when I query my employer's kdb+ server I get back a lot of QSYMBOL and QSYMBOL_LIST values. These are converted to numpy.string_, which as far as I understand is just bytes.

This is really annoying, as the rest of my code uses plain Python 3 strings.

Would it be possible for the user to specify an encoding and have them converted to strings? Maybe using the QReader mapping mechanism? Is it private, or can it be overridden?

maciejlach commented 8 years ago

Unfortunately, at the moment we don't support replacing QReader/QWriter with custom implementations. This may change in future releases.

For now, the most straightforward solution would be to introduce a new option parameter, similar to numpy_temporals. This would allow overriding the default behavior and returning QSYMBOLs as Python strings.
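Until such an option exists, one caller-side workaround is to decode the results after they come back. The following is only a minimal sketch, not part of qPython: `decode_symbols` is a hypothetical helper name, and it handles only bytes objects nested in lists, tuples, and dicts (NumPy arrays would need separate handling). It relies on the fact that numpy.string_ scalars are a subclass of bytes.

```python
def decode_symbols(value, encoding="latin-1"):
    """Recursively turn bytes into str; hypothetical helper, not qPython API."""
    if isinstance(value, bytes):
        # Covers plain bytes and bytes subclasses such as NumPy byte-string scalars
        return value.decode(encoding)
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type with decoded elements
        return type(value)(decode_symbols(v, encoding) for v in value)
    if isinstance(value, dict):
        return {decode_symbols(k, encoding): decode_symbols(v, encoding)
                for k, v in value.items()}
    # Numbers, str, None, etc. pass through unchanged
    return value


decoded = decode_symbols([b"foo", b"bar", 42, {b"key": (b"a", b"b")}])
```

The drawback of this approach is that every query result has to be passed through the helper, which is why a reader-level option is the cleaner fix.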

maciejlach commented 8 years ago

The 1.2.0b1 version provides basic support for extending the QReader and QWriter classes.

Here is a code snippet with subclassed QReader:

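# Imports the snippet relies on (module paths as laid out in the qPython package)
import numpy

from qpython import qconnection
from qpython.qreader import QReader
from qpython.qtype import QSYMBOL, QSYMBOL_LIST, Mapper

class MyQReader(QReader):
    # QReader and QWriter use decorators to map data types to their handler functions
    parse = Mapper(QReader._reader_map)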

    def _read_list(self, qtype):
        if qtype == QSYMBOL_LIST:
            self._buffer.skip()
            length = self._buffer.get_int()
            symbols = self._buffer.get_symbols(length)
            return [s.decode(self._encoding) for s in symbols]
        else:
            return QReader._read_list(self, qtype = qtype)

    @parse(QSYMBOL)
    def _read_symbol(self, qtype = QSYMBOL):
        return numpy.string_(self._buffer.get_symbol()).decode(self._encoding)

with qconnection.QConnection(host='localhost', port=5000, reader_class = MyQReader) as q:
    symbols = q.sync('`foo`bar')
    print(symbols, type(symbols), type(symbols[0]))

    symbol = q.sync('`foo')
    print(symbol, type(symbol))

audetto commented 8 years ago

I will try it.

But I found another issue, this one about QSTRING.

The docs say they are converted to Python strings

here:

https://github.com/exxeleron/qPython/blob/master/doc/source/type-conversion.rst#string-and-symbols

This is probably true in Python 2, but in Python 3 they are byte arrays. The docs should probably be clarified.

I wonder whether, in Python 3, people would find it more natural if all these byte arrays were turned into strings using a customisable encoding (I see that "latin-1" is sometimes used). Byte arrays (as a replacement for strings) are really awkward to use in Python 3.
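To make the awkwardness concrete, here is a small standard-library sketch (no qPython involved; the variable names are illustrative only). It shows that bytes and str never compare equal in Python 3, so lookups keyed by str silently miss, and that "latin-1" can decode any byte sequence while "utf-8" can fail.

```python
symbol_from_q = b"foo"          # what a bytes-based reader hands back
prices = {"foo": 1.5}           # the rest of the code uses str keys

# bytes and str are distinct types in Python 3: equality is always False
same = (symbol_from_q == "foo")

# ...so a dict lookup with the raw value silently misses
missing = prices.get(symbol_from_q)
found = prices.get(symbol_from_q.decode("latin-1"))

# latin-1 maps every byte 0..255 to exactly one code point, so it never fails
all_bytes = bytes(range(256))
roundtrip = all_bytes.decode("latin-1").encode("latin-1")

# utf-8 rejects byte sequences that are not valid UTF-8
try:
    b"\xe9".decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

Whether "latin-1" is the right choice depends on what the q process actually stores; it is lossless at the byte level, but it will mis-render text that was written as UTF-8.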

maciejlach commented 8 years ago

Thanks for the feedback. I'll update the documentation.

You can use the same approach to override the default handling of the QSTRING type. QReader now supports a custom encoding, which is passed as a constructor parameter.

audetto commented 8 years ago

I see. One thing needs to be taken into account.

The example uses two different ways to override behaviour.

Just by defining the class above, the behaviour of the existing QReader has already changed, because the QSYMBOL parser is registered in a global mapping. _read_list, on the other hand, is an ordinary method override resolved through the vtable, so it only affects the subclass.

So it is somewhat hard to swap between the two readers within a single running instance of Python.
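The difference between the two mechanisms can be reproduced with a toy model. This is a simplified stand-in written for illustration, not qPython's actual source: `Mapper`, `BaseReader`, and `TextReader` here are hypothetical miniatures, and -11 is used as the q type code for a symbol atom.

```python
class Mapper:
    """Decorator factory: registers a handler under each given type code."""
    def __init__(self, call_map):
        self.call_map = call_map

    def __call__(self, *type_codes):
        def wrap(func):
            for code in type_codes:
                self.call_map[code] = func
            return func
        return wrap


SYMBOL = -11  # q type code for a symbol atom


class BaseReader:
    _reader_map = {}
    parse = Mapper(_reader_map)

    @parse(SYMBOL)
    def _read_symbol(self, raw):
        return raw                      # default: hand back bytes

    def _read_list(self, raws):
        return list(raws)               # default: list of bytes

    def read(self, code, raw):
        # Dispatch through the mapping dictionary, not through method lookup
        return self._reader_map[code](self, raw)


class TextReader(BaseReader):
    # Reuses the parent's dictionary, so registrations leak into BaseReader
    parse = Mapper(BaseReader._reader_map)

    @parse(SYMBOL)
    def _read_symbol(self, raw):
        return raw.decode("latin-1")

    def _read_list(self, raws):         # ordinary override, subclass only
        return [r.decode("latin-1") for r in raws]


base_symbol = BaseReader().read(SYMBOL, b"foo")        # affected by the leak
base_list = BaseReader()._read_list([b"foo", b"bar"])  # unaffected
```

In this model the base reader now returns a str for symbols even though no `TextReader` was ever instantiated, while its list handling is unchanged.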

maciejlach commented 8 years ago

Yes, that's correct. This is a design flaw, which I will aim to fix in a future release.

maciejlach commented 8 years ago

I've adjusted QReader and QWriter to use the mapping dictionary from the subclass. You can use parse-time decorators to extend or modify the default mapping. You have to remember to create a copy of the mapping from the parent class.
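The copy step can be illustrated with a toy model. As before, this is a hedged sketch rather than qPython's real code: `Mapper`, `BaseReader`, and `TextReader` are hypothetical miniatures, and the only point shown is that copying the parent's dictionary keeps the subclass's registrations private.

```python
class Mapper:
    """Decorator factory: registers a handler under each given type code."""
    def __init__(self, call_map):
        self.call_map = call_map

    def __call__(self, *type_codes):
        def wrap(func):
            for code in type_codes:
                self.call_map[code] = func
            return func
        return wrap


SYMBOL = -11  # q type code for a symbol atom


class BaseReader:
    _reader_map = {}
    parse = Mapper(_reader_map)

    @parse(SYMBOL)
    def _read_symbol(self, raw):
        return raw

    def read(self, code, raw):
        # self._reader_map resolves to the most derived class's dictionary
        return self._reader_map[code](self, raw)


class TextReader(BaseReader):
    # Copy first, then register: the parent's mapping is left untouched
    _reader_map = dict(BaseReader._reader_map)
    parse = Mapper(_reader_map)

    @parse(SYMBOL)
    def _read_symbol(self, raw):
        return raw.decode("latin-1")


from_base = BaseReader().read(SYMBOL, b"foo")
from_text = TextReader().read(SYMBOL, b"foo")
```

With the copy in place, both readers can coexist in one process, each using its own mapping.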

The standard way of overriding, by providing an implementation of protected methods (e.g. _read_list), is still allowed.

I've updated the documentation and provided an updated example.