exxeleron / qPython

interprocess communication between Python and kdb+
http://www.devnet.de
Apache License 2.0

Serialization of Lists of Strings #30

Closed 691175002 closed 9 years ago

691175002 commented 9 years ago

I found the existing behavior when serializing length-one strings surprising.

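from pandas import Series
from qpython import MetaData
from qpython.qtype import QSTRING_LIST

# k is assumed to be an open qpython QConnection
s = Series(["One","Two","3"])
s.meta = MetaData(qtype=QSTRING_LIST)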

k.sync('type each',s)
    0    10
    1    10
    2   -10
    dtype: int16

k.sync('{x like "Two"}', s)
    `type

I realize that QSTRING_LIST == QGENERIC_LIST, but as far as I can tell there is no way to serialize a list of strings without the risk of some of them ending up as characters.

This fork contains a fairly straightforward change: QWriter now always serializes str as QSTRING, regardless of its length.

The bytes type retains the old behaviour, so a multi-character bytestring serializes as QSTRING while a length-one bytes value serializes as QCHAR. The change carries over to QReader, so QSTRING is converted to str and QCHAR is converted to bytes.
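The rule described above can be sketched in plain Python, without a kdb+ connection. This is only an illustration: `fork_q_type` is a hypothetical helper, not part of qPython or of the fork, and it merely reports the q type code (10 for a char vector, -10 for a char atom) that each value would receive under that rule.

```python
# Hypothetical illustration only; this is not qPython's QWriter.
Q_STRING = 10   # q type code of a char vector (string)
Q_CHAR = -10    # q type code of a char atom


def fork_q_type(value):
    """Return the q type code a value would get under the fork's rule."""
    if isinstance(value, str):
        # str is always written as a string, whatever its length
        return Q_STRING
    if isinstance(value, bytes):
        # bytes keeps the old rule: length one becomes a char atom
        return Q_CHAR if len(value) == 1 else Q_STRING
    raise TypeError("expected str or bytes")


print([fork_q_type(v) for v in ["One", "Two", "3"]])     # [10, 10, 10]
print([fork_q_type(v) for v in [b"One", b"Two", b"3"]])  # [10, 10, -10]
```

With this rule, a list of str values can never contain a stray char atom, while callers who want a q char can still ask for one by passing a length-one bytes value.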

I'm not sure if this is the best approach to the problem, or if other people even consider it an issue, but I figured I'd toss it out as an option.

maciejlach commented 9 years ago

Thanks for pointing this out. This default encoding is based on the behavior of the q console, e.g.:

q)type each ("one";"two";"3")
10 10 -10h

I agree that this can be surprising and limiting in some cases. To minimize the impact on the existing code base, I would propose a different approach based on the parser configuration mechanism. The QConnection would accept a single_char_strings argument that controls the encoding of single-character strings:

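import pandas
from qpython import MetaData, qconnection
from qpython.qcollection import qlist
from qpython.qtype import QSTRING_LIST

# Connection-level setting: single-character strings are sent as q strings
q = qconnection.QConnection(host='localhost', port=5000, single_char_strings=True)
q.open()

# pandas Series tagged as a list of strings
s = pandas.Series(["One", "Two", "3"])
s.meta = MetaData(qtype=QSTRING_LIST)

r = q('{[x] type each x}', s)
print(r)

# qlist, with the option also passed for this single call
s = qlist(['One', 'Two', '3'], qtype=QSTRING_LIST)
r = q('{[x] type each x}', s, single_char_strings=True)
print(r)

# Plain Python list
s = ['One', 'Two', '3']
r = q('{[x] type each x}', s)
print(r)

q.close()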

Would yield:

[10 10 10]
[10 10 10]
[10 10 10]
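As a rough sketch of what the option changes, the following plain-Python function mimics the decision for a single string. `q_type_code` is a hypothetical name used only for illustration and is not qPython's implementation; it returns 10 for a q string and -10 for a q char atom.

```python
# Hypothetical illustration only; this is not qPython's serializer.
Q_STRING = 10   # q type code of a char vector (string)
Q_CHAR = -10    # q type code of a char atom


def q_type_code(text, single_char_strings=False):
    """Return the q type code a Python string would be encoded with."""
    if len(text) == 1 and not single_char_strings:
        # default: mirror the q console, where "3" is a char atom
        return Q_CHAR
    return Q_STRING


words = ['One', 'Two', '3']
print([q_type_code(w) for w in words])                            # [10, 10, -10]
print([q_type_code(w, single_char_strings=True) for w in words])  # [10, 10, 10]
```

Keeping the default at False leaves existing code unaffected, which is the motivation for making the behavior opt-in.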

Does this solution match your use case?

691175002 commented 9 years ago

That would probably be a better option. It would work well for me.

maciejlach commented 9 years ago

The enhancement has been applied to the master and 1.1 branches.

Check the documentation for 1.1 for details.