Kinetic / kinetic-py

Kinetic Python Library
http://seagate.github.io/kinetic-py

client returns bytearray objects #9

Closed: toolslive closed this issue 10 years ago

toolslive commented 10 years ago

I'm not sure this is a bug, but people will lose time on it. As this little example shows, you put a string value and, when you retrieve it, you get back a bytearray, which is unexpected.

>>> from kinetic import Client
>>> c = Client('localhost', 8123)
>>> c.put('message','hello world')
True
>>> c.get('message').value
bytearray(b'hello world')

The type may influence the behaviour of calls to other libraries (for example, pyeclib will fail with a decode error).

icorderi commented 10 years ago

It's an interesting point, @toolslive. The problem is mainly with Python representing byte arrays as strings. Python 2.6 added bytearray as a backport from 3.x.

Internally, kinetic moved from the traditional Python bytes representation (str) to an actual bytearray for performance reasons.

To be honest, the change was made around the time I started doing erasure codes on key/values. You need in-place XORing to get decent performance out of that, which means passing byte arrays down to C code. If you work with strings, you go through a lot of unnecessary memory copies.
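To illustrate why a mutable buffer matters here, below is a minimal pure-Python sketch of in-place XOR on a bytearray. The helper name `xor_into` is hypothetical and is not part of kinetic; real erasure-code work would hand the buffer to C code rather than loop in Python.

```python
def xor_into(dst, src):
    """XOR src into dst in place. dst must be a mutable bytearray.

    Hypothetical helper for illustration only; not kinetic's code.
    """
    if len(dst) != len(src):
        raise ValueError("buffers must have the same length")
    for i in range(len(dst)):
        # bytearray items are ints, so this mutates dst without
        # allocating a new buffer.
        dst[i] ^= src[i]


parity = bytearray(b"\x0f\xf0\xaa")
data = bytearray(b"\xff\xff\xff")
same_buffer = parity          # keep a second reference to the buffer

xor_into(parity, data)

# The update happened inside the original buffer; no copy was made.
assert same_buffer is parity
```

An immutable str (Python 2) or bytes (Python 3) value does not support item assignment, so the same operation would have to build a new object on every pass.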

For convenience, the kinetic put operations work on both strings and bytearrays, and very soon you should be able to do zero copy by passing a file descriptor as a value. But that doesn't mean you are storing a string. We store bytes and retrieve bytes; 'hello world' is just a convenient way of passing a byte array that contains those 11 bytes.
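As a rough sketch of that idea, a client could normalize whatever it is given into a bytearray before storing it. The function `to_bytearray` and its `encoding` parameter are assumptions made for this example; this is not how kinetic is actually implemented.

```python
def to_bytearray(value, encoding="utf-8"):
    """Return value as a bytearray.

    Hypothetical helper for illustration only; not kinetic's code.
    """
    if isinstance(value, bytearray):
        return value                      # already a byte array, no copy
    if isinstance(value, bytes):          # also matches str on Python 2
        return bytearray(value)
    if isinstance(value, str):            # text on Python 3
        return bytearray(value.encode(encoding))
    raise TypeError("expected str, bytes or bytearray")


from_text = to_bytearray("hello world")
from_bytes = to_bytearray(b"hello world")

# Both spellings end up as the same 11 bytes.
assert from_text == from_bytes
assert len(from_text) == 11
```

Either way, what gets stored is the byte content, which is why the value comes back as bytes rather than as the type that was passed in.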

It would be nice for pyeclib to support byte arrays on 2.6+, given that they already do on 3.x. This is directly from their code repo [https://bitbucket.org/kmgreen2/pyeclib]: "Note: bytes is a synonym to str in Python 2.6, 2.7. In Python 3.x, bytes and str types are non-interchangeable and care needs to be taken when handling input to and output from the encode() and decode() routines." We might be able to give a similar warning: "You might give us strings or bytes, but we are returning bytes to you."
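Until then, one possible workaround on the caller's side is to convert at the boundary before handing the value to a library that expects the immutable type. The sketch below uses a plain bytearray as a stand-in for what `c.get('message').value` returns; note that the conversion makes a copy.

```python
# Stand-in for the value returned by the client in the example above.
value = bytearray(b"hello world")

# bytes() copies the buffer into an immutable object. On Python 2,
# bytes is a synonym for str, so this yields a str there.
as_bytes = bytes(value)

assert isinstance(as_bytes, bytes)
assert as_bytes == b"hello world"
```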

I hope that helps. In any case, bytearray as the output is the expected behavior.