blaze / datashape

Language defining a data description protocol
BSD 2-Clause "Simplified" License

Represent byte order #227

Open jdmarble opened 7 years ago

jdmarble commented 7 years ago

I work with data that is sometimes in non-native byte order. It looks like byte order cannot yet be represented in a DataShape. I can work around this with side channels, such as the keyword arguments in odo functions, but it would be more convenient to have byte order information as part of the shape.

Current Behavior

>>> import numpy as np
>>> import datashape as ds
>>> ds.discover(np.array([1], dtype='u4'))
dshape("1 * uint32")
>>> ds.discover(np.array([1], dtype='=u4'))
dshape("1 * uint32")
>>> ds.discover(np.array([1], dtype='<u4'))
dshape("1 * uint32")
>>> ds.discover(np.array([1], dtype='>u4'))
dshape("1 * uint32")

Desired Behavior

One way to do this would be to mirror NumPy: treat the system byte order as the default and annotate only non-native types. The examples below were run on a little-endian system, so only the big-endian dtype gains a prefix.

>>> ds.discover(np.array([1], dtype='u4'))
dshape("1 * uint32")
>>> ds.discover(np.array([1], dtype='=u4'))
dshape("1 * uint32")
>>> ds.discover(np.array([1], dtype='<u4'))
dshape("1 * uint32")
>>> ds.discover(np.array([1], dtype='>u4'))
dshape("1 * >uint32")

NumPy's behavior for reference:

>>> np.dtype('u4')
dtype('uint32')
>>> np.dtype('=u4')
dtype('uint32')
>>> np.dtype('<u4')
dtype('uint32')
>>> np.dtype('>u4')
dtype('>u4')
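
The mapping requested above could be sketched roughly as follows. This is only an illustration of the proposal, not datashape's API: `byteorder_prefix` and `annotate` are hypothetical helpers, and the `>uint32` spelling is the syntax suggested in this issue rather than anything datashape currently parses. The input is assumed to be the one-character code NumPy exposes as `dtype.byteorder`.

```python
import sys

# Byte-order character NumPy uses for this machine's native order.
NATIVE = "<" if sys.byteorder == "little" else ">"


def byteorder_prefix(byteorder):
    """Return a prefix for a hypothetical byte-order-aware datashape string.

    `byteorder` is the one-character code from `numpy.dtype.byteorder`:
    '=' (native), '<' (little), '>' (big), or '|' (not applicable).
    """
    if byteorder in ("=", "|", NATIVE):
        return ""          # native or irrelevant: nothing to annotate
    if byteorder in ("<", ">"):
        return byteorder   # non-native: keep the explicit marker
    raise ValueError("unknown byte order code: %r" % (byteorder,))


def annotate(measure, byteorder):
    """Build e.g. '>uint32' from 'uint32' and '>' (proposed syntax only)."""
    return byteorder_prefix(byteorder) + measure
```

With NumPy available, `annotate('uint32', np.dtype('>u4').byteorder)` would yield the prefixed form on a little-endian machine and the plain `uint32` on a big-endian one, which matches the default-to-native behavior described above.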