Update .tobytes() to use endian-independent code when generating the bytestring record

CartoDB / raster-loader

https://raster-loader.readthedocs.io



Update .tobytes() to use endian-independent code when generating the bytestring record #29

Closed: francois-baptiste closed this issue 1 year ago

francois-baptiste commented 1 year ago

I think the call to numpy's .tobytes() needs some rework, as it doesn't seem to be endian-agnostic (and thus not platform-agnostic): .tobytes() copies the raw buffer in the array's own byte order, which for native dtypes is the byte order of the machine.

It seems I used little-endian, the default endianness of my machine, but big-endian seems to be the norm, at least on this Wikipedia page: https://en.wikipedia.org/wiki/Deadbeef.

# Does not match Wikipedia
>>> numpy.array([3735928559], dtype=numpy.uint32).tobytes()
b'\xef\xbe\xad\xde'  # result on my (little-endian) machine

# Matches Wikipedia => let's use this as the default
>>> numpy.ascontiguousarray(numpy.array([3735928559], dtype=numpy.uint32), dtype='>u4').tobytes()
b'\xde\xad\xbe\xef'  # same result on any machine, since '>' forces big-endian
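A minimal sketch of the "use big-endian as the default" idea, assuming every record should be serialized big-endian whatever the input dtype. The helper name `to_big_endian_bytes` is hypothetical, not part of raster-loader; it derives the target dtype with NumPy's `dtype.newbyteorder(">")` instead of hard-coding one, so it works for any numeric dtype:

```python
import numpy as np


def to_big_endian_bytes(arr: np.ndarray) -> bytes:
    """Hypothetical helper: serialize `arr` big-endian, whatever the host's order."""
    # newbyteorder(">") returns the same dtype with big-endian byte order;
    # single-byte dtypes (byte order "|") are left unchanged.
    big_endian = arr.dtype.newbyteorder(">")
    # astype() converts values (not raw bytes), so the numbers are preserved
    # and only their memory layout changes; tobytes() defaults to C order.
    return arr.astype(big_endian, copy=False).tobytes()


print(to_big_endian_bytes(np.array([3735928559], dtype=np.uint32)))
# b'\xde\xad\xbe\xef'
```

Reading the bytes back is symmetric: `np.frombuffer(data, dtype=">u4")` recovers the same values on any host.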

francois-baptiste commented 1 year ago

Sorry @giancastro @brendancol, I have to reopen this. Even if adding np.ascontiguousarray adds an extra safety check (it guarantees a C-contiguous buffer), it doesn't make the code platform-agnostic, because it leaves the byte order untouched:

In [1]: np.ascontiguousarray(np.array([3735928559],dtype=np.uint32)).tobytes()
Out[1]: b'\xef\xbe\xad\xde'  # run on a little-endian machine

In [2]: np.ascontiguousarray(np.array([3735928559],dtype=np.uint32)).tobytes()
Out[2]: b'\xde\xad\xbe\xef'  # same call, run on a big-endian machine
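To reproduce this without access to a big-endian machine, the sketch below (an illustration, not project code) uses explicit `"<u4"` and `">u4"` dtypes as stand-ins for the two hosts and cross-checks them against Python's `struct` module. It shows that `ascontiguousarray()` changes memory layout only, never byte order:

```python
import struct
import sys

import numpy as np

value = 3735928559  # 0xDEADBEEF

# A native dtype follows the host: "=" means "whatever this machine uses".
native = np.ascontiguousarray(np.array([value], dtype=np.uint32))
print(sys.byteorder, native.dtype.byteorder)  # e.g. "little =" on x86-64

# Explicit byte orders give the same bytes on every host; they stand in
# for the little-endian and big-endian runs shown above.
little = np.array([value], dtype="<u4")
big = np.array([value], dtype=">u4")
assert little.tobytes() == struct.pack("<I", value) == b"\xef\xbe\xad\xde"
assert big.tobytes() == struct.pack(">I", value) == b"\xde\xad\xbe\xef"

# ascontiguousarray() keeps the dtype, hence the byte order, as-is.
assert np.ascontiguousarray(little).tobytes() == little.tobytes()

# The native result matches whichever explicit order the host uses.
expected = little if sys.byteorder == "little" else big
assert native.tobytes() == expected.tobytes()
```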

francois-baptiste commented 1 year ago

@giancastro Here is the code I was thinking of. It should work, even if endianness-agnosticism is a bit tricky to test:

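import sys

import numpy as np

# Do the array's bytes need swapping to come out big-endian?
# "=" is native order; "|" means byte order does not apply (e.g. uint8).
should_swap = {"=": sys.byteorder == "little", "<": True, ">": False, "|": False}

# uint32 rather than int32: 3735928559 (0xDEADBEEF) does not fit in a signed 32-bit int
arr = np.array([3735928559], dtype="<u4")
#arr = np.array([3735928559], dtype=">u4")
#arr = np.array([3735928559], dtype=np.uint32)

if should_swap[arr.dtype.byteorder]:
    # byteswap() reverses the bytes of each element; the dtype is unchanged
    print(np.ascontiguousarray(arr.byteswap()).tobytes())
else:
    print(np.ascontiguousarray(arr).tobytes())
# prints b'\xde\xad\xbe\xef' for all three arrays, on any platform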
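One way to make the endianness-agnosticism testable, as a sketch: wrap the lookup-table logic in a hypothetical `serialize()` function and require that every byte-order spelling of the same values (native, explicit little, explicit big, and a non-contiguous view) produces exactly the bytes of Python's `int.to_bytes(4, "big")`. On a little-endian host this exercises the `"="` and `">"` entries (NumPy reports an explicit `"<"` as `"="` there); the `"<"` entry and the big-endian `"="` case are only reached on a big-endian machine.

```python
import sys

import numpy as np

# Same lookup table: does this byte order need a swap to become big-endian?
SHOULD_SWAP = {"=": sys.byteorder == "little", "<": True, ">": False, "|": False}


def serialize(arr: np.ndarray) -> bytes:
    """Hypothetical wrapper around the snippet; returns big-endian bytes."""
    if SHOULD_SWAP[arr.dtype.byteorder]:
        arr = arr.byteswap()  # returns a byte-swapped copy
    return np.ascontiguousarray(arr).tobytes()


def check_endian_agnostic(values: list[int]) -> bytes:
    """Serialize `values` under several layouts and require identical output."""
    variants = {
        "native": np.array(values, dtype=np.uint32),
        "little": np.array(values, dtype="<u4"),
        "big": np.array(values, dtype=">u4"),
        # Repeat each element, then take every other one: same values,
        # but a strided view that exercises the ascontiguousarray() path.
        "strided": np.repeat(np.array(values, dtype=np.uint32), 2)[::2],
    }
    # Reference computed without NumPy: 4-byte big-endian per value.
    reference = b"".join(int(v).to_bytes(4, "big") for v in values)
    for name, arr in variants.items():
        out = serialize(arr)
        assert out == reference, f"{name}: {out!r} != {reference!r}"
    return reference


print(check_endian_agnostic([3735928559, 1]))
# b'\xde\xad\xbe\xef\x00\x00\x00\x01'
```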