HDFGroup / hsds

Cloud-native, service-based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

Slow performance in arrayUtil functions with variable-length data #188

Open: jreadey opened this issue 1 year ago

jreadey commented 1 year ago

The arrayUtil functions that convert NumPy arrays of variable-length elements to a byte buffer and back can be quite slow. Running the test program https://github.com/HDFGroup/hsds/blob/master/tests/perf/arrayperf/bytes_to_array.py with a one-million-element array gave this output:

```
$ python bytes_to_array.py
getByteArraySize - elapsed: 0.3334 for 1000000 elements, returned 7888327
arrayToBytes - elapsed: 3.1166 for 1000000 elements
bytesToArray - elapsed: 1.1793
```

This is not surprising, since each function iterates over the elements one at a time in a Python loop.
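To make the per-element cost concrete, here is a minimal sketch of a loop-based round trip. It assumes each element is stored as a 4-byte little-endian length followed by its raw bytes; the names `vlen_byte_size`, `vlen_to_bytes` and `bytes_to_vlen` are hypothetical stand-ins for `getByteArraySize`, `arrayToBytes` and `bytesToArray`, and this is not the actual arrayUtil code or necessarily its buffer layout.

```python
import struct

import numpy as np

# Assumed layout (hypothetical): <uint32 little-endian length><payload> per element.
HEADER = struct.Struct("<I")


def vlen_byte_size(arr):
    """Total buffer size needed for a 1-D object array of bytes."""
    total = 0
    for item in arr:  # one interpreter round trip per element
        total += HEADER.size + len(item)
    return total


def vlen_to_bytes(arr):
    """Pack a 1-D object array of bytes into one length-prefixed buffer."""
    buf = bytearray(vlen_byte_size(arr))
    offset = 0
    for item in arr:
        HEADER.pack_into(buf, offset, len(item))  # write the 4-byte length
        offset += HEADER.size
        buf[offset:offset + len(item)] = item  # copy the payload
        offset += len(item)
    return bytes(buf)


def bytes_to_vlen(buf, count):
    """Unpack `count` length-prefixed elements back into an object array."""
    arr = np.empty(count, dtype=object)
    offset = 0
    for i in range(count):
        (n,) = HEADER.unpack_from(buf, offset)  # read the 4-byte length
        offset += HEADER.size
        arr[i] = bytes(buf[offset:offset + n])
        offset += n
    return arr


data = np.array([b"a", b"bcd", b""], dtype=object)
packed = vlen_to_bytes(data)
restored = bytes_to_vlen(packed, len(data))
```

Each element pays for several interpreter-level operations (a `len()` call, a header pack, a slice copy), so the cost grows linearly with the element count no matter how small each element is.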

I looked into using Numba, but Numba does not support NumPy arrays of object dtype.
Would a Cython version of arrayUtil help?
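For comparison with a Cython rewrite, one pure-Python option worth measuring is to push more of the encoding work into C-implemented calls. The sketch below (hypothetical function `vlen_to_bytes_join`, same assumed `<uint32 length><payload>` layout, elements assumed to already be `bytes`) gathers all lengths with a single `np.fromiter`, produces every header from one `tobytes()` call, and lets `bytes.join` do the concatenation.

```python
from itertools import chain

import numpy as np


def vlen_to_bytes_join(arr):
    """Pack a 1-D object array of bytes as <uint32 LE length><payload> per element.

    Hypothetical sketch: lengths are collected in one np.fromiter pass, all
    4-byte headers come from a single tobytes() call, and the final
    concatenation happens inside bytes.join instead of slice assignments.
    """
    items = list(arr.ravel())
    lengths = np.fromiter(map(len, items), dtype="<u4", count=len(items))
    header_blob = lengths.tobytes()  # every header, back to back
    headers = [header_blob[i:i + 4] for i in range(0, len(header_blob), 4)]
    # Interleave header, payload, header, payload, ... and join in one call.
    return b"".join(chain.from_iterable(zip(headers, items)))


data = np.array([b"a", b"bcd", b""], dtype=object)
packed = vlen_to_bytes_join(data)
```

This only addresses the encoding side: decoding still has to walk the headers sequentially, because each element's offset depends on the lengths before it. Whether it closes enough of the gap to make Cython unnecessary would have to be checked with bytes_to_array.py.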