Fix chunk reuse verification for string dtype arrays

deshaw / versioned-hdf5

Versioned HDF5 provides a versioned abstraction on top of h5py

No, I think the call to vectorize should broadcast across all dimensions. I think the exception in that issue comes from the way we detect whether each element of the array needs to be cast to a bytes object. We can't use the dtype of the array, because string arrays are read out of the file as object dtype arrays, so instead we look at the type of the first element of the array:

if len(arr) > 0 and isinstance(arr.flatten()[0], bytes):
#                                     ^
#                            multidimensional datasets need to be flattened first!

Previously we just weren't flattening the multidimensional arrays, which meant we ended up trying to call bytes on a bytes object, which fails.
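
To illustrate why the flatten matters, here is a minimal sketch using plain NumPy object arrays as stand-ins for string data read from a file. The helper name `first_element_is_bytes` is hypothetical and not part of versioned-hdf5; it also uses `arr.size` rather than `len(arr)` for the emptiness check, which is an assumption made here so that shapes like `(0, 3)` and `(3, 0)` are both handled.

```python
import numpy as np


def first_element_is_bytes(arr: np.ndarray) -> bool:
    """Hypothetical helper: report whether the first scalar element is bytes.

    Flattening first means the check inspects an actual element rather
    than a sub-array, regardless of how many dimensions `arr` has.
    """
    return arr.size > 0 and isinstance(arr.flatten()[0], bytes)


# Object-dtype arrays holding bytes, similar in spirit to string datasets
# that come back from the file as object arrays.
arr_1d = np.array([b"a", b"b"], dtype=object)
arr_2d = np.array([[b"a", b"b"], [b"c", b"d"]], dtype=object)

# Indexing a 2-D array with [0] yields a row (an ndarray), not an element,
# so an isinstance check against bytes on it is always False.
row = arr_2d[0]
print(type(row).__name__, isinstance(row, bytes))

# After flattening, [0] is the first scalar element itself.
element = arr_2d.flatten()[0]
print(type(element).__name__, isinstance(element, bytes))

print(first_element_is_bytes(arr_1d), first_element_is_bytes(arr_2d))
```

With the flattened check, the 1-D and 2-D cases give the same answer, which is the behavior the fix relies on.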

Fix chunk reuse verification for string dtype arrays #348