Closed peytondmurray closed 3 months ago
No, I think the call to vectorize should broadcast across all dimensions. I think the exception in that issue happens because of the way that we detect whether we need to cast each element of the array as a bytes object. We can't use the dtype of the array because string arrays are read out of the file as object dtype arrays, so instead we do this by looking at the type of the first element of the array:
if len(arr) > 0 and isinstance(arr.flatten()[0], bytes):
# ^ arr.flatten(): multidimensional datasets need to be flattened first!
Previously we just weren't flattening the multidimensional arrays, which meant we ended up trying to call bytes on a bytes object, which fails.
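To illustrate why the flatten matters, here is a minimal sketch, not the code from this PR. The helper name `first_element_is_bytes` is hypothetical, and it checks `arr.size` rather than `len(arr)` so that shapes such as `(2, 0)` also count as empty; that is a choice made for this sketch only.

```python
import numpy as np


def first_element_is_bytes(arr: np.ndarray) -> bool:
    """Hypothetical helper: report whether the first scalar element is bytes.

    Flattening first means the check looks at a scalar element rather than
    at a sub-array, which is what plain indexing returns for ndim > 1.
    """
    return arr.size > 0 and isinstance(arr.flatten()[0], bytes)


# A 2-D object-dtype array of bytes, similar to what a string dataset yields.
arr_2d = np.array([[b"a", b"b"], [b"c", b"d"]], dtype=object)

# Indexing with [0] gives the first row (an ndarray), so the check misses.
print(isinstance(arr_2d[0], bytes))    # False

# Flattening first reaches an actual element.
print(first_element_is_bytes(arr_2d))  # True
```

Without the flatten, the 2-D array looks like it does not hold bytes, so the conversion step would run on elements that are already bytes.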
This PR fixes an issue with string datasets where reused chunks were not correctly verified.
Previously, chunks that were written to the dataset and then reused contained bytes elements, but chunks that were pending a write but being reused (e.g. by some other chunk in the pending write operation) could contain str elements, causing problems for the array comparison. With this change, both the chunk that the user is trying to write and the chunk to be reused are coerced to object dtype arrays of bytes before the comparison is completed. Additionally, multidimensional string datasets are now correctly verified as well. Closes #339 and closes #338.
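The coercion described above can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: `to_bytes_array` and `chunks_match` are hypothetical names, and UTF-8 is assumed as the encoding for str elements.

```python
import numpy as np


def to_bytes_array(arr) -> np.ndarray:
    """Hypothetical helper: return an object-dtype array holding only bytes."""
    arr = np.asarray(arr, dtype=object)
    # Inspect the first scalar element; flatten so ndim > 1 works too.
    if arr.size > 0 and isinstance(arr.flatten()[0], bytes):
        return arr
    # np.vectorize broadcasts over every dimension; otypes keeps object dtype
    # and lets the call succeed on empty input.
    encode = np.vectorize(lambda s: s.encode("utf-8"), otypes=[object])
    return encode(arr)


def chunks_match(new_chunk, reused_chunk) -> bool:
    """Hypothetical helper: compare two chunks after coercing both to bytes."""
    return np.array_equal(to_bytes_array(new_chunk), to_bytes_array(reused_chunk))


# A pending chunk holding str and a stored chunk holding bytes.
pending = np.array([["a", "b"], ["c", "d"]], dtype=object)
stored = np.array([[b"a", b"b"], [b"c", b"d"]], dtype=object)

print(np.array_equal(pending, stored))  # False
print(chunks_match(pending, stored))    # True
```

Comparing the raw arrays reports a mismatch because a str never equals a bytes object in Python 3, which is why both sides are coerced before the comparison.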