Closed ArvidJB closed 3 months ago
Hmm, so it looks like reused slices that are read from the underlying dataset contain bytes inside an object dtype array, while reused slices that have not yet been written to the file contain strings inside an object dtype array.
x: array(['a', 'b', 'c'], dtype=object)
y: array([b'a', b'b', b'c'], dtype='|S1')
I think we should be able to coerce both sides to bytes (the data to be reused and the data the user is trying to write) before comparing them, and that should take care of this.
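Roughly what I have in mind, as a standalone sketch rather than the actual patch. The names `to_bytes_array` and `chunks_match` are made up for illustration and are not existing functions in the library, and UTF-8 is an assumed encoding:

```python
import numpy as np


def to_bytes_array(arr, encoding="utf-8"):
    """Hypothetical helper: return an object array with str elements encoded to bytes."""
    arr = np.asarray(arr)
    if arr.dtype.kind == "S":
        # Fixed-width bytes (e.g. '|S1'): convert to Python bytes objects.
        return arr.astype(object)
    if arr.dtype.kind == "U":
        # Fixed-width unicode: encode elementwise, then convert to objects.
        return np.char.encode(arr, encoding).astype(object)
    if arr.dtype.kind == "O":
        # Object array: encode only the str elements, leave the rest alone.
        out = np.empty(arr.shape, dtype=object)
        for idx, item in np.ndenumerate(arr):
            out[idx] = item.encode(encoding) if isinstance(item, str) else item
        return out
    # Non-string dtypes are returned unchanged.
    return arr


def chunks_match(reused, to_write):
    """Hypothetical check: compare two chunks after coercing both to bytes."""
    a = to_bytes_array(reused)
    b = to_bytes_array(to_write)
    return a.shape == b.shape and bool(np.all(a == b))


x = np.array(["a", "b", "c"], dtype=object)
y = np.array([b"a", b"b", b"c"], dtype="|S1")
print(chunks_match(x, y))
```

With both sides normalized this way, the `x` and `y` arrays from the failing comparison above compare equal, while genuinely different contents still compare unequal.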
The check in `verify_chunk_reuse` does not handle strings correctly: