sk1p closed 6 years ago
Looks good at first glance :) Can you post your benchmarks for this change here, run without the changes in libhdfs3? I expect that you made the default case of returning bytes faster too.
AttributeError: 'memoryview' object has no attribute 'nbytes'
Perhaps you need to use m.itemsize * reduce(operator.mul, m.shape) (py2 only)?
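For reference, a minimal sketch of that fallback. The helper name buffer_nbytes is made up for illustration and is not part of hdfs3; it assumes the object exposes itemsize and shape whenever nbytes is missing, as a py2 memoryview does.

```python
import operator
from functools import reduce


def buffer_nbytes(m):
    """Return the size in bytes of a memoryview-like object.

    Hypothetical helper: uses .nbytes where it exists (py3) and falls
    back to itemsize * product(shape) where it does not (py2).
    """
    try:
        return m.nbytes
    except AttributeError:
        # The initial value 1 keeps reduce() from failing on an empty shape.
        return m.itemsize * reduce(operator.mul, m.shape, 1)


size = buffer_nbytes(memoryview(bytearray(10)))
```

For a one-dimensional byte buffer both branches reduce to the length of the buffer.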
Oh yeah, forgot about py2. Turns out it also doesn't support creating ctypes arrays from memoryviews, so I now pass the original buffer to .from_buffer(...). Will run benchmarks tomorrow.
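Roughly what passing the original buffer to .from_buffer(...) looks like, as a sketch only. The helper name wrap_buffer is hypothetical and this is not the actual hdfs3 code; it assumes the caller hands in a writable buffer such as a bytearray.

```python
import ctypes


def wrap_buffer(buf):
    """Expose a writable buffer (e.g. a bytearray) as a ctypes char array.

    Hypothetical helper: the array shares memory with buf, so a C
    function writing into the array fills buf without an extra copy.
    """
    return (ctypes.c_char * len(buf)).from_buffer(buf)


out = bytearray(8)
c_arr = wrap_buffer(out)

# Stand-in for the C library writing into the array.
ctypes.memmove(c_arr, b"abc", 3)
```

After the memmove call the first three bytes of out hold b"abc", since no intermediate copy is involved.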
Updated the benchmark - these are the numbers for the different configurations with unpatched libhdfs3:
old_read(length=READ_SIZE): 3.09601 # ~three copies, crc verification, buffer re-allocation
read(length=READ_SIZE): 2.47153 # two copies, crc verification, buffer re-allocation
read(length=READ_SIZE, out_buffer=True): 2.15100 # single copy, crc verification, buffer re-allocation
read(length=READ_SIZE, out_buffer=buf): 1.97234 # single copy inside libhdfs3 + crc verification
OK, thank you!
See https://github.com/dask/hdfs3/issues/160 for the in-depth discussion.