aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0

Streaming I/O (pipes, ...) #127

hohav closed 3 years ago

hohav commented 3 years ago

I would like to output my HDF5 to STDOUT (for piping to other tools like gzip), but currently I can only do that by first writing to a temp file and then copying it to STDOUT. This is awkward and slow due to the unnecessary copying.
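For reference, the copy step of the workaround described above might look like the following sketch. It uses only the Rust standard library and assumes the HDF5 file has already been written to a temporary path by some earlier step; `copy_file_to_writer` is a hypothetical helper name, not part of the hdf5 crate.

```rust
use std::fs::File;
use std::io::{self, BufReader, Write};
use std::path::Path;

/// Copy an already-written file (for example a temporary HDF5 file) to any
/// writer, such as a locked stdout handle. Returns the number of bytes copied.
/// Hypothetical helper for illustration only.
pub fn copy_file_to_writer<W: Write>(path: &Path, out: &mut W) -> io::Result<u64> {
    // Buffer the reads so the copy is not done in many tiny chunks.
    let mut reader = BufReader::new(File::open(path)?);
    let copied = io::copy(&mut reader, out)?;
    // Flush so that a downstream pipe sees all bytes before we return.
    out.flush()?;
    Ok(copied)
}
```

Calling it with `&mut std::io::stdout().lock()` as the writer would send the file down the pipe. The second pass over the data is exactly the overhead this issue is about.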

Would it be feasible for hdf5::File::open/create to take io::Read/Write implementors, instead of just file names?

mulimoen commented 3 years ago

I'm afraid Read/Write is not possible, as it is not compatible with the underlying C library. There is a function, H5Fget_file_image, that one could use to get the underlying bytes. Would this be useful for you?

hohav commented 3 years ago

Thanks for the response. H5Fget_file_image would help only with reading, right? Unfortunately that's not what I need.

Do you know if the C library doesn't support streams because it needs to seek while writing, or is it just that no one's requested that?

mulimoen commented 3 years ago

It is also possible to write the entire file in memory and skip the disk altogether. That removes the extra copy, but requires your data to fit in memory. This is not currently implemented in this wrapper.

HDF5 needs random access, whether by working in memory, from a file, or by caching bytes from the network. It is not designed to be used as a streaming format; it acts more like a file system.
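The general shape of the in-memory approach can be sketched with the standard library alone: build the output in a seekable buffer, then send the finished bytes to a plain writer in one sequential pass. This is only an illustration of the idea, not something the hdf5 crate exposes; `build_then_stream` is a hypothetical helper, and the `build` closure stands in for whatever produces the file.

```rust
use std::io::{self, Cursor, Write};

/// Run `build` against a seekable in-memory buffer, then write the finished
/// bytes to `out` sequentially. `out` only needs to implement `Write`, so it
/// can be stdout or a pipe. Returns the number of bytes written.
/// Hypothetical helper for illustration only.
pub fn build_then_stream<W, F>(out: &mut W, build: F) -> io::Result<usize>
where
    W: Write,
    F: FnOnce(&mut Cursor<Vec<u8>>) -> io::Result<()>,
{
    // The producer may seek freely here, because the buffer supports it.
    let mut buf = Cursor::new(Vec::new());
    build(&mut buf)?;

    // Only the finished image is streamed, front to back, without seeking.
    let bytes = buf.into_inner();
    out.write_all(&bytes)?;
    out.flush()?;
    Ok(bytes.len())
}
```

The trade-off is the one noted above: the whole output has to fit in memory before the first byte reaches the pipe.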

aldanor commented 3 years ago

> (for piping to other tools like gzip)

HDF5 already supports gzip and other filters internally, so why would you pipe its output anywhere?

Again, it's not a streaming file format; it's a model of an entire file system (and memory system). One of the reasons it needs random access is that it stores various metadata in headers and has notions of a "heap", a "cache", and so on, so while writing datasets it may need to seek back and update bytes that have already been written.
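A toy example may make the seek-back requirement concrete. The layout below is invented for illustration and is not the HDF5 format: a header holding a record count, followed by the records. Because the count is only known once all records have been written, the writer has to go back and patch the header, which is why it needs `Write + Seek` rather than just `Write`.

```rust
use std::io::{self, Seek, SeekFrom, Write};

/// Write a toy container (NOT the HDF5 layout): an 8-byte little-endian
/// record count, followed by each record as an 8-byte little-endian f64.
/// Hypothetical function for illustration only.
pub fn write_toy_container<W, I>(out: &mut W, records: I) -> io::Result<()>
where
    W: Write + Seek,
    I: IntoIterator<Item = f64>,
{
    let header_pos = out.stream_position()?;

    // Reserve space for the count; its value is not known yet.
    out.write_all(&0u64.to_le_bytes())?;

    let mut count: u64 = 0;
    for record in records {
        out.write_all(&record.to_le_bytes())?;
        count += 1;
    }
    let end_pos = out.stream_position()?;

    // Seek back and overwrite the placeholder with the real count.
    out.seek(SeekFrom::Start(header_pos))?;
    out.write_all(&count.to_le_bytes())?;

    // Return to the end so later writes append after the records.
    out.seek(SeekFrom::Start(end_pos))?;
    Ok(())
}
```

This works with a `std::fs::File` or a `std::io::Cursor<Vec<u8>>`, but passing `std::io::stdout()` does not compile, because `Stdout` does not implement `Seek`.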

hohav commented 3 years ago

Makes sense, thanks.