aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0
310 stars 85 forks source link

New iteration API #150

Open mulimoen opened 3 years ago

mulimoen commented 3 years ago

In order to get all items of an object, we need to call H5Literate. The current API only exposes this as member_names (attr_names in #64), where we collect all items into a Vec<String>. It would be beneficial to have more fine-grained control over this behaviour.

We have three options here:

Both the latter two requires collecting into a temporary vector, but we could expose some way to filter (e.g. only groups, only attributes) to minimise this cost.

gauteh commented 3 years ago

I would be super happy if H5Dchunk_iter can also be made available once it is included in a release. I've currently wrapped this in a rudimentary way here: https://github.com/gauteh/hidefix/blob/master/src/idx/chunks_iter.rs#L16 using a callback + for_each approach. As @mulimoen says, a proper rust Iterator is not possible (in a single thread) at the moment since HDF5 iterators cannot be resumed once they have been stopped. I think that eventually it will be necessary to have both callback + collect (and thus IntoIterator)... or we take on adding resume to hdf5 upstream. HDF5 development seems to have picked up pace lately, and everything seems to happen on github.

gauteh commented 2 years ago

Heads up that that chunk_iter is probably going to change output: https://github.com/HDFGroup/hdf5/issues/1419#issuecomment-1205778505, offset is going to be scaled.

mkitti commented 2 years ago

Note that the change makes H5Dchunk_iter consistent with H5Dget_chunk_info and H5Dget_chunk_info_by_coord.

That is if the H5Dget_chunk returns [64 64] as dim, then the offsets would be

  1. [0 0]
  2. [0 64]
  3. [0 128]
  4. ...

I am going to try to port H5Dchunk_iter back to the 1.12 and 1.10 branches of HDF5.

See https://forum.hdfgroup.org/t/backporting-h5dchunk-iter-to-1-12-and-1-10/9971 for the discussion.

gauteh commented 2 years ago

Awesome, that would make it available across more hdf5-versions of this package as well.