aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0

Writing dataset with unknown length #294

Closed. alexnivanov closed this issue 2 months ago

alexnivanov commented 2 months ago

Greetings!

Following the documentation, I've figured out how to write datasets with a known shape. However, how are we supposed to write datasets "on the fly", when the width of each row is known but the total number of rows is not? Consider the following code example:

use hdf5::File;
use ndarray::{s, ArrayView2};

let points = vec![vec![11, 21, 31], vec![12, 22, 32], vec![13, 23, 33]];
let len = points.len();
let width = 3;

let fixed_dataset = File::create("fixed.h5")
    .unwrap()
    .new_dataset::<i32>()
    .shape([len, width])
    .create("points")
    .unwrap();

for (ind, point) in points.iter().enumerate() {
    let point_view = ArrayView2::from_shape((1, width), point.as_slice()).unwrap();

    fixed_dataset
        .write_slice(point_view, s![ind..(ind + 1), 0..width])
        .unwrap();
}

let varied_dataset = File::create("varied.h5")
    .unwrap()
    .new_dataset::<i32>()
    .shape([0, width]) // We don't know length, so put 0 here
    .create("points")
    .unwrap();

// Suppose we get a stream of points and we need to persist it on the fly
for (ind, point) in points.iter().enumerate() {
    let point_view = ArrayView2::from_shape((1, width), point.as_slice()).unwrap();

    varied_dataset
        .write_slice(point_view, s![ind..(ind + 1), 0..width])
        .unwrap();
}

While the first file, fixed.h5, is written as expected, the second fails with the error "Slice end 1 out of bounds for axis 0 with size 0", since we specified an initial length of 0.

Any help would be appreciated!

mulimoen commented 2 months ago

You should be able to define the shape as growable with

.shape((.., width))

instead of

.shape([0, width])

alexnivanov commented 2 months ago

@mulimoen thanks for the advice, but I'm getting this error instead:

error[E0277]: the trait bound `Extent: From<RangeFull>` is not satisfied
   --> src/main.rs:27:16
    |
27  |         .shape((.., width)) // we don't know length, so put 0 here
    |          ----- ^^^^^^^^^^^ the trait `From<RangeFull>` is not implemented for `Extent`, which is required by `(RangeFull, usize): Into<Extents>`
    |          |
    |          required by a bound introduced by this call
    |
    = help: the following other types implement trait `From<T>`:
              <Extent as From<&T>>
              <Extent as From<(usize, Option<usize>)>>
              <Extent as From<RangeFrom<usize>>>
              <Extent as From<RangeInclusive<usize>>>
              <Extent as From<usize>>
    = note: required for `RangeFull` to implement `Into<Extent>`
    = note: required for `SimpleExtents` to implement `From<(RangeFull, usize)>`
    = note: 3 redundant requirements hidden
    = note: required for `(RangeFull, usize)` to implement `Into<Extents>`
note: required by a bound in `DatasetBuilderEmpty::shape`
   --> /Users/alex/.cargo/git/checkouts/hdf5-rust-6d93bbe2b511bb49/7b5eeae/hdf5/src/hl/dataset.rs:240:21
    |
240 |     pub fn shape<S: Into<Extents>>(self, extents: S) -> DatasetBuilderEmptyShape {
    |                     ^^^^^^^^^^^^^ required by this bound in `DatasetBuilderEmpty::shape`

mulimoen commented 2 months ago

It should be implemented for RangeFrom<usize>:

.shape((0.., width))

alexnivanov commented 2 months ago

@mulimoen It compiles, but the result is the same as for shape([0, width]): Slice end 1 out of bounds for axis 0 with size 0. It seems that I have to explicitly tell the dataset to grow, rather than just calling write_slice at the corresponding position?

mulimoen commented 2 months ago

Yes, you need to resize (https://docs.rs/hdf5/latest/hdf5/dataset/struct.Dataset.html#method.resize):

varied_dataset.resize([ind + 1, width]).unwrap();
varied_dataset.write_slice(point, (ind, ..)).unwrap();

alexnivanov commented 2 months ago

@mulimoen thanks, works like a charm! Also, thanks for helping eliminate the need for an explicit ArrayView :)
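
Editor's note: the resolution of this thread is that declaring an axis as growable only raises its maximum size. The current size stays at 0 until the dataset is resized, and writes are checked against the current size. The sketch below illustrates that "grow, then write" pattern with a small in-memory model that uses only the standard library. GrowableGrid, its fields, and its methods are hypothetical names made up for this illustration. They are not part of the hdf5 crate and say nothing about how the crate is implemented. Only the error text is modelled on the message quoted in the thread.

```rust
/// Hypothetical in-memory stand-in for a 2-D dataset whose first axis can grow.
/// This is NOT the hdf5 crate's API; it only mimics the extent check.
struct GrowableGrid {
    rows: usize,    // current extent along axis 0
    width: usize,   // fixed extent along axis 1
    data: Vec<i32>, // row-major storage
}

impl GrowableGrid {
    /// Start with zero rows, like a dataset created with an initial length of 0.
    fn new(width: usize) -> Self {
        GrowableGrid { rows: 0, width, data: Vec::new() }
    }

    /// Change the current extent along axis 0, zero-filling any new rows.
    fn resize(&mut self, rows: usize) {
        self.data.resize(rows * self.width, 0);
        self.rows = rows;
    }

    /// Write one row; fails if the row lies outside the current extent.
    fn write_row(&mut self, ind: usize, row: &[i32]) -> Result<(), String> {
        if row.len() != self.width {
            return Err(format!("expected {} values, got {}", self.width, row.len()));
        }
        if ind >= self.rows {
            return Err(format!(
                "Slice end {} out of bounds for axis 0 with size {}",
                ind + 1,
                self.rows
            ));
        }
        let start = ind * self.width;
        self.data[start..start + self.width].copy_from_slice(row);
        Ok(())
    }
}

fn main() {
    let points = vec![vec![11, 21, 31], vec![12, 22, 32], vec![13, 23, 33]];
    let mut grid = GrowableGrid::new(3);

    // Writing before growing fails, because the current extent is still 0.
    assert!(grid.write_row(0, &points[0]).is_err());

    // Grow by one row, then write into the row that now exists.
    for (ind, point) in points.iter().enumerate() {
        grid.resize(ind + 1);
        grid.write_row(ind, point).unwrap();
    }
    println!("{} rows: {:?}", grid.rows, grid.data);
}
```

In the thread's own terms, the equivalent steps are creating the dataset with shape((0.., width)) and then calling resize([ind + 1, width]) before each write_slice.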