In H5Easy there's API for reading and writing one element at a time: https://github.com/BlueBrain/HighFive/blob/5f3ded67b4a9928f4b9b5f691bc0a60aade32232/include/highfive/h5easy_bits/H5Easy_scalar.hpp#L66-L70
https://github.com/BlueBrain/HighFive/blob/5f3ded67b4a9928f4b9b5f691bc0a60aade32232/include/highfive/h5easy_bits/H5Easy_scalar.hpp#L120-L122
It does this by creating a dataset that can be extended in all directions, and it automatically grows if the index of the element being written requires it. (This negates our ability to spot off-by-one programming errors.)
An API for reading/writing one element at a time feels like it would tempt users into writing files that way in a loop, which is a rather serious issue on common HPC hardware (and not great on consumer hardware either).
To enable this API it must make a default choice for the chunk size, currently 10^n. That seems very small and is at risk of creating files that can't be read efficiently; picking it reasonably large might inflate the size of the file by a factor of 100 or more.

I think it might be fine to allow users to read and write single elements of an existing dataset, i.e. without the automatically growing aspect, with a warning in the documentation not to use it in a loop. In core we support various selection APIs that are reasonably compact: a list of random points, regular hyperslabs (general ones too), and there's a proposal to allow Cartesian products of simple selections along each axis.