Closed: jgrewe closed this issue 7 years ago
So I was under the impression that the code we lifted from h5py already had a minimum chunk size defined; I am all for it, but maybe let's check the existing code for errors.
Oh yes, agreed; imho, this should be solved within the guess_chunking code. Apparently, the extent is set to 512 when not given.
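For the sake of discussion, a rough sketch of how such a lower bound could be enforced inside a guess-chunking helper (a hypothetical standalone function, not the actual NIX or h5py implementation) might look like this:

```cpp
// Hypothetical sketch only -- not the actual guess_chunking code.
#include <cstddef>
#include <vector>

// Guess chunk extents from a dataset shape and enforce a minimum number of
// elements per chunk, so that a 1-D array created with a single element no
// longer ends up with a chunk size of 1.
std::vector<size_t> guessChunksWithMinimum(std::vector<size_t> shape,
                                           size_t min_elements = 512) {
    // Unlimited/empty dimensions (extent 0) fall back to 512, matching the
    // behaviour described here for shape (2, 0, 3) -> chunks (2, 512, 3).
    for (size_t &d : shape) {
        if (d == 0) {
            d = 512;
        }
    }
    // If the resulting chunk is still tiny, grow the first dimension until
    // the chunk holds at least min_elements elements.
    size_t total = 1;
    for (size_t d : shape) {
        total *= d;
    }
    if (total < min_elements) {
        shape[0] = shape[0] * ((min_elements + total - 1) / total);
    }
    return shape;
}
```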
closed via #670
The following problem occurred: we recorded events for quite a long time (a few hours), which gave us about 6 million events. The DataArray storing them was defined like this:
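(A minimal sketch of such a definition, assuming the NIX C++ API and an initial one-element extent; the actual call may have looked slightly different:)

```cpp
// Sketch only (assumed NIX C++ API): a 1-D event array created with an
// initial extent of one element, for which a chunk size of 1 is guessed.
#include <nix.hpp>

int main() {
    nix::File file = nix::File::open("events.nix", nix::FileMode::Overwrite);
    nix::Block block = file.createBlock("recording", "nix.session");
    nix::DataArray event_array = block.createDataArray(
        "events", "nix.events", nix::DataType::Double, nix::NDSize{1});
    file.close();
    return 0;
}
```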
This definition results in a chunk size of 1. Whenever an event was detected, it was appended to the data array. So far, so good. Reading the data back from the file, however, caused a problem: HDF5 apparently incurs a huge per-chunk overhead. In our case, reading all 6 million data points at once led to 16 to 17 GB(!) of memory allocation, including swapping and a stalled system.
The ad hoc solution, of course, is not to read all data points at once. A better solution for future recordings is to initialise the `event_array` with `{256}` or something like this (see the sketch below).

Here's my question: should we introduce a minimum chunk size, and if so, what should it be? When creating an array with shape `(2, 0, 3)`, as we tried yesterday, it results in chunks of `(2, 512, 3)` ...
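A sketch of the proposed workaround, reusing the creation call from above and only changing the initial extent (again assuming the NIX C++ API):

```cpp
// Sketch of the workaround: a larger initial extent (here 256) so that the
// guessed chunk size is no longer 1; `block` is the Block from the sketch above.
nix::DataArray event_array = block.createDataArray(
    "events", "nix.events", nix::DataType::Double, nix::NDSize{256});
```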