G-Node / nix

Neuroscience information exchange format
https://readthedocs.org/projects/nixio/

Define a minimum chunk size? #668

Closed: jgrewe closed this issue 7 years ago

jgrewe commented 7 years ago

The following problem occurred: we recorded events for quite a long time (a few hours), accumulating about 6 million events. The DataArray storing them was defined like this:

```cpp
nix::DataArray event_array = b.createDataArray("Events", "nix.timestamp", nix::DataType::Double, {1});
```

This definition results in a chunk size of 1. Whenever an event was detected, it was appended to the data array. So far so good. Reading the data back from the file, however, exposed a problem: HDF5 apparently incurs a large per-chunk overhead when reading chunked data. In our case, reading all 6 million data points at once led to 16 to 17 GB(!) of memory allocation, swapping, and a stalled system.

The ad hoc solution, of course, is not to read all data points at once. A better solution for future recordings is to initialise the event_array with {256} or something similar, as sketched below.
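For illustration, here is a minimal sketch of that workaround: create the array with a generous initial shape so the guessed chunks are larger than 1, then grow the extent in blocks as events arrive. The `append_event` helper is hypothetical, and the `dataExtent()`/`setData()` calls assume the nix C++ setters behave as in current nixio; treat this as a sketch, not the fix that went into the library.

```cpp
#include <nix.hpp>

// Hypothetical helper sketching the workaround: the initial shape {256}
// lets the chunk-guessing pick a sane chunk size instead of 1.
nix::DataArray make_event_array(nix::Block &b) {
    return b.createDataArray("Events", "nix.timestamp",
                             nix::DataType::Double, nix::NDSize{256});
}

// Assumes dataExtent(const NDSize&) grows the dataset and
// setData(value, offset) writes a single element at that offset.
void append_event(nix::DataArray &array, double timestamp, nix::ndsize_t count) {
    if (count >= array.dataExtent()[0]) {
        // grow in blocks of 256 rather than one element at a time
        array.dataExtent(nix::NDSize{count + 256});
    }
    array.setData(timestamp, nix::NDSize{count}); // write at position `count`
}
```

Note that with this approach the true number of recorded events has to be tracked separately (or the extent trimmed once recording ends), since the allocation runs ahead of the data actually written.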

Here's my question: should we introduce a minimum chunk size, and what should it be? When creating an array with shape (2, 0, 3), as we tried yesterday, this results in chunks of (2, 512, 3) ...

gicmo commented 7 years ago

So I was under the impression that the code we lifted from h5py already had a minimum chunk size defined; I am all for it, but maybe let's check the existing code for errors.

jgrewe commented 7 years ago

Oh yes, agreed; imho this should be solved within the guess_chunking routine. Apparently, a dimension is set to 512 when it is not given (i.e. is 0).
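For reference, a minimal sketch of what such a floor inside a chunk-guessing routine could look like. The constants are illustrative assumptions: 512 as the stand-in for unknown (zero) dimensions, matching the (2, 0, 3) -> (2, 512, 3) behaviour observed above, and a hypothetical 256-element minimum so a shape like {1} no longer yields chunks of a single element. This is not the code that was merged.

```cpp
#include <cstddef>
#include <vector>

// Sketch only; constants are illustrative, not the library's values.
std::vector<std::size_t> guess_chunking(std::vector<std::size_t> dims) {
    const std::size_t unknown_default = 512; // stand-in for unset (0) dims
    const std::size_t min_elements = 256;    // hypothetical overall floor

    if (dims.empty()) {
        return dims;
    }

    // Replace unknown dimensions: (2, 0, 3) -> (2, 512, 3)
    for (auto &d : dims) {
        if (d == 0) d = unknown_default;
    }

    std::size_t total = 1;
    for (auto d : dims) total *= d;

    // Double the slowest-varying dimension until the chunk holds at
    // least min_elements in total, so {1} never yields chunks of 1.
    while (total < min_elements) {
        dims[0] *= 2;
        total *= 2;
    }
    return dims;
}
```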

jgrewe commented 7 years ago

Closed via #670.