G-Node / nix

Neuroscience information exchange format
https://readthedocs.org/projects/nixio/
Other
66 stars 36 forks source link

Compression support #695

Closed jgrewe closed 6 years ago

jgrewe commented 7 years ago

addresses #510

this is more a quick and dirty implementation of compression support. It uses a hdf5 built-in compression algorithm with a fixed compression level. At the moment, one can decide for each dataArray to compress it or not, default it true.

This pr is meant to get us thinking about how it should actually work. For the moment, I would prefer such a simple solution which could be elaborated in future versions, or upon user request...

achilleas-k commented 7 years ago

Is there a reason we want compression defined for each DataSet separately? I imagine it would be nicer to have the option to enable during file creation and then each DataArray (or polynomial coefficient) creation method could check the global setting.

jgrewe commented 7 years ago

@achilleas-k, well I first thought so as well but this would mean that we have to ask, e.g. File which mode was selected upon each creation of a DataSet.... We can do it, though. I would keep the option to override the default and decide when not to compress, or vice versa. For example the imaging guys always keep the raw images uncompressed (well, image compression is something different, but there may be cases in which this is desired), or there are scenarios in which, e.g. for performance reasons, you want to have one DataArray uncompressed while you compress the others.

achilleas-k commented 7 years ago

Yeah, I see. I can see how this might be useful.

Would it be too complicated if the user could specify the default value globally? In other words, when opening or creating a file, you could specify the true or false for compression, then the DataArray creation function accepts an argument that overrides that default.

Might be a bit messy though.

jgrewe commented 7 years ago

I was wondering if such a default compression choice should be saved within the file? That is, when a file is re-opened do I automatically have the same settings? Such a feature would require a model change, however.

achilleas-k commented 7 years ago

I guess it could be a top-level attribute (alongside file format and version), the absence of which signals "off". That way it would be backwards compatible in the file format.

jgrewe commented 7 years ago

@achilleas-k, I tried to define a default during file opening. Unfortunately it won't work without an api change and this is something I would like to avoid. We can set the default compression choice at the level of the File and pass it to the Block(s) which then can apply it at any place (in principle). The problem starts at the `createDataArray function. If we want to avoid an api break, we have to add the compression flag with a default value. This is then leads to the situation that we always have a compression argument (with the default value) and a maybe conflicting compression flag set at file opening. Maybe I am to blind, but I do not see a simple way to tell/decide which one is the current user's wish. We could spend a further argument on this, but this becomes really ugly.

I would suggest to use the approach used here and default compression to a certain value that can be overridden at any place a dataset is created. For future versions that may break the the api anyway we may elaborate on this. Actually, I would really like to have the compression for our recordings, saves about 60% of disc space :)

achilleas-k commented 7 years ago

I see. Yeah, I guess we can think about it again in the future. The global option would be in addition to the current change anyway, so we can go ahead with this one and work on it later (or not).

jgrewe commented 6 years ago

@achilleas-k @gicmo, reworked the compression using @gicmo's enum approach. We can now pass it as either as a default while opening the file, or at each DataArray creation Switched the default to NO compression

jgrewe commented 6 years ago

a little rebase also from my side :)