LLNL / H5Z-ZFP

A registered ZFP compression plugin for HDF5

Support endian targeting #25

Closed markcmiller86 closed 5 years ago

markcmiller86 commented 5 years ago

What is endian targeting? It is a way for a data producer to decide, at write time, to have the HDF5 library endian-swap the data before it is stored in the file. The reason for doing this is to free consumers, who may be on a system of different endianness, from paying the price of endian-swapping each time the data is read.

The HDF5 library is fully able to handle endian-swapping and often does so transparently to consumers. In typical operation, the need for endian-swapping is detected only at read time, so swapping is performed only during reads that need it. Endian targeting allows a data producer to pre-format the data for the most likely downstream read cases.

The interface for controlling this is the hid_t type_id argument to H5Dcreate (https://support.hdfgroup.org/HDF5/doc/RM/RM_H5D.html#Dataset-Create), which can be used to influence the endianness of the data in the file, and the hid_t mem_type_id argument to H5Dwrite (https://support.hdfgroup.org/HDF5/doc/RM/RM_H5D.html#Dataset-Write), which indicates the endianness of the data being passed from memory to HDF5. Endian targeting is then the condition in which the endianness in the H5Dwrite call does not match the endianness in the H5Dcreate call.
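The condition described above can be sketched in plain C without linking against HDF5. This is only an illustration: `byte_order_t`, `ORDER_LE`, `ORDER_BE`, `host_order` and `is_endian_targeted` are hypothetical names and are not part of the HDF5 API. In real HDF5 code the two byte orders would presumably come from calling H5Tget_order on the file and memory datatypes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a datatype's byte order; not an HDF5 type. */
typedef enum { ORDER_LE, ORDER_BE } byte_order_t;

/* Detect the byte order of the machine this code is running on. */
static byte_order_t host_order(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;

    /* On a little-endian host the low-order byte is stored first. */
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1 ? ORDER_LE : ORDER_BE;
}

/* Endian targeting: the order requested for the file (at dataset creation)
 * differs from the order of the buffer handed over at write time. */
static int is_endian_targeted(byte_order_t file_order, byte_order_t mem_order)
{
    assert(file_order == ORDER_LE || file_order == ORDER_BE);
    assert(mem_order == ORDER_LE || mem_order == ORDER_BE);
    return file_order != mem_order;
}
```

For example, a producer on a little-endian host that asks for big-endian storage gets `is_endian_targeted(ORDER_BE, host_order()) == 1`, which is the case the plugin currently rejects.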

For reasons described in docs/endian_issues.rst (https://github.com/LLNL/H5Z-ZFP/blob/master/docs/endian_issues.rst), we currently disallow endian targeting. But we should probably support it for the reasons described above: it would free readers from suffering the swap penalty on every read.

But the more I think about this, the less I think it is even possible. We can't endian-swap the data before handing it to ZFP to compress, because we would then be giving ZFP funky data: a data format it isn't expecting for the architecture it is currently executing on. Endian-swapping after ZFP compresses the data doesn't make sense either, because at that point we no longer have data for which endianness has any relevance.
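A small stand-alone C sketch shows why pre-swapped input is "funky" from a floating-point compressor's point of view. The helper `byteswap_f64_array` is a hypothetical name used only for illustration; it is not HDF5's or ZFP's code. It reverses the bytes of each double in place, after which the host no longer reads the buffer as the original numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reverse the byte order of each double in place (illustrative only). */
static void byteswap_f64_array(double *values, size_t count)
{
    assert(values != NULL || count == 0);

    for (size_t i = 0; i < count; ++i) {
        unsigned char bytes[sizeof(double)];

        /* Copy out the raw bytes, reverse them, and copy them back. */
        memcpy(bytes, &values[i], sizeof bytes);
        for (size_t lo = 0, hi = sizeof bytes - 1; lo < hi; ++lo, --hi) {
            unsigned char tmp = bytes[lo];
            bytes[lo] = bytes[hi];
            bytes[hi] = tmp;
        }
        memcpy(&values[i], bytes, sizeof bytes);
    }
}
```

After one call on a buffer holding 1.0 and 2.5, the host interprets the same bytes as entirely different values, so a compressor that models the input as native floating-point numbers would be working on numerically meaningless data. A second call restores the original values.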

@qkoziol is endian targeting even possible when combined with filters that are sensitive to endianness?

qkoziol commented 5 years ago

No, endian-targeting in this way is not currently possible in HDF5. The HDF5 library has mechanisms for user-defined type conversions and for user-defined compression filters, but it currently doesn’t have a way to combine those together for a user-defined module that performs both operations simultaneously.

Quincey

In reply to Mark C. Miller's message of Feb 8, 2019, 2:10 PM.
