matrix data type float32, float16, int16,...

DAS-RCN / RCN_DASformat


matrix data type float32, float16, int16,... #7

Open andreas-wuestefeld opened 2 years ago

andreas-wuestefeld commented 2 years ago

DAS data require huge storage capacity. I would like to hear community feedback on the data type to use for the data matrix.

float32 is probably the most flexible, allowing a wide dynamic range, but it is also the most memory-intensive.
int16 requires half the memory of float32, but a scaling factor would be required.
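A minimal sketch of the int16 option, assuming the scale factor is chosen so that the largest absolute sample maps to 32767 (one common choice, not something the format prescribes). The function name `quantize_int16` and the synthetic strain data are illustrative only. It shows the halved storage and the reconstruction error, which stays within about half a scale step:

```python
import numpy as np

def quantize_int16(data: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float data onto int16; return (stored integers, scale factor)."""
    peak = float(np.max(np.abs(data)))
    # Largest |sample| maps to 32767; fall back to 1.0 for an all-zero block.
    scale = peak / 32767 if peak > 0 else 1.0
    scaled = np.round(data.astype(np.float64) / scale)
    ints = np.clip(scaled, -32767, 32767).astype(np.int16)
    return ints, scale

# Synthetic "strain" block (channels x samples), purely for illustration.
rng = np.random.default_rng(0)
strain = (rng.standard_normal((100, 1000)) * 1e-9).astype(np.float32)

ints, scale = quantize_int16(strain)
restored = ints.astype(np.float64) * scale  # decode: multiply by the scale factor

print(strain.nbytes, ints.nbytes)  # 400000 200000
max_err = float(np.max(np.abs(restored - strain)))
print(max_err / scale)  # error measured in quantization steps
```

Because one scale factor covers the whole block, a large transient sets the step size for every sample, so weak signals in the same block are represented by only a few integer steps.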

A related issue is the data unit: radians, strain, or nanostrain (see https://github.com/DAS-RCN/IRIS_DASformat/issues/6).

jpmorten-asn commented 2 years ago

We have good experience using int16: the 50% reduction in storage has significantly simplified data logistics in our projects without reducing measurement accuracy. A scale factor is required for such an implementation, but it is a simple concept that will be obvious to most users of the format (multiply the stored values by the factor). It would also be a good idea to include a header string that describes the unit of the data after scaling has been applied.
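To illustrate the decoding step, here is a small Python sketch. The header keys `scale_factor` and `unit` and the function `to_physical` are hypothetical placeholders, not the variable names used in RCN_DASformat. Reading back is just a multiplication, and the unit string describes the values after scaling:

```python
import numpy as np

# Hypothetical header: key names are placeholders, not the format specification.
header = {"scale_factor": 1.0e-3, "unit": "nanostrain"}

# Stored int16 samples as they might come off disk.
raw = np.array([[-32768, 0, 1250], [400, -7, 32767]], dtype=np.int16)

def to_physical(raw: np.ndarray, header: dict) -> tuple[np.ndarray, str]:
    """Multiply stored integers by the scale factor; return values and their unit."""
    # Cast first so all arithmetic happens in float64, not in the storage type.
    values = raw.astype(np.float64) * header["scale_factor"]
    return values, header["unit"]

values, unit = to_physical(raw, header)
print(values[0, 2], unit)
```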

miili commented 2 years ago

int16/int32 data are also easier for HDF5 to compress than floating-point data, which helps with the I/O bottleneck.

I think it is worthwhile looking into HDF5 filters (e.g., the scale-offset filter):

http://www.hector.ac.uk/cse/distributedcse/reports/nemo/nemo_notes/node56.html
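The idea behind the scale-offset filter can be sketched in NumPy: subtract the minimum and keep only the bits needed for the remaining range; for floats, first round to a fixed number of decimal digits ("D-scaling"). This illustrates the principle only and is not HDF5's actual implementation. In h5py the filter itself is enabled through the `scaleoffset` argument of `create_dataset`. The helper names `min_bits` and `d_scale` are hypothetical.

```python
import numpy as np

def min_bits(ints: np.ndarray) -> int:
    """Bits needed to store ints losslessly after subtracting their minimum."""
    span = int(ints.max()) - int(ints.min())
    return span.bit_length()  # 0 when all values are equal

def d_scale(data: np.ndarray, digits: int) -> np.ndarray:
    """Float 'D-scaling': keep `digits` decimal places by rounding to integers."""
    return np.round(data * 10.0**digits).astype(np.int64)

# Integer case: values cluster around 1000, so the offset range is small.
counts = np.array([1000, 1003, 998, 1010, 1001], dtype=np.int32)
print(min_bits(counts))  # 4 (span of 12) instead of 32 bits per value

# Float case: keep 3 decimal digits, then apply the same integer trick.
samples = np.array([0.123456, -0.5, 0.25, 0.9999])
ints = d_scale(samples, 3)
offset = ints - ints.min()
max_err = float(np.max(np.abs(ints / 10.0**3 - samples)))  # below 0.5e-3
print(min_bits(offset), max_err)
```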

andreas-wuestefeld commented 2 years ago

Now implemented. In HDF5 there is no need to predefine the precision, so we can allow any data type to be used, whichever the user (vendor) deems best. The scaling factor and units are implemented. Comments on the variable names are welcome.

dcbowden commented 1 year ago

We have seen applications where int16 affected the waveforms undesirably, i.e., the dynamic range was insufficient, especially when looking at very long-duration signals. It's possible, however, that we did not apply a scaling factor or filters optimally. To that end, I like the current plan of allowing users to define the data type as they like. I will note that we've had trouble exchanging float16 data between MATLAB and h5py; the two must handle the underlying bit encoding differently somehow (I guess MATLAB doesn't support float16 natively, and more recent versions instead include a wrapper function?)
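The dynamic-range trade-off can be checked directly from NumPy's type information. The sketch below prints the integer ranges of int16/int32, the maximum, machine epsilon, and smallest normal value of float16/float32, and the raw bit pattern of a float16 value. Comparing such raw bits in both tools is one possible way to narrow down an interchange problem; it is a suggestion, not a confirmed diagnosis of the MATLAB issue.

```python
import numpy as np

# Integer types: fixed step size of 1, range set by the bit width.
for dt in (np.int16, np.int32):
    info = np.iinfo(dt)
    print(np.dtype(dt).name, info.min, info.max)

# Float types: max value, relative precision (eps), smallest normal value.
for dt in (np.float16, np.float32):
    info = np.finfo(dt)
    print(np.dtype(dt).name, info.max, info.eps, info.tiny)

# float16 is IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 fraction bits.
bits = np.array([1.0], dtype=np.float16).view(np.uint16)[0]
print(hex(bits))  # 0x3c00

# Only 11 significant bits: not every integer above 2048 is representable.
print(np.float16(2049))  # 2048.0
```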