We are working with the "ERA5 hourly data on pressure levels from 1979 to present" and the "ERA5 hourly data on single levels from 1979 to present" datasets provided by https://cds.climate.copernicus.eu.
The datasets have four dimensions: time, level, latitude and longitude.

- The time dimension has dtype np.datetime64.
- The level dimension is given in Pascal.
- The longitude dimension is ascending and within the range [-180, 180].
- The latitude dimension is ascending and within the range [-90, 90].

Variables such as temperature and humidity have the shape t x 37 x 720 x 1440, where t is the number of hourly time steps in each month (roughly 24 x 30 ≈ 720).
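As a quick check, this structure can be inspected with xarray. A minimal sketch, assuming the data was downloaded from the CDS as NetCDF; the file name is a hypothetical placeholder:

import xarray as xr

# Hypothetical file name; any monthly ERA5 NetCDF download from the CDS
# has the structure described above.
ds = xr.open_dataset("era5_pressure_levels_2000_01.nc")

print(ds.dims)          # time: ~720, level: 37, latitude: 720, longitude: 1440
print(ds.time.dtype)    # datetime64[ns]
print(float(ds.latitude.min()), float(ds.latitude.max()))  # within [-90, 90]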
The compression algorithm is autoencoder-based. An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. Its aim is to learn a representation (encoding) for a set of data. Along with this reduction side, a reconstructing side is learned, where the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, hence its name.
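To make this concrete, here is a minimal autoencoder sketch in TensorFlow/Keras (TensorFlow is listed under the references below). The layer sizes are illustrative placeholders, not the architecture used in this repository:

import tensorflow as tf
from tensorflow.keras import layers

input_dim, code_dim = 1024, 64  # illustrative sizes

# Encoder: maps the input to a much smaller code (the reduction side).
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(input_dim,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(code_dim, activation="relu"),
])

# Decoder: reconstructs the input from the code (the reconstructing side).
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(code_dim,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(input_dim),
])

autoencoder = tf.keras.Sequential([encoder, decoder])
# Unsupervised training: the input is also the reconstruction target.
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=32)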
The performance of the presented compression algorithm is compared with zfp and SZ, two open-source libraries for lossy compression of floating-point arrays. zfp in particular supports high-throughput read and write random access.
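For reference, zfp's error-bounded (fixed-accuracy) mode can be exercised through its Python bindings, zfpy; SZ offers comparable error-bounded modes through its own bindings. The array below is a toy stand-in for a chunk of ERA5 data:

import numpy as np
import zfpy  # Python bindings for zfp

data = np.random.rand(37, 90, 180)

# Fixed-accuracy mode: the pointwise error is bounded by `tolerance`.
compressed = zfpy.compress_numpy(data, tolerance=1e-3)
restored = zfpy.decompress_numpy(compressed)

print("compression factor:", data.nbytes / len(compressed))
print("max abs error:", np.max(np.abs(data - restored)))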
An example of how to use the compression algorithm is given in the following Jupyter notebook.
The compression algorithm consists of two functions, located in compress.py and decompress.py. To compress and decompress the data, just call:
from lossycomp.compress import compress
from lossycomp.decompress import decompress

# data: the array to compress; abs_error: the allowed absolute error.
compressed_data = compress(data, abs_error, verbose=True)
decompressed_data = decompress(compressed_data, verbose=True)
The compression function also reports information about the compressed data, such as the achieved compression factor and the space consumption of the elements in the compressed data.
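Assuming abs_error specifies a pointwise absolute error bound, as its name suggests, a round trip can be sanity-checked like this:

import numpy as np

# The reconstruction should deviate from the input by at most abs_error.
assert np.max(np.abs(data - decompressed_data)) <= abs_error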
Datasets
TensorFlow
ZFP, FPZIP and SZ