Closed pgierz closed 4 months ago
@koldunovn The way to get zstd in is to just compile netcdf and hdf5 the correct way, as described here: https://github.com/FESOM/FESOM_compression/blob/main/README.md, correct?
Works and IMO can be merged as is. List of Benchmarks:
CORE2, 128 nodes, monthly 3D output plus some daily 2D:
1 month, no compression: 7m25s
1 month, compression level 9: 7m36s
@pgierz, can you add the outdata volumes?
More tests at larger number of nodes and with bigger meshes later.
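For reference, the relative overhead implied by the two timings above can be worked out in a few lines. This is only a sketch for this single CORE2 run; the helper name `to_seconds` is made up for the example.

```python
import re

def to_seconds(stamp: str) -> int:
    """Convert a wall-time string such as '7m25s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised wall time: {stamp!r}")
    minutes, seconds = map(int, match.groups())
    return 60 * minutes + seconds

baseline = to_seconds("7m25s")    # 1 month, no compression
compressed = to_seconds("7m36s")  # 1 month, compression level 9

# Extra wall time relative to the uncompressed run, in percent.
overhead_pct = 100.0 * (compressed - baseline) / baseline
print(f"{compressed - baseline} s slower, {overhead_pct:.1f}% overhead")
```

For these two numbers that is 11 s, or about 2.5% extra wall time.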
https://github.com/FESOM/FESOM_compression/blob/main/README.md
Not sure it's the right reference, but my understanding is, yes, you have to build a netCDF that supports the new "filters" with zstd support. Might be relevant: https://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2022/msg00032.html
@JanStreffing here you go:
For all outputs:
a270077 in 🌐 levante0 in fesom2 on refactoring-compress [!?] via △ v3.20.2
❯ du -sc result_tmp
5779216 result_tmp
5779216 total
a270077 in 🌐 levante0 in fesom2 on refactoring-compress [!?] via △ v3.20.2
❯ du -sc result_tmp_no_compress/
5835916 result_tmp_no_compress/
5835916 total
And individually:
a270077 in 🌐 levante0 in fesom2 on refactoring-compress [!?] via △ v3.20.2 took 59s
❯ ls -ratl result_tmp
total 149160
-rw-r--r-- 1 a270077 ab0246 98289786 May 13 16:12 fesom.mesh.diag.nc
drwxr-sr-x 3 a270077 ab0246 4096 May 13 16:19 fesom_raw_restart
drwxr-sr-x 3 a270077 ab0246 4096 May 13 16:19 fesom_bin_restart
drwxr-sr-x 6 a270077 ab0246 4096 May 13 16:19 .
-rw-r--r-- 1 a270077 ab0246 102 May 13 16:19 fesom.clock
drwxr-sr-x 2 a270077 ab0246 4096 May 13 16:19 fesom.1958.oce.restart
drwxr-sr-x 2 a270077 ab0246 4096 May 13 16:19 fesom.1958.ice.restart
-rw-r--r-- 1 a270077 ab0246 4333826 May 13 16:19 uice.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 931056 May 13 16:19 ty_sur.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 930926 May 13 16:19 tx_sur.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 474180 May 13 16:19 MLD3.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 471431 May 13 16:19 MLD2.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 169114 May 13 16:19 m_ice.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 4338172 May 13 16:19 vice.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 13735260 May 13 16:19 temp.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 470354 May 13 16:19 sst.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 426784 May 13 16:19 sss.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 482735 May 13 16:19 ssh.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 25812750 May 13 16:19 salt.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 173484 May 13 16:19 m_snow.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 468782 May 13 16:19 MLD1.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 493660 May 13 16:19 fw.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 489496 May 13 16:19 fh.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 164197 May 13 16:19 a_ice.fesom.1958.nc
drwxr-sr-x 22 a270077 ab0246 4096 May 13 16:22 ..
a270077 in 🌐 levante0 in fesom2 on refactoring-compress [!?] via △ v3.20.2
❯ ls -ratl result_tmp_no_compress/
total 205848
drwxr-sr-x 22 a270077 ab0246 4096 May 13 16:22 ..
-rw-r--r-- 1 a270077 ab0246 98289786 May 13 16:23 fesom.mesh.diag.nc
drwxr-sr-x 3 a270077 ab0246 4096 May 13 16:30 fesom_raw_restart
-rw-r--r-- 1 a270077 ab0246 102 May 13 16:30 fesom.clock
drwxr-sr-x 3 a270077 ab0246 4096 May 13 16:30 fesom_bin_restart
drwxr-sr-x 2 a270077 ab0246 4096 May 13 16:30 fesom.1958.oce.restart
drwxr-sr-x 2 a270077 ab0246 4096 May 13 16:30 fesom.1958.ice.restart
drwxr-sr-x 6 a270077 ab0246 4096 May 13 16:30 .
-rw-r--r-- 1 a270077 ab0246 15752685 May 13 16:30 vice.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 15752685 May 13 16:30 uice.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 999617 May 13 16:30 ty_sur.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 999607 May 13 16:30 tx_sur.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 24383368 May 13 16:30 temp.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528361 May 13 16:30 sst.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528357 May 13 16:30 sss.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528357 May 13 16:30 ssh.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 48742784 May 13 16:30 salt.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528337 May 13 16:30 m_snow.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528349 May 13 16:30 MLD3.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528349 May 13 16:30 MLD2.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528349 May 13 16:30 MLD1.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528335 May 13 16:30 m_ice.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528349 May 13 16:30 fw.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528333 May 13 16:30 fh.fesom.1958.nc
-rw-r--r-- 1 a270077 ab0246 528349 May 13 16:30 a_ice.fesom.1958.nc
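To make the two listings easier to compare, here is a minimal sketch that computes the per-variable saving for a handful of files. The byte counts are copied from the `ls` output above; the choice of variables and the dictionary names are just for illustration.

```python
# File sizes in bytes, copied from the two `ls` listings above.
compressed = {
    "temp": 13735260,
    "salt": 25812750,
    "uice": 4333826,
    "vice": 4338172,
    "sst": 470354,
    "a_ice": 164197,
}
uncompressed = {
    "temp": 24383368,
    "salt": 48742784,
    "uice": 15752685,
    "vice": 15752685,
    "sst": 528361,
    "a_ice": 528349,
}

# Percentage of disk space saved per variable.
saving = {
    name: 100.0 * (1.0 - compressed[name] / uncompressed[name])
    for name in uncompressed
}

# Print the variables ordered from largest to smallest saving.
for name, pct in sorted(saving.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:6s} {uncompressed[name]:>10d} -> {compressed[name]:>10d}  ({pct:4.1f}% saved)")
```

For this subset the sea-ice fields shrink by roughly 70%, the 3D temp and salt fields by a bit under half, and sst by about 11%.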
@pgierz So, this is just standard netCDF compression (zlib)?
@JanStreffing level 9 is too aggressive. I think you can get good results even with 1, and maybe 3 is optimal, but that needs some experimenting :)
Could you remove the restarts first?
agreed, I will test with level 1, which is what I use for OpenIFS.
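The level tradeoff can be tried out quickly outside the model. Below is a minimal sketch using Python's standard `zlib` module on synthetic data. It is not FESOM output and does not go through the netCDF filter path, so the ratios and timings are only illustrative.

```python
import math
import struct
import time
import zlib

# Synthetic smooth field, rounded so the data contain some redundancy,
# then packed as 32-bit floats. A stand-in for real model output.
values = [round(math.sin(i / 500.0), 3) for i in range(200_000)]
raw = struct.pack(f"{len(values)}f", *values)

results = {}
for level in (1, 3, 9):
    start = time.perf_counter()
    packed = zlib.compress(raw, level)
    elapsed = time.perf_counter() - start
    # Compression is lossless: decompressing must return the original bytes.
    assert zlib.decompress(packed) == raw
    results[level] = (len(packed), elapsed)
    print(f"level {level}: {len(packed) / len(raw):.1%} of original, "
          f"{elapsed * 1e3:.1f} ms")
```

Swapping in a real output array for `raw` would give a more relevant comparison of levels 1, 3 and 9.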
Sure, without restart files:
du -sc result_tmp result_tmp_no_compress
149140 result_tmp
205828 result_tmp_no_compress
354968 total
And once graphically:
…utput
Tag to @JanStreffing for more work on this.
Basics: roughly 50% less disk space for roughly 10% more wall time, subject to mesh choices, scalability, etc.
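One way to cross-check the disk-space figure against the `du` totals without restart files is sketched below. It assumes `du` reports KiB (the GNU default) and ignores block rounding when subtracting `fesom.mesh.diag.nc`, which has the same size in both directories. This is a reading of the numbers, not something stated in the thread.

```python
# Totals from `du -sc` without restart files (assumed to be KiB).
du_compressed_kib = 149140
du_uncompressed_kib = 205828

# fesom.mesh.diag.nc is 98289786 bytes in both listings.
mesh_diag_kib = 98289786 / 1024

def saved_pct(small: float, large: float) -> float:
    """Percentage by which `small` undercuts `large`."""
    return 100.0 * (1.0 - small / large)

overall = saved_pct(du_compressed_kib, du_uncompressed_kib)

# Assumption: du's block rounding is ignored when removing the mesh file.
fields_only = saved_pct(du_compressed_kib - mesh_diag_kib,
                        du_uncompressed_kib - mesh_diag_kib)

print(f"whole directory: {overall:.1f}% saved")
print(f"excluding fesom.mesh.diag.nc: {fields_only:.1f}% saved")
```

On these totals the whole directory shrinks by about 28%, while the output fields alone (mesh diagnostics excluded) shrink by about 52%.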