Open Thomas-Moore-Creative opened 6 months ago
For the 7.26 GB 2D output files, the following code (`complevel` 5) compresses the output to 3.3 GB:
def compress_nc(ds, out_path):
    compression_opts = {
        'zlib': True,    # Enable zlib compression
        'complevel': 5,  # Compression level (1-9)
    }
    # Set the encoding for each variable
    encoding = {var: compression_opts for var in ds.data_vars}
    # Write the dataset to a new NetCDF file with compression
    ds.to_netcdf(out_path, encoding=encoding)
This took: Wall time: 3min 27s
`complevel` = 9: Wall time: 11min 34s, size 3.2 GB
`complevel` = 1: Wall time: 2min 45s, size 3.3 GB
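The timings above suggest diminishing returns at higher levels. NetCDF-4's `zlib` option uses the same deflate algorithm as Python's `zlib` module, so a rough way to see the level-versus-time trade-off without touching the real files is a sketch like the one below. It runs on a small synthetic field (an assumption, not BRAN2020 data), so the absolute numbers will not match the ones reported here.

```python
import array
import math
import time
import zlib

# Synthetic stand-in for a smooth 2D field: 300 x 300 float32 values.
# (Hypothetical data; real ocean fields will compress differently.)
values = array.array("f", (math.sin(i / 50.0) for i in range(300 * 300)))
raw = values.tobytes()


def deflate_stats(payload, level):
    """Return (compressed_size_bytes, elapsed_seconds) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed


# Compare the same three levels that were timed in this thread.
results = {level: deflate_stats(raw, level) for level in (1, 5, 9)}
for level, (size, elapsed) in results.items():
    print(f"level {level}: {size / len(raw):.1%} of original, {elapsed * 1e3:.2f} ms")
```

If the size barely changes between levels while the time grows, a low or middle level is probably the better default for these files.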
In this notebook, compress_netcdf, there's the following function:
def write_compressed_netcdf(dataset, file_path, compression_level=4):
    encoding = {}
    for var_name in dataset.data_vars:
        encoding[var_name] = {'zlib': True, 'complevel': compression_level}
    dataset.to_netcdf(file_path, encoding=encoding)
NB: @matt-csiro approach for chunking 3D BRAN2020 is:
xt_ocean = 300
yt_ocean = 300
st_ocean = -1
Time = -1
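To get a feel for what a chunking spec like this implies, here is a minimal sketch that works out the chunk shape, the number of chunks, and the uncompressed bytes per chunk. The dimension lengths in `shape` are assumptions for illustration only (they are not stated in this thread), `chunk_layout` is a hypothetical helper, and `-1` is treated as "span the whole dimension".

```python
from math import ceil, prod


def chunk_layout(shape, chunks, itemsize=4):
    """Return (chunk_shape, n_chunks, chunk_bytes) for a chunking spec.

    A chunk length of -1 means "span the whole dimension".
    itemsize=4 assumes float32 values.
    """
    chunk_shape = tuple(
        dim if c == -1 else min(c, dim) for dim, c in zip(shape, chunks)
    )
    n_chunks = prod(ceil(dim / c) for dim, c in zip(shape, chunk_shape))
    chunk_bytes = prod(chunk_shape) * itemsize
    return chunk_shape, n_chunks, chunk_bytes


# ASSUMED sizes for illustration only: (Time, st_ocean, yt_ocean, xt_ocean)
shape = (12, 51, 1500, 3600)
chunks = (-1, -1, 300, 300)

chunk_shape, n_chunks, chunk_bytes = chunk_layout(shape, chunks)
print(chunk_shape, n_chunks, f"{chunk_bytes / 1024**2:.1f} MiB per chunk")
```

Swapping in the real dimension lengths shows quickly whether a spec produces a handful of large chunks or many tiny ones.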
/g/data/es60/users/thomas_moore/clim_demo_results/daily/bran2020_intermediate_results
(base) tm4888@gadi-login-09 /g/data/es60/users/thomas_moore/clim_demo_results/daily/bran2020_intermediate_results ls -l *.nc
-rw-r--r-- 1 tm4888 es60 483373521 May 20 14:34 BRAN2020_base_stats_eta_t_alltime_2024.05.20.14.30.28.nc
-rw-r--r-- 1 tm4888 es60 494279790 May 20 15:05 BRAN2020_base_stats_eta_t_el_nino_2024.05.20.14.30.28.nc
-rw-r--r-- 1 tm4888 es60 490290307 May 20 14:55 BRAN2020_base_stats_eta_t_la_nina_2024.05.20.14.30.28.nc
-rw-r--r-- 1 tm4888 es60 487237913 May 27 11:33 BRAN2020_base_stats_eta_t_neutral_2024.05.22.11.07.01.nc
-rw-r--r-- 1 tm4888 es60 474453729 May 20 14:19 BRAN2020_base_stats_mld_alltime_2024.05.20.14.15.51.nc
-rw-r--r-- 1 tm4888 es60 499982008 May 20 14:50 BRAN2020_base_stats_mld_el_nino_2024.05.20.14.15.51.nc
-rw-r--r-- 1 tm4888 es60 493199928 May 20 14:40 BRAN2020_base_stats_mld_la_nina_2024.05.20.14.15.51.nc
-rw-r--r-- 1 tm4888 es60 486271869 May 27 11:33 BRAN2020_base_stats_mld_neutral_2024.05.22.09.55.36.nc
-rw-r--r-- 1 tm4888 es60 17135943610 May 18 15:22 BRAN2020_base_stats_salt_alltime_2024.05.18.09.19.14.nc
-rw-r--r-- 1 tm4888 es60 17324048046 May 19 14:29 BRAN2020_base_stats_salt_el_nino_2024.05.18.20.03.31.nc
-rw-r--r-- 1 tm4888 es60 17261157454 May 19 07:08 BRAN2020_base_stats_salt_la_nina_2024.05.18.20.03.31.nc
-rw-r--r-- 1 tm4888 es60 17158352065 May 27 11:33 BRAN2020_base_stats_salt_neutral_2024.05.22.02.52.35.nc
-rw-r--r-- 1 tm4888 es60 19143299023 May 18 04:42 BRAN2020_base_stats_temp_alltime_2024.05.18.00.05.08.nc
-rw-r--r-- 1 tm4888 es60 19590262769 May 19 17:26 BRAN2020_base_stats_temp_el_nino_2024.05.19.13.12.37.nc
-rw-r--r-- 1 tm4888 es60 19445789296 May 19 06:54 BRAN2020_base_stats_temp_la_nina_2024.05.18.18.54.38.nc
-rw-r--r-- 1 tm4888 es60 19343013301 May 27 11:34 BRAN2020_base_stats_temp_neutral_2024.05.22.06.24.27.nc
-rw-r--r-- 1 tm4888 es60 23791916518 May 18 15:41 BRAN2020_base_stats_u_alltime_2024.05.18.10.35.04.nc
-rw-r--r-- 1 tm4888 es60 23966192982 May 20 07:18 BRAN2020_base_stats_u_el_nino_2024.05.20.03.19.04.nc
-rw-r--r-- 1 tm4888 es60 23915150415 May 20 05:12 BRAN2020_base_stats_u_la_nina_2024.05.20.00.30.07.nc
-rw-r--r-- 1 tm4888 es60 23897851585 May 27 11:35 BRAN2020_base_stats_u_neutral_2024.05.21.22.21.38.nc
-rw-r--r-- 1 tm4888 es60 24381911568 May 18 15:47 BRAN2020_base_stats_v_alltime_2024.05.18.10.42.35.nc
-rw-r--r-- 1 tm4888 es60 24334060247 May 20 07:21 BRAN2020_base_stats_v_el_nino_2024.05.20.03.13.51.nc
-rw-r--r-- 1 tm4888 es60 24341457122 May 20 07:11 BRAN2020_base_stats_v_la_nina_2024.05.20.03.08.03.nc
-rw-r--r-- 1 tm4888 es60 24383438225 May 27 11:36 BRAN2020_base_stats_v_neutral_2024.05.22.11.17.59.nc
-rw-r--r-- 1 tm4888 es60 361580012 May 20 14:38 BRAN2020_quantile_stats_eta_t_alltime_2024.05.20.14.30.28.nc
-rw-r--r-- 1 tm4888 es60 371776318 May 20 15:08 BRAN2020_quantile_stats_eta_t_el_nino_2024.05.20.14.30.28.nc
-rw-r--r-- 1 tm4888 es60 369596738 May 20 14:58 BRAN2020_quantile_stats_eta_t_la_nina_2024.05.20.14.30.28.nc
-rw-r--r-- 1 tm4888 es60 366665725 May 27 11:36 BRAN2020_quantile_stats_eta_t_neutral_2024.05.22.11.07.01.nc
-rw-r--r-- 1 tm4888 es60 371868199 May 20 14:23 BRAN2020_quantile_stats_mld_alltime_2024.05.20.14.15.51.nc
-rw-r--r-- 1 tm4888 es60 385676782 May 20 14:53 BRAN2020_quantile_stats_mld_el_nino_2024.05.20.14.15.51.nc
-rw-r--r-- 1 tm4888 es60 382437241 May 20 14:43 BRAN2020_quantile_stats_mld_la_nina_2024.05.20.14.15.51.nc
-rw-r--r-- 1 tm4888 es60 378458416 May 27 11:36 BRAN2020_quantile_stats_mld_neutral_2024.05.22.09.55.36.nc
-rw-r--r-- 1 tm4888 es60 9916982735 May 18 18:42 BRAN2020_quantile_stats_salt_alltime_2024.05.18.09.19.14.nc
-rw-r--r-- 1 tm4888 es60 11287376016 May 19 18:02 BRAN2020_quantile_stats_salt_el_nino_2024.05.18.20.03.31.nc
-rw-r--r-- 1 tm4888 es60 11054439201 May 19 10:41 BRAN2020_quantile_stats_salt_la_nina_2024.05.18.20.03.31.nc
-rw-r--r-- 1 tm4888 es60 10825667081 May 27 11:37 BRAN2020_quantile_stats_salt_neutral_2024.05.22.02.52.35.nc
-rw-r--r-- 1 tm4888 es60 11149967176 May 18 07:35 BRAN2020_quantile_stats_temp_alltime_2024.05.18.00.05.08.nc
-rw-r--r-- 1 tm4888 es60 12813681330 May 19 21:36 BRAN2020_quantile_stats_temp_el_nino_2024.05.19.13.12.37.nc
-rw-r--r-- 1 tm4888 es60 12550705334 May 19 16:45 BRAN2020_quantile_stats_temp_la_nina_2024.05.19.12.59.34.nc
-rw-r--r-- 1 tm4888 es60 12275763465 May 27 11:37 BRAN2020_quantile_stats_temp_neutral_2024.05.22.06.24.27.nc
-rw-r--r-- 1 tm4888 es60 15794470250 May 18 19:36 BRAN2020_quantile_stats_u_alltime_2024.05.18.10.35.04.nc
-rw-r--r-- 1 tm4888 es60 16899293259 May 20 01:48 BRAN2020_quantile_stats_u_el_nino_2024.05.19.21.37.51.nc
-rw-r--r-- 1 tm4888 es60 16853389695 May 20 00:53 BRAN2020_quantile_stats_u_la_nina_2024.05.19.21.11.12.nc
-rw-r--r-- 1 tm4888 es60 16674048388 May 27 11:38 BRAN2020_quantile_stats_u_neutral_2024.05.21.22.21.38.nc
-rw-r--r-- 1 tm4888 es60 15682542436 May 18 19:13 BRAN2020_quantile_stats_v_alltime_2024.05.18.10.42.35.nc
-rw-r--r-- 1 tm4888 es60 16905918949 May 20 11:43 BRAN2020_quantile_stats_v_el_nino_2024.05.20.07.57.29.nc
-rw-r--r-- 1 tm4888 es60 16813994768 May 20 12:01 BRAN2020_quantile_stats_v_la_nina_2024.05.20.07.49.04.nc
-rw-r--r-- 1 tm4888 es60 16605999410 May 27 11:39 BRAN2020_quantile_stats_v_neutral_2024.05.22.11.17.59.nc
zlib 5 when written at intermediate level
dict_keys(['temp_alltime_ds', 'temp_neutral_ds', 'temp_la_nina_ds', 'temp_el_nino_ds',
'salt_alltime_ds', 'salt_neutral_ds', 'salt_la_nina_ds', 'salt_el_nino_ds', 'u_alltime_ds',
'u_neutral_ds', 'u_la_nina_ds', 'u_el_nino_ds', 'v_alltime_ds', 'v_neutral_ds',
'v_la_nina_ds', 'v_el_nino_ds', 'eta_t_alltime_ds', 'eta_t_neutral_ds', 'eta_t_la_nina_ds',
'eta_t_el_nino_ds', 'mld_alltime_ds', 'mld_neutral_ds', 'mld_la_nina_ds', 'mld_el_nino_ds'])
# Calculate the total size of all datasets in the dictionary
total_size_gb = sum(merged_dataset.nbytes / (1024**3) for merged_dataset in merged_datasets.values())
print(f"Total size of all datasets: {total_size_gb} GB")
Total size of all datasets: 1991.0576639771461 GB
`/g/data/es60/users/thomas_moore/clim_demo_results/daily/bran2020_intermediate_results du -hsc *.nc` = 532G total
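As a quick back-of-the-envelope check on those two totals, the sketch below computes the overall ratio. It assumes the `du` total covers the same datasets as the in-memory sum, and that both numbers are in GiB (`nbytes / 1024**3` above, and `du -h`, which reports in powers of 1024).

```python
# Totals copied from the output above.
in_memory_gib = 1991.0576639771461  # sum of .nbytes / 1024**3
on_disk_gib = 532                   # `du -hsc *.nc` total

ratio = in_memory_gib / on_disk_gib
saved = 1 - on_disk_gib / in_memory_gib
print(f"compression ratio ~{ratio:.2f}:1, space saved ~{saved:.1%}")
```

Part of that difference may come from the stored dtype rather than from zlib alone, so treat it as an overall on-disk saving, not a pure compression figure.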
3D 2D
@matt-csiro - my first attempt to write to netcdf:
2D `mld` @ 1,300,300 chunks grinds along at a snail's pace. What took 3 minutes is 2% finished after 10 minutes. I could likely be doing this differently . . .
Would you normally use `cdo` or `nco`? I'm guessing these approaches need to read things into memory, but would that speed things up a lot when writing out to many tiny chunks?
Hmm, that's interesting.
NCO has been my tool for handling chunking, and it typically hasn't been so sensitive to chunk sizes.
@matt-csiro : as expected if I do everything IN MEMORY then writing all these tiny chunks isn't much of an issue
Any response on their requests?
mld_tiny_chunk = mld_tiny_chunk.compute()  # data must be IN MEMORY
encoding = {}  # set up encoding dict
chunksizes_tuple = (1, 300, 300)  # set chunksizes for netcdf write
for var_name in mld_tiny_chunk.data_vars:
    # encode only the data variables
    encoding[var_name] = {'zlib': True, 'complevel': 5, 'dtype': 'float32', 'chunksizes': chunksizes_tuple}
# Save to NetCDF with chunking and compression encoding
mld_tiny_chunk.to_netcdf('/g/data/es60/users/thomas_moore/clim_demo_results/daily/bran2020_final_results/mld_01300300.nc',
                         engine='netcdf4', encoding=encoding)
Need to write batch code to combine the intermediate netcdf files into one per core variable.
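A possible first step for that batch code is grouping the intermediate files by core variable. The sketch below does only the grouping, using a filename pattern inferred from the `ls -l` listing above; `FILENAME_RE` and `group_by_variable` are hypothetical names, and the pattern would need adjusting if the naming scheme changes.

```python
import re
from collections import defaultdict

# Pattern inferred from the `ls -l` listing; variable and phase names are
# taken from that listing, the helper itself is hypothetical.
FILENAME_RE = re.compile(
    r"^BRAN2020_(?P<kind>base|quantile)_stats_"
    r"(?P<var>eta_t|mld|salt|temp|u|v)_"
    r"(?P<phase>alltime|el_nino|la_nina|neutral)_"
    r"(?P<stamp>\d{4}(?:\.\d{2}){5})\.nc$"
)


def group_by_variable(filenames):
    """Map each core variable name to the sorted list of its files."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = FILENAME_RE.match(name)
        if match is None:
            continue  # skip anything that does not follow the naming scheme
        groups[match["var"]].append(name)
    return dict(groups)


# A few names copied from the listing, plus one file that should be ignored.
sample = [
    "BRAN2020_base_stats_eta_t_alltime_2024.05.20.14.30.28.nc",
    "BRAN2020_quantile_stats_eta_t_el_nino_2024.05.20.14.30.28.nc",
    "BRAN2020_base_stats_u_neutral_2024.05.21.22.21.38.nc",
    "BRAN2020_base_stats_mld_la_nina_2024.05.20.14.15.51.nc",
    "notes.txt",
]
groups = group_by_variable(sample)
for var, files in groups.items():
    print(var, len(files))
```

Each list could then be opened and merged with xarray (for example `xarray.open_dataset` plus `xarray.merge`) and written out with the compression encoding used earlier. Whether the variable names clash between the alltime, El Niño, La Niña, and neutral files would need checking first.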