FESOM / fdiag

FESOM2 monitoring

Speeding up loading of 3D data #10

Open · JanStreffing opened this issue 2 years ago

JanStreffing commented 2 years ago

I recently ran a fairly comprehensive set of diagnostics for a 700-year-long CORE2 mesh run. It took quite a few hours to complete:

-rw-r--r-- 1 streffing1 chhb19 484450 Nov  3 00:37 awicm3_mass_variable_MLD2_0.png
-rw-r--r-- 1 streffing1 chhb19 652576 Nov  3 00:37 awicm3_mass_climatology_temp_0.png
-rw-r--r-- 1 streffing1 chhb19 705780 Nov  3 00:37 awicm3_mass_climatology_temp_100.png
-rw-r--r-- 1 streffing1 chhb19 618202 Nov  3 00:37 awicm3_mass_climatology_temp_500.png
-rw-r--r-- 1 streffing1 chhb19 552784 Nov  3 00:38 awicm3_mass_climatology_temp_1000.png
-rw-r--r-- 1 streffing1 chhb19 527174 Nov  3 00:38 awicm3_mass_climatology_temp_4000.png
-rw-r--r-- 1 streffing1 chhb19 646238 Nov  3 00:38 awicm3_mass_climatology_salt_0.png
-rw-r--r-- 1 streffing1 chhb19 589696 Nov  3 00:38 awicm3_mass_climatology_salt_100.png
-rw-r--r-- 1 streffing1 chhb19 535527 Nov  3 00:39 awicm3_mass_climatology_salt_500.png
-rw-r--r-- 1 streffing1 chhb19 516883 Nov  3 00:39 awicm3_mass_climatology_salt_1000.png
-rw-r--r-- 1 streffing1 chhb19 521435 Nov  3 00:39 awicm3_mass_climatology_salt_4000.png
-rw-r--r-- 1 streffing1 chhb19 178929 Nov  3 00:40 awicm3_mass_ice_integrals_combined_icearea_combined.png
-rw-r--r-- 1 streffing1 chhb19 172107 Nov  3 00:40 awicm3_mass_ice_integrals_combined_iceext_combined.png
-rw-r--r-- 1 streffing1 chhb19 173469 Nov  3 00:40 awicm3_mass_ice_integrals_combined_icevol_combined.png
-rw-r--r-- 1 streffing1 chhb19  88539 Nov  3 00:59 awicm3_mass_hovm_difference_clim_Global_Ocean_temp.png
-rw-r--r-- 1 streffing1 chhb19  56290 Nov  3 01:19 awicm3_mass_hovm_difference_clim_Global_Ocean_salt.png
-rw-r--r-- 1 streffing1 chhb19 103724 Nov  3 01:38 awicm3_mass_hovm_difference_clim_Atlantic_Basin_temp.png
-rw-r--r-- 1 streffing1 chhb19  78564 Nov  3 01:57 awicm3_mass_hovm_difference_clim_Atlantic_Basin_salt.png
-rw-r--r-- 1 streffing1 chhb19  92288 Nov  3 02:17 awicm3_mass_hovm_difference_clim_Southern_Ocean_Basin_temp.png
-rw-r--r-- 1 streffing1 chhb19  60068 Nov  3 02:36 awicm3_mass_hovm_difference_clim_Southern_Ocean_Basin_salt.png
-rw-r--r-- 1 streffing1 chhb19 119662 Nov  3 02:37 awicm3_mass_xmoc_Global_Ocean.png
-rw-r--r-- 1 streffing1 chhb19  95797 Nov  3 02:39 awicm3_mass_xmoc_Atlantic_MOC.png
-rw-r--r-- 1 streffing1 chhb19  73936 Nov  3 02:41 awicm3_mass_xmoc_Pacific_MOC.png
-rw-r--r-- 1 streffing1 chhb19 354225 Nov  3 03:02 awicm3_mass_amoc_timeseries_26.5.png
-rw-r--r-- 1 streffing1 chhb19 310646 Nov  3 03:02 awicm3_mass_amoc_timeseries_45.png
-rw-r--r-- 1 streffing1 chhb19 202397 Nov  3 03:22 awicm3_mass_ocean_integrals_difference_clim_Global_Ocean_temp_0_100.png
-rw-r--r-- 1 streffing1 chhb19 105010 Nov  3 03:40 awicm3_mass_ocean_integrals_difference_clim_Global_Ocean_temp_100_1000.png
-rw-r--r-- 1 streffing1 chhb19  75883 Nov  3 03:59 awicm3_mass_ocean_integrals_difference_clim_Global_Ocean_temp_1000_5000.png
-rw-r--r-- 1 streffing1 chhb19 142330 Nov  3 04:18 awicm3_mass_ocean_integrals_difference_clim_Global_Ocean_salt_0_100.png
-rw-r--r-- 1 streffing1 chhb19 131947 Nov  3 04:36 awicm3_mass_ocean_integrals_difference_clim_Global_Ocean_salt_100_1000.png
-rw-r--r-- 1 streffing1 chhb19  92804 Nov  3 04:54 awicm3_mass_ocean_integrals_difference_clim_Global_Ocean_salt_1000_5000.png
-rw-r--r-- 1 streffing1 chhb19 234682 Nov  3 05:13 awicm3_mass_ocean_integrals_difference_clim_Atlantic_Basin_temp_0_100.png
-rw-r--r-- 1 streffing1 chhb19  99613 Nov  3 05:31 awicm3_mass_ocean_integrals_difference_clim_Atlantic_Basin_temp_100_1000.png
-rw-r--r-- 1 streffing1 chhb19  75593 Nov  3 05:49 awicm3_mass_ocean_integrals_difference_clim_Atlantic_Basin_temp_1000_5000.png
-rw-r--r-- 1 streffing1 chhb19 167603 Nov  3 06:08 awicm3_mass_ocean_integrals_difference_clim_Atlantic_Basin_salt_0_100.png
-rw-r--r-- 1 streffing1 chhb19  94653 Nov  3 06:27 awicm3_mass_ocean_integrals_difference_clim_Atlantic_Basin_salt_100_1000.png
-rw-r--r-- 1 streffing1 chhb19  78692 Nov  3 06:45 awicm3_mass_ocean_integrals_difference_clim_Atlantic_Basin_salt_1000_5000.png
-rw-r--r-- 1 streffing1 chhb19 158951 Nov  3 07:04 awicm3_mass_ocean_integrals_difference_clim_Southern_Ocean_Basin_temp_0_100.png
-rw-r--r-- 1 streffing1 chhb19 125066 Nov  3 07:24 awicm3_mass_ocean_integrals_difference_clim_Southern_Ocean_Basin_temp_100_1000.png
-rw-r--r-- 1 streffing1 chhb19  81634 Nov  3 07:43 awicm3_mass_ocean_integrals_difference_clim_Southern_Ocean_Basin_temp_1000_5000.png
-rw-r--r-- 1 streffing1 chhb19 189303 Nov  3 08:03 awicm3_mass_ocean_integrals_difference_clim_Southern_Ocean_Basin_salt_0_100.png
-rw-r--r-- 1 streffing1 chhb19 138439 Nov  3 08:22 awicm3_mass_ocean_integrals_difference_clim_Southern_Ocean_Basin_salt_100_1000.png
-rw-r--r-- 1 streffing1 chhb19 127100 Nov  3 08:41 awicm3_mass_ocean_integrals_difference_clim_Southern_Ocean_Basin_salt_1000_5000.png

Most of the time is spent loading data. When we create global diagnostics and then local ones afterwards, we read in the temp and salt data multiple times. We also read in the same data for hovm_difference_clim as for ocean_integrals_difference. Reading 700 years of a 3D CORE2 field once takes about 20 minutes. That's 22.4 GB loaded at roughly 18.6 MB/s.

  1. Can we reduce the number of times we load 3D data from NetCDF to a minimum, by grouping diagnostics for the same area together and loading the data only once? Or even better: if we are already loading the global field once, can we make in-memory copies cut down to each local basin, so that everything is read from disk only once? (A minimal sketch of this idea follows after the list.)
  2. Can we speed up the loading of NetCDF input beyond what my example achieves on JUWELS with CORE2 files?
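
A minimal sketch of the idea in point 1, reusing the pf.get_data/pf.hovm_data calls that appear later in this thread; exp_path, years and mesh are assumed to come from the notebook context, and the basin_masks dict of node masks is hypothetical:

import pyfesom2 as pf

# Load the global 3D field once, lazily (compute=False), then derive every
# regional diagnostic from the same in-memory object instead of re-reading
# the NetCDF files for each basin.
data = pf.get_data(exp_path, "temp", years, mesh, how=None, compute=False, silent=True)

hovm = {}
for basin, mask in basin_masks.items():  # basin_masks: hypothetical {name: node mask}
    hovm[basin] = pf.hovm_data(data, mesh, mask=mask)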
koldunovn commented 2 years ago

There is a lot of room for optimisation in terms of reusing the data. During the design phase I had to make a choice between simple atomisation of the analysis (so that each figure in the notebook can be created without additional data dependencies) and speed.

Currently I would prefer to have additional versions of the diagnostics (say with an HR index or something similar) that load/compute the data once and then perform all of the analysis (you don't want to compute the data mean 30 times for high-resolution data). So I would not overcomplicate the logic of fdiag itself, but rather play around with the template notebooks.

I don't know of a good way to speed up NetCDF loading, except converting the files to zarr :)
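
For reference, such a conversion to zarr could look roughly like the sketch below (not tested here; exp_path and the time chunk size are placeholders):

import glob
import xarray as xr

# Combine the yearly NetCDF files and write them out once as a single zarr
# store, which typically reads back much faster than many small NetCDF files.
files = sorted(glob.glob(exp_path + "/temp.fesom.*.nc"))
ds = xr.open_mfdataset(files, combine="by_coords", chunks={"time": 12})
ds.to_zarr(exp_path + "/temp.zarr", mode="w")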

JanStreffing commented 2 years ago

Update: I was working on the paper and the waiting times of 15+ min for the Hovmöller diagrams were too long for me. Here are some data-loading experiments. I'm reading in years 1850 to 2529, so 680 years in total on the CORE2 mesh. Volume of data on disk: 21.7 GB in total.

First, the way it's done so far:

import time
import pyfesom2 as pf

# input_paths, input_names, variable, years, mesh, mask, reference_name,
# hofm and hofm_reference come from earlier cells of the notebook.
start = time.time()
print("starting")

for exp_path, exp_name in zip(input_paths, input_names):
    # lazily load all yearly files for this experiment
    data = pf.get_data(exp_path, variable, years, mesh, how=None, compute=False, silent=True)
    title = exp_name + " - " + reference_name
    hofm[title] = {}
    # Hovmöller data for this experiment minus the reference
    data_difference = pf.hovm_data(data, mesh, mask=mask) - hofm_reference.data
    hofm[title]['data'] = data_difference
    if data_difference.max() == data_difference.min() == 0:
        hofm[title]['nodiff'] = True
    else:
        hofm[title]['nodiff'] = False
    del data
    print(exp_name)

end = time.time()
print(end - start)

Result: 1066.9s = 1.56s/year

I tried what would happen if I use CDO and remap the data to 1° on the fly while loading. I also apply the fldmean and yearmean right away, as needed for the Hovmöller diagrams. One could select a region here as well if needed:

# Load model data serially with CDO: remap to 1°, fldmean and yearmean per yearly file
from collections import OrderedDict

import tqdm
from cdo import Cdo

cdo = Cdo()  # input_paths and years come from earlier notebook cells

var = ['temp']
data = OrderedDict()
for v in var:
    data[v] = []
    for year in tqdm.tqdm(years):
        path = input_paths[0]+'/'+v+'.fesom.'+str(year)+'.nc'
        data[v].append(cdo.yearmean(input="-fldmean -remap,r360x181,/p/project/chhb19/streffing1/input/fesom2/core2/weights_unstr_2_r360x181.nc -setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc "+str(path), returnArray=v))

Result: 1974.6s = 2.90s/year -> CDO's serial performance, including the remapping, is only about twice as slow as the xarray multifile load! If we remapped our data once externally with a script, we might already be competitive here.
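
Such a one-off external remap could be a simple loop over the yearly files with the same operator chain, writing 1° files that later diagnostics read instead of the native-grid output. A sketch only, using the same python-cdo bindings and paths as above; the output naming is made up:

from cdo import Cdo
cdo = Cdo()

weights = "/p/project/chhb19/streffing1/input/fesom2/core2/weights_unstr_2_r360x181.nc"
grid = "/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc"

for year in years:  # years and input_paths as in the notebook
    infile = input_paths[0] + '/temp.fesom.' + str(year) + '.nc'
    outfile = input_paths[0] + '/temp.fesom.' + str(year) + '.r360x181.nc'
    # remap once to the regular 1° grid and keep the result on disk
    cdo.copy(input="-remap,r360x181," + weights + " -setgrid," + grid + " " + infile,
             output=outfile)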

I then used dask:

# Load model data in parallel: one delayed CDO call per yearly file
import time
from collections import OrderedDict

import dask
import tqdm
from cdo import Cdo
from dask.diagnostics import ProgressBar

cdo = Cdo()  # input_paths and years come from earlier notebook cells

var = ['temp']
data = OrderedDict()

def load_parallel(v, path):
    # same operator chain as in the serial version above
    data1 = cdo.yearmean(input="-fldmean -remap,r360x181,/p/project/chhb19/streffing1/input/fesom2/core2/weights_unstr_2_r360x181.nc -setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc "+str(path), returnArray=v)
    return data1

start = time.time()
print("starting")

for v in var:
    t = []
    for year in tqdm.tqdm(years):
        path = input_paths[0]+'/'+v+'.fesom.'+str(year)+'.nc'
        # build the task graph; nothing is computed yet
        t.append(dask.delayed(load_parallel)(v, path))

    with ProgressBar():
        datat = dask.compute(t)  # run all CDO calls in parallel
    data[v] = datat
end = time.time()
print(end - start)

Result: 316.6s = 0.46s/year

We are three times faster with dask + cdo than with xarray mf. For things like Hovmöller diagrams, where the 1° remapping done in the process is good enough, we should probably use this method for now. For other plots where we need the native grid, we need to look into why loading data with xarray mf is so slow that a crude dask + cdo + remap approach can beat it.
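
One thing that might be worth trying on the xarray side (not benchmarked in this thread, just the standard knobs) is opening the multifile dataset with parallel decoding and explicit time chunks:

import glob
import xarray as xr

# Open all yearly files lazily, decode them in parallel via dask, and chunk
# along time so downstream reductions stream through the data rather than
# pulling everything into memory at once. input_paths as in the notebook.
files = sorted(glob.glob(input_paths[0] + '/temp.fesom.*.nc'))
ds = xr.open_mfdataset(files, combine="by_coords", parallel=True,
                       chunks={"time": 12}, engine="netcdf4")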

JanStreffing commented 2 years ago

I also think this might be better placed on the pyfesom2 issue tracker. If you agree, here is how to move it (in theory; I've never tried this): https://hub.packtpub.com/github-now-allows-issue-transfer-between-repositories-a-public-beta-version/

koldunovn commented 2 years ago

The xarray mf load is certainly something that one would want to improve - there might be some ways that I have not explored yet. A few comments:

So in essence, if you feel that having a different version of the Hovmöller diagrams notebook is beneficial for you, just go ahead and try to implement it as a separate notebook, with a separate call in the configuration. We don't have many users right now and are free to experiment and change things, and in the end the best solution should stick.

JanStreffing commented 2 years ago

Hey Nikolay, thanks for the feedback. I will try to make the Hovmöller diagnostic based on cdo loading suitable for general use then. I agree that pyfesom2 should specialize in working with the native grid, so perhaps this is actually well placed here in fdiag.

Cheers, Jan

JanStreffing commented 2 years ago

I worked a bit more on this. Not too promising: it's much faster, but I have not yet been able to reproduce the native-grid results:

[Figure: xarray vs cdo Hovmöller difference plots]

The original routine has the peak at 5000 m depth, the new one at 3000 m. I might have to double-check the layers I use.

koldunovn commented 2 years ago

Could weighting be a problem?

JanStreffing commented 2 years ago

Maybe, but I use the normal method, after making the weights with genycon: -remap,r360x181,/p/project/chhb19/streffing1/input/fesom2/core2/weights_unstr_2_r360x181.nc -setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc

koldunovn commented 2 years ago

To be honest I like your figure better, as it looks like a more classical warming pattern :) Consider the chance that it's me doing something very stupid with pyfesom2 :)

JanStreffing commented 2 years ago

I will say that the plot on the left looks unrealistic in the sense that the apparent error below 5000 meters is quite large compared to the absolute temperatures we have down there.

I just did something very simple: I compared the weighted means obtained by doing the remapping on the command line:

cdo -fldmean -remap,r360x181,/p/project/chhb19/streffing1/input/fesom2/core2/weights_unstr_2_r360x181.nc -setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc temp.fesom.2529.nc mean_r.nc

This has been done many times before and should be correct. It's how pretty much everyone outside of AWI interacts with our CMIP6 data.

ref:
  18.04495, 18.00588, 17.91361, 17.72901, 17.13011, 16.64934, 16.17914, 
    15.7021, 15.24253, 14.79266, 14.35469, 13.84698, 13.16811, 12.40058, 
    11.61622, 10.70362, 9.77756, 8.897744, 8.014826, 7.127129, 6.280107, 
    5.517725, 4.84025, 4.221054, 3.692365, 3.254751, 2.893169, 2.573417, 
    2.284294, 2.006962, 1.77064, 1.587326, 1.403883, 1.256975, 1.112448, 
    0.9726256, 0.8304332, 0.6868073, 0.5523372, 0.4392726, 0.3366812, 
    0.2569414, 0.1847863, 0.118271, 0.06208421, 0.01906666, 0 ;
spinup:
  18.06586, 18.02323, 17.80249, 17.3741, 16.57906, 15.97272, 15.43916, 
    14.97799, 14.57467, 14.2087, 13.86576, 13.45534, 12.90181, 12.2354, 
    11.49265, 10.70324, 9.986268, 9.33423, 8.671644, 7.963169, 7.200276, 
    6.39399, 5.592293, 4.830609, 4.141301, 3.561179, 3.115272, 2.797192, 
    2.561603, 2.375604, 2.216084, 2.074118, 1.949526, 1.831839, 1.711576, 
    1.575909, 1.426566, 1.254321, 1.080849, 0.9068438, 0.7272307, 0.5562534, 
    0.3982435, 0.2434489, 0.1230387, 0.03524522, 0 ;

I can then use aux3d.out to get the depth levels for each of these layers. I'll be looking at the layer where the two plots have the largest difference, which is the 5400 m depth level. The left plot has a warm bias of ~1° there, the right plot ~0.1°. 5400 m is level number 4 from the bottom:

-4650.0
-4900.0
-5150.0
-5400.0
-5650.0
-6000.0
-6250.0

Delta T = 0.2434489 - 0.118271 = 0.1251779

This at least confirms that the Python cdo interface behaves like the command-line cdo does, and it's not some kind of error I made in my Python script. The only question left is whether the weights are correct. Maybe you can quickly check my genycon command, but I don't see how that could be wrong:

cdo genycon,r360x181 -setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc sst.fesom.1850.nc weights_unstr_2_r360x181.nc

koldunovn commented 2 years ago

Well, the easy way to check the bias at 5000 m (or rather 4000 m) is to plot spatial distribution maps of the biases, and on my plots it's pretty large. I would be happy if you find a mistake though :)

JanStreffing commented 2 years ago

Maybe all the empty cells that are cut off by the topography at depth get factored in as zeros in the fldmean? In that case it might be a question of not having set the missing value (missval) correctly.

JanStreffing commented 2 years ago

I solved the remaining issue by adding -setctomiss,0 to the cdo command chain. I will suggest changing the default FESOM2 missval.
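
For reference, the parallel loader above would then presumably look something like the sketch below; the exact position of -setctomiss,0 in the chain is not shown here, and applying it right after -setgrid (i.e. before the remap) is one plausible choice:

from cdo import Cdo
cdo = Cdo()

def load_parallel(v, path):
    # Same chain as before, with -setctomiss,0 added so that cells stored as 0
    # (cut off by topography) become missing values and no longer drag the
    # field mean towards zero.
    return cdo.yearmean(
        input="-fldmean "
              "-remap,r360x181,/p/project/chhb19/streffing1/input/fesom2/core2/weights_unstr_2_r360x181.nc "
              "-setctomiss,0 "
              "-setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc " + str(path),
        returnArray=v)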

The cdo-based plots now look nearly identical. The time to load increased from 316.6s to 338.1s, still much faster than the 1066.9s we got with xarray.

I will work on making this template merge-ready tomorrow.

pgierz commented 2 years ago

Thanks for sending me the link to this, Jan. I will hopefully take a look tomorrow. Dask-friendliness is on my list anyway.