JanStreffing opened 2 years ago
There is a lot of room for optimisation in terms of reusing data. During the design phase I had to choose between simple atomisation of the analysis (so each figure in the notebook can be created without additional data dependencies) and speed.
Currently I would prefer to have additional versions of diagnostics (say with an HR index or something similar) that load/compute the data once and then perform all the data analysis (you don't want to compute a data mean 30 times for high-resolution data). So I would not overcomplicate the logic of fdiag itself, but rather play around with template notebooks.
I don't know of a good way to speed up the NetCDF loading speed except converting them to zarr :)
Update: I was working on the paper and waiting times of 15+ min for Hovmöller diagrams were too long for me. Here are some data loading experiments. I'm reading in years 1850 to 2529, so 680 years in total, on the CORE2 mesh. Volume of data on disk: 21.7 GB in total.
First, the way it's done so far:
```python
import time  # pf, input_paths, input_names, variable, years, mesh, mask,
             # hofm, hofm_reference and reference_name come from the notebook context

start = time.time()
print("starting")
for exp_path, exp_name in zip(input_paths, input_names):
    data = pf.get_data(exp_path, variable, years, mesh, how=None, compute=False, silent=True)
    title = exp_name + " - " + reference_name
    hofm[title] = {}
    data_difference = pf.hovm_data(data, mesh, mask=mask) - hofm_reference.data
    hofm[title]['data'] = data_difference
    if data_difference.max() == data_difference.min() == 0:
        hofm[title]['nodiff'] = True
    else:
        hofm[title]['nodiff'] = False
    del data
    print(exp_name)
end = time.time()
print(end - start)
```
Result: 1066.9 s = 1.56 s/year
I tried what happens if I use CDO and remap the data to 1° on the fly while loading. I also apply the fldmean and yearmean right away, as needed for the Hovmöller diagrams. One could select a region here as well if needed:
```python
# Load model data, remapped to 1° and reduced with fldmean/yearmean on the fly
from collections import OrderedDict

import tqdm
from cdo import Cdo

cdo = Cdo()
var = ['temp']
data = OrderedDict()
for v in var:
    data[v] = []
    for year in tqdm.tqdm(years):
        path = input_paths[0] + '/' + v + '.fesom.' + str(year) + '.nc'
        data[v].append(cdo.yearmean(
            input="-fldmean "
                  "-remap,r360x181,/p/project/chhb19/streffing1/input/fesom2/core2/weights_unstr_2_r360x181.nc "
                  "-setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc "
                  + path,
            returnArray=v))
```
Result: 1974.6 s = 2.90 s/year -> CDO's serial performance, including the remapping, is only about twice as slow as the xarray multifile load! If we remapped our data once externally with a script, we might already be competitive here.
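For readability, the operator chain passed to `cdo.yearmean` above can be assembled from its parts. A sketch: `build_cdo_chain` is a hypothetical helper, not fdiag/pyfesom2 code. CDO applies the chain right to left, so `setgrid` first attaches the unstructured grid description, then `remap` interpolates to 1°, then `fldmean` reduces to a field mean.

```python
# Hypothetical helper that builds the CDO operator chain used above.
# CDO evaluates operators right to left: setgrid -> remap -> fldmean.
def build_cdo_chain(datafile, gridfile, weightfile, target="r360x181"):
    return (f"-fldmean "
            f"-remap,{target},{weightfile} "
            f"-setgrid,{gridfile} "
            f"{datafile}")

print(build_cdo_chain("temp.fesom.1850.nc",
                      "CORE2_finaltopo_mean.nc",
                      "weights_unstr_2_r360x181.nc"))
```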
I then used dask:
```python
# Load model data in parallel: one delayed cdo call per yearly file
from collections import OrderedDict
import time

import dask
import tqdm
from dask.diagnostics import ProgressBar
from cdo import Cdo

cdo = Cdo()
var = ['temp']
data = OrderedDict()

def load_parallel(v, path):
    return cdo.yearmean(
        input="-fldmean "
              "-remap,r360x181,/p/project/chhb19/streffing1/input/fesom2/core2/weights_unstr_2_r360x181.nc "
              "-setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc "
              + path,
        returnArray=v)

start = time.time()
print("starting")
for v in var:
    t = []
    for year in tqdm.tqdm(years):
        path = input_paths[0] + '/' + v + '.fesom.' + str(year) + '.nc'
        t.append(dask.delayed(load_parallel)(v, path))
    with ProgressBar():
        datat = dask.compute(t)
    data[v] = datat
end = time.time()
print(end - start)
```
Result: 316.6 s = 0.46 s/year
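As a quick arithmetic check of the per-year rates and the overall speedup (a throwaway snippet; the numbers are the wall-clock timings quoted in this thread for 680 years):

```python
# Sanity-check the per-year rates and the speedup from the timings above
years = 680
timings = {"xarray mf": 1066.9, "cdo serial": 1974.6, "cdo + dask": 316.6}
for name, seconds in timings.items():
    print(f"{name}: {seconds / years:.2f} s/year")
print(f"speedup of cdo + dask over xarray mf: {timings['xarray mf'] / timings['cdo + dask']:.1f}x")
```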
We are three times faster with dask + cdo than with xarray mf. For things like Hovmöller diagrams, where the 1° remapping done in the process is good enough, we should probably use this method for now. For other plots where we need the native grid, we need to look into why loading data with xarray mf is so slow that a crude dask + cdo + remap pipeline can beat it.
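The fan-out pattern above can also be expressed with the standard library alone instead of dask.delayed. A minimal sketch: `load_year` is a stub standing in for the `cdo.yearmean(...)` call, so the example runs without cdo or model data; with the real loader, `pool.map` would dispatch one cdo process per yearly file.

```python
# Stdlib alternative to dask.delayed for fanning out per-year loads
from concurrent.futures import ThreadPoolExecutor

def load_year(path):
    # Stub: the real version would return cdo.yearmean(..., returnArray=v)
    return path

paths = [f"temp.fesom.{year}.nc" for year in range(1850, 1854)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(load_year, paths))  # preserves input order
print(results)
```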
I also think this might be better placed on the pyfesom2 issue tracker. If you agree, here is how to move it (in theory, I've never tried this): https://hub.packtpub.com/github-now-allows-issue-transfer-between-repositories-a-public-beta-version/
The xarray mf load is certainly something that one would want to improve; there might be some ways that I did not explore yet. A few comments: in the long run, `fdiag` could move away from `pf.get_data` in favour of the native xarray interface. So in essence, if you feel that having a different version of the Hovmöller diagrams notebook is beneficial for you, just go ahead and try to implement it as a separate notebook, with a separate call in the configuration. We are not so many users right now and are free to experiment and change things; in the end the best solution should stick.
Hey Nikolay, thanks for the feedback. I will try to make the Hovmöller diagnostic based on cdo loading suitable for general use then. I agree that pyfesom2 should specialize in working with the native grid. So perhaps this is actually well placed here in fdiag.
Cheers, Jan
I worked a bit more on this. Not too promising: it's much faster, but I have not yet been able to reproduce the native-grid results:
xarray vs cdo:
The original routine has the peak at 5000 m depth, the new one at 3000 m. I might have to double-check the layers I use.
Could weighting be a problem?
Maybe, but I use the normal method, after generating the weights with genycon:
```
r360x181,/p/project/chhb19/streffing1/input/fesom2/core2/weights_unstr_2_r360x181.nc -setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc
```
To be honest I like your figure better, as it looks like the more classical warming pattern :) Consider the chance that it's me doing something very stupid with pyfesom2 :)
I will say that the plot on the left looks unrealistic in the sense that the apparent error below 5000 m is quite large compared to the absolute °C we have down there.
I just did something very simple: I compared the weighted means obtained by doing the remapping on the command line:
```shell
cdo -fldmean -remap,r360x181,/p/project/chhb19/streffing1/input/fesom2/core2/weights_unstr_2_r360x181.nc -setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc temp.fesom.2529.nc mean_r.nc
```
This has been done many times before and should be correct. It's how pretty much everyone outside of AWI interacts with our CMIP6 data.
ref:
```
18.04495, 18.00588, 17.91361, 17.72901, 17.13011, 16.64934, 16.17914,
15.7021, 15.24253, 14.79266, 14.35469, 13.84698, 13.16811, 12.40058,
11.61622, 10.70362, 9.77756, 8.897744, 8.014826, 7.127129, 6.280107,
5.517725, 4.84025, 4.221054, 3.692365, 3.254751, 2.893169, 2.573417,
2.284294, 2.006962, 1.77064, 1.587326, 1.403883, 1.256975, 1.112448,
0.9726256, 0.8304332, 0.6868073, 0.5523372, 0.4392726, 0.3366812,
0.2569414, 0.1847863, 0.118271, 0.06208421, 0.01906666, 0 ;
```
spinup:
```
18.06586, 18.02323, 17.80249, 17.3741, 16.57906, 15.97272, 15.43916,
14.97799, 14.57467, 14.2087, 13.86576, 13.45534, 12.90181, 12.2354,
11.49265, 10.70324, 9.986268, 9.33423, 8.671644, 7.963169, 7.200276,
6.39399, 5.592293, 4.830609, 4.141301, 3.561179, 3.115272, 2.797192,
2.561603, 2.375604, 2.216084, 2.074118, 1.949526, 1.831839, 1.711576,
1.575909, 1.426566, 1.254321, 1.080849, 0.9068438, 0.7272307, 0.5562534,
0.3982435, 0.2434489, 0.1230387, 0.03524522, 0 ;
```
I can then use aux3d.out to get the depth levels for each of these layers. I'll be looking at the layer where the two plots differ most, which is the 5400 m depth level: the left plot has a ~1° warm bias there, the right plot ~0.1°. 5400 m is level number 4 from the bottom:
```
-4650.0
-4900.0
-5150.0
-5400.0
-5650.0
-6000.0
-6250.0
```
Delta T = 0.2434489 - 0.118271 = 0.1251779
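The level lookup and the Delta T can be reproduced in a few lines; the level list is the aux3d.out excerpt quoted in this thread, and the two values are the 4th-from-bottom entries of the spinup and ref profiles above.

```python
# Locate -5400.0 in the excerpted aux3d.out levels and reproduce Delta T
levels = [-4650.0, -4900.0, -5150.0, -5400.0, -5650.0, -6000.0, -6250.0]
level_from_bottom = len(levels) - levels.index(-5400.0)
print(level_from_bottom)  # 4, matching "level number 4 from the bottom"

delta_t = round(0.2434489 - 0.118271, 7)  # spinup minus ref at that level
print(delta_t)  # 0.1251779
```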
This at least confirms that the Python cdo interface behaves like the command-line cdo, and it's not some kind of error I made in my Python script. The only question left is whether the weights are correct. Maybe you can quickly check my genycon command, but I don't see how that could be wrong:
```shell
cdo genycon,r360x181 -setgrid,/p/project/chhb19/streffing1/input/fesom2/core2/CORE2_finaltopo_mean.nc sst.fesom.1850.nc weights_unstr_2_r360x181.nc
```
Well, the easy way to check the bias at 5000 m (or rather 4000 m) is to plot spatial distribution maps of the biases, and on my plots it is pretty large. I would be happy if you found a mistake though :)
Maybe all the empty cells that are cut off by topography at depth get factored in as zeros for the fldmean? In that case it might be a question of not having set the missval correctly.
I solved the remaining issue by adding `-setctomiss,0` to the cdo command chain. I will suggest changing the default FESOM2 missval.
The cdo-based plots now look nearly identical. The time to load increased from 316.6 s to 338.1 s, still much faster than the 1066.9 s we got with xarray.
I will work on making this template merge ready tomorrow.
Thanks for sending me the link to this, Jan. I will hopefully look tomorrow. Dask-friendliness is on my list anyway.
I recently ran a fairly comprehensive set of diagnostics for a 700-year-long CORE2 mesh run. It took quite a few hours to complete:
Most of the time is spent loading data. When we create global diagnostics and then local ones afterwards, we read the temp and salt data in multiple times. We also read in the same data for `hovm_difference_clim` as we do for `ocean_integrals_difference`. Reading 700 years of a 3D CORE2 field once takes about 20 minutes. That's 22.4 GB loaded at 18.6 MB/s.