jpolton closed 11 months ago
It looks like the call to xarray.DataArray.interp() is the bottleneck:
# Subset model data in time and space: model -> obs
print('Else: 1')
mod_subset = mod_array.isel(y_dim=subset_ind[0], x_dim=subset_ind[1]).compute()
mod_subset = mod_subset.swap_dims({"t_dim": "time"})
mod_subset = mod_subset.interp(time=obs_time[ii], method=time_interp, kwargs={"fill_value": "extrapolate"}).compute()
print('Else: 2')
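To put numbers on which step is slow, rather than watching the print markers go by, the section could be wrapped in a small timer. This is only a sketch using the standard library: `timed` and `timings` are hypothetical names that are not part of the repository, and the loop inside the `with` block is a stand-in workload, not the real model subsetting.

```python
import time
from contextlib import contextmanager

# Hypothetical helper, not part of the repository: collects the
# elapsed wall-clock time for each labelled section.
timings = {}  # label -> list of elapsed seconds


@contextmanager
def timed(label):
    """Time the enclosed block and record the result under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings.setdefault(label, []).append(elapsed)
        print(f"{label}: {elapsed:.3f} s")


# Stand-in workload; in the real script this would be the
# isel(...).compute() or interp(...).compute() line.
with timed("interp"):
    total = sum(i * i for i in range(100_000))
```

Wrapping the isel and interp lines in separate `timed(...)` blocks would show how the time splits between the spatial subsetting and the time interpolation.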
This routine is impossibly slow. See crps_sonf_moving() at line 209: it loops over the items in an array (the observation locations) and calls nearest-neighbour find methods for each one. It could be parallelised over the looped item. However, as it currently stands, one output file is written for all observations, so the results from parallel threads would need to be merged before writing, or an alternative file-writing structure devised.
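A rough sketch of the "parallelise, then merge before writing" option, using only the standard library. Everything here is an assumption for illustration: `score_one_observation` is a hypothetical placeholder that does a simple nearest-value lookup, not the actual CRPS calculation in crps_sonf_moving(), and the inputs are plain lists rather than the real model and observation arrays.

```python
from concurrent.futures import ThreadPoolExecutor


def score_one_observation(obs_value, model_values):
    """Hypothetical per-observation work (placeholder, not CRPS):
    find the nearest model value and return the absolute difference."""
    nearest = min(model_values, key=lambda m: abs(m - obs_value))
    return abs(nearest - obs_value)


def score_all(obs_values, model_values, max_workers=4):
    """Process each observation in a worker thread and merge the results.

    Executor.map returns results in input order, so the merged list
    lines up with obs_values and can be written to a single file
    after all workers have finished.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        merged = list(
            pool.map(lambda v: score_one_observation(v, model_values), obs_values)
        )
    return merged


obs = [1.0, 4.5, 9.75]
model = [0.0, 5.0, 10.0]
scores = score_all(obs, model)
print(scores)
```

Threads only help if the per-observation work releases the GIL (as much NumPy code does); otherwise `ProcessPoolExecutor` offers the same `map` interface, provided the worker function is defined at module level so it can be pickled. Either way the single write happens once, after the merge, so the existing one-file output structure would not need to change.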
Application: Running on Jasmin: NEMO_validation/EN4_processing/surface_crps.py