[Closed] Thomas-Moore-Creative closed this issue 3 years ago
I think I found the issue, and I'm 99% sure a pre-processing step will fix the problem with lazily loading an entire ACCESS-S2 reanalysis variable in Python.
It seems that for most (possibly all) variables, the 2015 NetCDF file in the NCI collection is missing some variables and their associated dimensions. Example: compare the ncdump output for the following files, looking for the ncorners dimension and the lat_bounds & lon_bounds variables:
ncdump -h /g/data/ux62/access-s2/reanalysis/ocean/u/mo_u_1981.nc
ncdump -h /g/data/ux62/access-s2/reanalysis/ocean/u/mo_u_2014.nc
ncdump -h /g/data/ux62/access-s2/reanalysis/ocean/u/mo_u_2015.nc
ncdump -h /g/data/ux62/access-s2/reanalysis/ocean/u/mo_u_2016.nc
ncdump -h /g/data/ux62/access-s2/reanalysis/ocean/u/mo_u_2018.nc
Only 2015 appears different: it is missing the bounds variables and the extra dimensions.
For tools like xarray, this inconsistency in NetCDF file structure can confuse loading operations, blowing up memory and killing workers.
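One way to spot which yearly file is structurally different, without eyeballing ncdump output for each year, is to compare the set of variables each file contains. A minimal sketch (the helper name and the plain-dict input are mine; in practice the variable sets would come from opening each file):

```python
from collections import Counter

def find_odd_files(var_sets):
    """Given a mapping {filename: iterable of variable names}, return the
    files whose variable set differs from the most common structure.
    In practice the sets could come from, e.g.,
    netCDF4.Dataset(path).variables.keys() for each yearly file."""
    counts = Counter(frozenset(v) for v in var_sets.values())
    common = counts.most_common(1)[0][0]
    return sorted(f for f, v in var_sets.items() if frozenset(v) != common)

# Illustrative structures mirroring the ncdump comparison above:
files = {
    "mo_u_2014.nc": {"u", "lat_bounds", "lon_bounds"},
    "mo_u_2015.nc": {"u"},  # the odd one out
    "mo_u_2016.nc": {"u", "lat_bounds", "lon_bounds"},
}
print(find_odd_files(files))  # ['mo_u_2015.nc']
```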
This approach provides a workaround until BOM fixes the file-structure inconsistency: https://gist.github.com/Thomas-Moore-Creative/ee5af1b6f3db9d0df0b3c3e5b7f02a7d
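A minimal sketch of such a pre-processing step (the function name is mine, and it assumes the fix is simply to drop the bounds variables so every yearly file presents the same structure; see the linked gist for the actual solution):

```python
def drop_bounds(ds):
    """Pre-processing hook for xarray.open_mfdataset: drop the cell-bounds
    variables (removing them also removes the ncorners dimension they
    introduce) so every yearly file has the same structure before
    concatenation. Names follow the ncdump comparison above."""
    return ds.drop_vars(["lat_bounds", "lon_bounds"], errors="ignore")

# Usage (assumes xarray and dask are installed):
# import xarray as xr
# ds = xr.open_mfdataset(
#     "/g/data/ux62/access-s2/reanalysis/ocean/u/mo_u_*.nc",
#     preprocess=drop_bounds,
#     parallel=True,
# )
```

Passing the hook via `preprocess=` means each file is normalised as it is opened, before xarray tries to align anything across years.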
Attempting to build tools to load ACCESS-S2 datasets on NCI at /g/data/ux62/access-s2/reanalysis/ocean/ using xr.open_mfdataset(): loading years 1981-2009 works fine, and loading 2010-2018 works fine, but loading across this timeline (or merging the two) results in killed workers and failures?
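The failure mode can be reproduced in memory with toy data (a sketch; variable and dimension names follow the ncdump comparison above): when one dataset lacks the bounds variables, concatenating it with the others makes xarray pad the missing variable with NaN and broadcast the previously static bounds along the concat dimension, inflating a tiny per-file variable into a time-sized array.

```python
import numpy as np
import xarray as xr

# Toy stand-ins: a "normal" yearly file with bounds, and a 2015-like
# file without them.
normal = xr.Dataset({
    "u": ("time", np.zeros(2)),
    "lat_bounds": ("ncorners", np.zeros(4)),
})
odd = xr.Dataset({"u": ("time", np.ones(2))})  # no bounds variables

merged = xr.concat([normal, odd], dim="time")

# The static bounds variable has gained a time dimension: its values are
# now replicated (or NaN-padded) per time step, which at full scale
# blows up memory.
print(merged["lat_bounds"].dims)
```

Dropping the bounds variables up front, so every year has an identical structure, avoids this broadcast entirely.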