Open benbovy opened 2 years ago
I think switching to a fixed-size dimension should be no problem. The number of output time steps is deterministic for all possible use cases, and the netCDF API does not distinguish between appending to an unlimited dimension and writing to a fixed-length dimension. I will take a look in the next few days.
After a few more tests, the issue with NaN values for all time steps except the first one seems to be that calling nf90_put_var will not write any data to a zarr chunk that has already been initialized (not sure if that's intentional or not), so we need to ensure that each call writes to a separate chunk, e.g., by setting chunksize = 1 along the time dimension.
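To make the reasoning concrete, here is a toy model of that behaviour in plain Python. It is only a sketch, not the netCDF or nczarr implementation: the rule that a write is dropped when its target chunk has already been initialized is an assumption taken from the observation above, and `simulate_writes` and `count_missing` are hypothetical helpers.

```python
import math


def simulate_writes(n_steps, time_chunksize):
    """Toy model: one write per time step into a chunked time axis.

    Assumption (from the observation above, not from the nczarr source):
    a write is silently dropped when the chunk it targets has already
    been initialized by an earlier write.
    """
    stored = [math.nan] * n_steps      # NaN stands in for "never written"
    initialized = set()                # indices of chunks already created

    for step in range(n_steps):
        chunk = step // time_chunksize
        if chunk in initialized:
            continue                   # write dropped in this toy model
        initialized.add(chunk)
        stored[step] = float(step)     # dummy payload for this time step
    return stored


def count_missing(values):
    """Number of time steps that still hold NaN."""
    return sum(math.isnan(v) for v in values)


# All steps share one chunk: only the first write lands.
print(count_missing(simulate_writes(n_steps=6, time_chunksize=6)))  # 5
# One step per chunk: every write lands.
print(count_missing(simulate_writes(n_steps=6, time_chunksize=1)))  # 0
```

Under that assumption, any chunk size larger than 1 along time loses every write after the first one in each chunk, which matches the NaN pattern seen in the output.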
In addition to setting the nc_chunksizes parameter in model.namelist, I had to edit some code in the io_module to make it work (also for chunking the time coordinate).
I also edited the code to set fill values using nf90_def_var_fill instead of manually setting a missing_value attribute (to fix Xarray warnings when reading the output zarr dataset).
I can send you a patch file if that's useful (my edits are not super clean, though).
Here's a comparison of outputs in both formats:
It looks pretty much the same. I don't know yet why the values at the domain boundary didn't get masked by Xarray with the zarr format; those values are all 1e36, while the defined fill value is 1e37.
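One thing that may be worth checking for the unmasked boundary: CF-style decoding, as done by Xarray, masks only the values that are equal to the declared fill value. The sketch below reimplements that rule in plain Python purely as an illustration (it is not Xarray's code, `mask_fill` is a hypothetical helper, and the sample row is made up). With the numbers quoted above, 1e36 does not match a fill value of 1e37 and so would stay in the data.

```python
import math


def mask_fill(values, fill_value):
    """Replace entries equal to fill_value with NaN (exact comparison)."""
    return [math.nan if v == fill_value else v for v in values]


# Made-up row: boundary cells hold 1e36, interior cells hold real data.
row = [1e36, 0.5, 0.7, 1e36]

masked_37 = mask_fill(row, fill_value=1e37)  # fill value as defined in the file
masked_36 = mask_fill(row, fill_value=1e36)  # fill value matching the data

print(masked_37)  # boundary values survive, since 1e36 != 1e37
print(masked_36)  # boundary values become NaN
```

If that is what happens here, making the value written at the boundary and the declared fill value agree should restore the masking, but I have not verified this against the actual output.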
A possible reason for the behaviour could be that the SWM is closing the dataset each time it writes to it. This is controlled by the DIAG_FLUSH macro defined in include/io.h. Worth a try to see if that avoids having to set the chunksize.
A patch file would be great.
Yes indeed: with DIAG_FLUSH switched off, all elements are correctly filled in a single chunk.
I have added an option for fixed-size time dimension output. The image on Docker Hub is already updated. Just add unlimited_tdim=.false. to the diag_nl namelists.
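For a namelist file with many diag_nl groups, a small script can add the flag to each of them. This is a minimal sketch under a few assumptions: each group starts with a line beginning with `&diag_nl` and ends with a `/` on its own line, `add_fixed_tdim` is a hypothetical helper, and the `varname` entries in the sample are placeholders rather than a real SWM configuration.

```python
import re


def add_fixed_tdim(namelist_text):
    """Insert unlimited_tdim=.false. after each &diag_nl group header.

    Groups that already mention unlimited_tdim are left untouched.
    Assumes each group is closed by a "/" on its own line.
    """
    lines = namelist_text.splitlines()
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if re.match(r"\s*&diag_nl\b", line, flags=re.IGNORECASE):
            # Collect the body of this group up to its closing "/".
            group = []
            for nxt in lines[i + 1:]:
                if nxt.strip() == "/":
                    break
                group.append(nxt)
            if not any("unlimited_tdim" in g.lower() for g in group):
                out.append("  unlimited_tdim=.false.")
    return "\n".join(out) + "\n"


# Placeholder namelist content, for illustration only.
example = """\
&diag_nl
  varname = "ETA"
/
&diag_nl
  varname = "U"
/
"""

patched = add_fixed_tdim(example)
print(patched)
```

Running the helper a second time on its own output changes nothing, so it is safe to apply to a file that has already been edited by hand.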
@willirath @martinclaus A few comments after having tried running the SWM model with nczarr outputs:
First, installing libnetcdf and netcdf-fortran from conda-forge seems to work well, so no need to build netcdf from source! (At least for writing zarr datasets in file directories, as I don't think the conda-forge packages support S3 yet.) libnetcdf is now built with nczarr support (https://github.com/conda-forge/libnetcdf-feedstock/issues/117) and netcdf-fortran just forwards the path to the C library. I had to install netcdf-fortran version 4.5.3 because of this issue: https://github.com/conda-forge/netcdf-fortran-feedstock/issues/71.
Then I used the following model settings in model.namelist (use case 1 from this repo):
It didn't work out of the box, as nczarr doesn't support unlimited dimensions, which are used in SWM for the time axis.
After editing the SWM source to use an arbitrary dimension size instead (a very dirty fix), I've been able to run the model until the end and read the output dataset with Xarray:
However, all stored values for time steps > 1 are NaN, which I guess is because simply changing an unlimited dimension to a fixed-size dimension is not enough (I'm not familiar with the netcdf C/Fortran APIs, though). @martinclaus do you think it would be possible to avoid using unlimited dimensions in SWM without refactoring its internals too heavily?