Closed pmav99 closed 1 year ago
Figuring this out is kind of important for the ERA5 hindcast run, where the file sizes may get really big. It might have some importance on the forecast runs, too (less data transfered to/from the cluster, less data read, less RAM).
OK, I tested this. We can't use float16 because it is not supported by NetCDF4 library/standard. This is the traceback:
TypeError: illegal primitive data type, must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got float16
Nevertheless float32 seems to be working fine. At least, I run the iceland model and the results seem to be identical. I will push the fix.
I'll wait on @brey to get his opinion on this but I'd keep them. or at least stmp stmp is Surface Temperature. If float 32 keeps the 1st decimal, ok. spfh is specific humidity. not sure if it is useful @brey?
My understanding is that neither one of these two variables is being used at the moment. Nevertheless SCHISM requires them to be present in the input netcdf file. sflux_air
is the relevant file:
https://schism-dev.github.io/schism/master/input-output/sflux.html
Therefore we just add them in the netcdf and we put zeroes. https://github.com/ec-jrc/pyPoseidon/blob/edce2cddbdf29cb0019c5e52ef8a5a05bb3a8c81/pyposeidon/schism.py#L287-L307
By using float64 instead of float32 we doubled the size of these variables. In practical terms, it was like using an input file with 7 variables instead of 5 (40% increase). For the forecasts this is not an issue, but for the hindcast runs with the ERA5 file size (i.e. disk space) can be a limiting factor.
I believe I discussed this with George at some point in the past, but I think we were only discussing if it is possible to completely remove the variables from the input file. Not changing their data type. Nevertheless, we should indeed wait for his take on this.
pinging @vvoukouvalas , too.
Indeed the variables are fixed by SCHISM but not needed by us, thus zero. For context see https://github.com/schism-dev/schism/issues/16.
I don't expect problems setting them as float32.
AFAIK
spfh
andstmp
are just zeroes, right?Is there some reason we use float64 for them? If the values are just zeroes, then it may be possible to reduce the size of the sflux files by using float16 instead. Even with float32 there would be some gains.
I assume we get the float64 because this is the default value for
dask.array.zeros()
:https://github.com/ec-jrc/pyPoseidon/blob/edce2cddbdf29cb0019c5e52ef8a5a05bb3a8c81/pyposeidon/schism.py#L287
We can either specify the dtype directly in
.zeros()
or use.zeros_like()
: https://docs.dask.org/en/stable/generated/dask.array.zeros_like.html