Open aekiss opened 1 year ago
I tried setting histaux_atm2med_file1_enabled = .true.
in the UM test , but this causes a segfault.
Segfault was user error.
Diagnostic coupler fields here have the same 32/64 bit issue as OM3
Not sure if this helps, but is histaux_atm2med_file1_history_option
compatible with the run length?
FYI
grep hist archive/output000/log/med.log
to see when data was writtenhistaux_*_flds
is a colon-delimited list of fields to output, or set it to all
to output everythingUsing this commit I've output instantaneous snapshots of all fields every timestep for a 2-step run:
/scratch/v45/aek156/access-om3/archive/MOM6-CICE6_ACCESS-OM3-a8771ad/output000/GMOM_JRA.cpl.hx.atm.1step.inst.0001-01-01-03600.nc
This includes state fields AtmImp_Sa_*
as well as fluxes AtmImp_Faxa_*
(see CMEPS field naming convention). States and fluxes are both affected by the same bug.
The input files /g/data/ik11/inputs/cime/inputdata/2023-03-10/ocn/jra55/v1.3_noleap/*.nc
all contain single-precision data:
for f in /g/data/ik11/inputs/cime/inputdata/2023-03-10/ocn/jra55/v1.3_noleap/*.1958.*.nc; do
ncdump -h $f | grep "(time, latitude, longitude)"
done
yields
float lwdn(time, latitude, longitude) ;
float prec(time, latitude, longitude) ;
float q_10(time, latitude, longitude) ;
float slp(time, latitude, longitude) ;
float swdn(time, latitude, longitude) ;
float t_10(time, latitude, longitude) ;
float u_10(time, latitude, longitude) ;
float v_10(time, latitude, longitude) ;
Not sure where the conversion from single to double precision is occurring. Could be in DATM or CMEPS I suppose.
DATM's exported fields are all double precision
The diagnostic fields are written by routine med_io_write_FB
inmed_io_mod.F90
. This has an optional argument use_float which is set true in the call to med_io_write
from med_phases_history_write_comp_aux
. Setting this false makes the variables in the diagnostic file doubles and all is ok.
Perhaps the PIO write isn't as flexible with mixed types as plain netCDF?
Aha! Thanks @MartinDix, I was also looking through that bit of code but you nailed it first.
For the record, here's the offending line: https://github.com/ESCOMP/CMEPS/blob/606eb397d4e66f8fa3417e7e8fd2b2b4b3c222b4/mediator/med_phases_history_mod.F90#L1291
So is the underlying bug in med_io_write_FB
? When luse_float
is true this does type conversion for attributes but apparently not the data itself - does it assume pio_write_darray
handles that?
If PIO passes things straight to netCDF the conversion should work. Need to have a look at the PIO tests.
We need to check if this issue is still present in the latest version of the code.
It may be the pio_initdecomp:
or the lfillvalue, https://github.com/ESCOMP/CMEPS/blob/7e0908cb958fc36002225efe00a3181f24c41c7a/mediator/med_io_mod.F90#L1061
which are both hardcoded double / real8 and don't adjust for use_float.
I suggest we just put in a patch for use_float=.false.
and not worry any more than that?
https://github.com/COSIMA/access-om3/pull/100 should fix this issue for us by using doubles, which is probably what we want to use anyway to properly check coupling is correct.
However it uses a patch, so it would be better to have this fixed upstream for when we update our CMEPS version.
I think we should post an issue on the CMEPS repo if this bug is still present on the CMEPS main branch. It still uses use_float=.true.
but I haven't checked whether float support is still buggy in med_io_write
.
I went for a patch because the real fix requires more investigation and changes - (assuming support for singles is needed). That probably doesn't preclude us raising an issue though.
Ok - ill do a build without the patch and using CMEPS main branch and see what happens.
thanks @anton-seaice
I built with CMEPS main and confirmed the error still exists:
And raised with ESCOMP: https://github.com/ESCOMP/CMEPS/issues/430
Shall we close this issue? Or park it to see if there is a CMEPS fix?
Thanks @anton-seaice let's use your patch for now but leave the issue open until we adopt a fix from upstream, at which point we can remove your patch and close the issue.
The fix for this has gone in, see https://github.com/ESCOMP/CMEPS/pull/488
we can remove the patch and test next time we update the components
There seems to be a 32/64 bit mixup in some coupler diagnostics - see https://github.com/COSIMA/access-om3/issues/40#issuecomment-1619587438