COSIMA / access-om3

ACCESS-OM3 global ocean-sea ice-wave coupled model
13 stars 6 forks source link

Bad coupler diagnostics #44

Open aekiss opened 1 year ago

aekiss commented 1 year ago

There seems to be a 32/64 bit mixup in some coupler diagnostics - see https://github.com/COSIMA/access-om3/issues/40#issuecomment-1619587438

MartinDix commented 1 year ago

I tried setting histaux_atm2med_file1_enabled = .true. in the UM test , but this causes a segfault.

Segfault was user error.

Diagnostic coupler fields here have the same 32/64 bit issue as OM3

aekiss commented 1 year ago

Not sure if this helps, but is histaux_atm2med_file1_history_option compatible with the run length?

aekiss commented 1 year ago

FYI

aekiss commented 1 year ago

Using this commit I've output instantaneous snapshots of all fields every timestep for a 2-step run:

/scratch/v45/aek156/access-om3/archive/MOM6-CICE6_ACCESS-OM3-a8771ad/output000/GMOM_JRA.cpl.hx.atm.1step.inst.0001-01-01-03600.nc

This includes state fields AtmImp_Sa_* as well as fluxes AtmImp_Faxa_* (see CMEPS field naming convention). States and fluxes are both affected by the same bug.

aekiss commented 1 year ago

The input files /g/data/ik11/inputs/cime/inputdata/2023-03-10/ocn/jra55/v1.3_noleap/*.nc all contain single-precision data:

for f in /g/data/ik11/inputs/cime/inputdata/2023-03-10/ocn/jra55/v1.3_noleap/*.1958.*.nc; do
   ncdump -h $f | grep "(time, latitude, longitude)"
done

yields

    float lwdn(time, latitude, longitude) ;
    float prec(time, latitude, longitude) ;
    float q_10(time, latitude, longitude) ;
    float slp(time, latitude, longitude) ;
    float swdn(time, latitude, longitude) ;
    float t_10(time, latitude, longitude) ;
    float u_10(time, latitude, longitude) ;
    float v_10(time, latitude, longitude) ;

Not sure where the conversion from single to double precision is occurring. Could be in DATM or CMEPS I suppose.

aekiss commented 1 year ago

DATM's exported fields are all double precision

MartinDix commented 1 year ago

The diagnostic fields are written by routine med_io_write_FB inmed_io_mod.F90. This has an optional argument use_float which is set true in the call to med_io_write from med_phases_history_write_comp_aux. Setting this false makes the variables in the diagnostic file doubles and all is ok.

Perhaps the PIO write isn't as flexible with mixed types as plain netCDF?

aekiss commented 1 year ago

Aha! Thanks @MartinDix, I was also looking through that bit of code but you nailed it first.

For the record, here's the offending line: https://github.com/ESCOMP/CMEPS/blob/606eb397d4e66f8fa3417e7e8fd2b2b4b3c222b4/mediator/med_phases_history_mod.F90#L1291

aekiss commented 1 year ago

So is the underlying bug in med_io_write_FB? When luse_float is true this does type conversion for attributes but apparently not the data itself - does it assume pio_write_darray handles that?

MartinDix commented 1 year ago

If PIO passes things straight to netCDF the conversion should work. Need to have a look at the PIO tests.

micaeljtoliveira commented 1 year ago

We need to check if this issue is still present in the latest version of the code.

anton-seaice commented 7 months ago

It may be the pio_initdecomp:

https://github.com/ESCOMP/CMEPS/blob/7e0908cb958fc36002225efe00a3181f24c41c7a/mediator/med_io_mod.F90#L1026

or the lfillvalue, https://github.com/ESCOMP/CMEPS/blob/7e0908cb958fc36002225efe00a3181f24c41c7a/mediator/med_io_mod.F90#L1061

which are both hardcoded double / real8 and don't adjust for use_float.

I suggest we just put in a patch for use_float=.false. and not worry any more than that?

aekiss commented 7 months ago

https://github.com/COSIMA/access-om3/pull/100 should fix this issue for us by using doubles, which is probably what we want to use anyway to properly check coupling is correct.

However it uses a patch, so it would be better to have this fixed upstream for when we update our CMEPS version.

I think we should post an issue on the CMEPS repo if this bug is still present on the CMEPS main branch. It still uses use_float=.true. but I haven't checked whether float support is still buggy in med_io_write.

anton-seaice commented 7 months ago

I went for a patch because the real fix requires more investigation and changes - (assuming support for singles is needed). That probably doesn't preclude us raising an issue though.

Ok - ill do a build without the patch and using CMEPS main branch and see what happens.

aekiss commented 7 months ago

thanks @anton-seaice

anton-seaice commented 7 months ago

I built with CMEPS main and confirmed the error still exists:

Screenshot 2024-02-19 at 10 20 43 am

And raised with ESCOMP: https://github.com/ESCOMP/CMEPS/issues/430

Shall we close this issue? Or park it to see if there is a CMEPS fix?

aekiss commented 7 months ago

Thanks @anton-seaice let's use your patch for now but leave the issue open until we adopt a fix from upstream, at which point we can remove your patch and close the issue.

anton-seaice commented 2 months ago

The fix for this has gone in, see https://github.com/ESCOMP/CMEPS/pull/488

we can remove the patch and test next time we update the components