COSIMA / access-om2

Deprecated ACCESS-OM2 global ocean - sea ice coupled model code and configurations.

Bad ERA-5 wind forcing data at several times/locations causes 0.25° crash #274

Closed by aekiss 1 year ago

aekiss commented 1 year ago

@rmholmes is getting a `Free surface penetrating rock` error just after 1984-08-11T19:00 in a 0.25° config forced by ERA5 and JRA55-do v1.5 runoff. This does not occur with the 1° version of this config.

Errors like this have been resolved in the all-JRA55-do configs by reducing the timestep (e.g. from 540s to 360s to fix a crash just after 1988-09-27T06:00 in ACCESS-OM2-01 IAF), but Ryan has tried reducing dt from 1200s to 100s to no avail.

aekiss commented 1 year ago

It's a long shot, but the Orinoco outflow (white) peaks at about the time the crash occurs, so we might need to set a regional runoff cap in atmosphere/atm.nml. On the other hand it's smaller than the Amazon runoff (red), so it might not be a problem. The plot below is from /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-5-0/land/day/friver/gr/v20200916/friver_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-5-0_gr_19840101-19841231.nc

[screenshot: Orinoco (white) and Amazon (red) friver time series for 1984]

rmholmes commented 1 year ago

Thanks @aekiss.

Runoff is a good candidate, as switching to JRA55-do v1.5 runoff is the only thing that has changed in the forcing compared to the successful 1deg_era5_iaf simulation (using v1.4 runoff) I've done before. I just plotted some maxima of friver and licalvf and nothing looks crazy. But it's worth trying. Do you have an example of a runoff cap?
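For reference, that check is along these lines with xarray (a sketch over the friver file linked above, not the exact notebook code; the licalvf scan would be analogous):

```python
# Sketch: scan the daily JRA55-do v1.5 runoff for suspicious maxima.
import xarray as xr

path = ("/g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-5-0/"
        "land/day/friver/gr/v20200916/"
        "friver_input4MIPs_atmosphericState_OMIP_MRI-JRA55-do-1-5-0_gr_19840101-19841231.nc")

friver = xr.open_dataset(path)["friver"]  # river runoff flux

# Daily global maximum: a single bad value would show up as an isolated spike.
space_dims = [d for d in friver.dims if d != "time"]
friver.max(dim=space_dims).plot()

# Where and when is the largest value?
peak = friver.where(friver == friver.max(), drop=True)
print(peak.coords)
```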

aekiss commented 1 year ago

We use runoff caps in the 0.1° configs: https://github.com/COSIMA/01deg_jra55_iaf/blob/master/atmosphere/atm.nml

See the documentation in the code: https://github.com/COSIMA/libaccessom2/blob/d750b4bfdc58c59490985c682c1b4c56cc1016b1/atm/src/runoff.F90#L24-L35

rmholmes commented 1 year ago

Is it reasonable to just turn off runoff altogether as a quick test to see if this is really what is causing the problem?

aekiss commented 1 year ago

The relevant restart tiles from the day before are /home/561/rmh561/access-om2/025deg_era5_iaf/archive/restart005/ocean/*.0015. They look OK in this location, so runoff may be a red herring; e.g. these plots are from ocean_sbc.res.nc.0015. Note that sea_lev is high in the outflow region rather than low.

[screenshots: surface fields from ocean_sbc.res.nc.0015, including sea_lev]

aekiss commented 1 year ago

> Is it reasonable to just turn off runoff altogether as a quick test to see if this is really what is causing the problem?

Yeah, I guess we could try that. Or try JRA55-do v1.4 runoff. Or try 1deg_era5_iaf with JRA55-do v1.5 runoff. But did the runoff change between these versions of JRA55-do?

This is not the only difference between the 1° and 0.25° configs, though: the higher resolution means that runoff is more concentrated, CFL values are different, and so on.

aekiss commented 1 year ago

There's no mention of runoff differences between JRA55-do v1.4 and v1.5 here https://climate.mri-jma.go.jp/pub/ocean/JRA55-do/ or here https://climate.mri-jma.go.jp/pub/ocean/JRA55-do/docs/v1_5-manual/User_manual_jra55_do_v1_5.pdf

aekiss commented 1 year ago

There's checkerboarding suggestive of an overly long barotropic timestep in archive/restart005/ocean/ocean_barotropic.res.nc.0015 (with ice_ocean_timestep = 300 and barotropic_split = 80). But since it still crashes with a much shorter timestep, this is probably a red herring.

[screenshot: checkerboard pattern in the barotropic restart]
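One quick way to quantify that checkerboarding (a sketch; the `eta_t` variable name is an assumption about what the restart stores):

```python
# Sketch: measure grid-scale (checkerboard) noise in the barotropic restart tile.
import numpy as np
import xarray as xr

ds = xr.open_dataset("archive/restart005/ocean/ocean_barotropic.res.nc.0015",
                     decode_times=False)
eta = ds["eta_t"].squeeze().values  # free-surface height (assumed name)

# A checkerboard alternates sign between neighbouring points, so the discrete
# 5-point Laplacian is large where the noise lives and near zero on smooth fields.
lap = (eta[1:-1, 2:] + eta[1:-1, :-2] +
       eta[2:, 1:-1] + eta[:-2, 1:-1] - 4.0 * eta[1:-1, 1:-1])
print("max |Laplacian| of eta:", np.nanmax(np.abs(lap)))
```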

aekiss commented 1 year ago

The `Free surface penetrating rock` error is triggered here: https://github.com/mom-ocean/MOM5/blob/9b8ec93/src/mom5/ocean_core/ocean_thickness.F90#L3380-L3398

It's weird that we don't also get an `Error from ocean_thickness: Surface undulations too negative; model unstable` message specifying the location of the offending grid point; that would be helpful information to have.

rmholmes commented 1 year ago

I now know where this is coming from. There is a massive spike in the meridional wind in the ERA-5 data just south of Papua New Guinea that starts at 1984-08-11T15:00:00 and lasts until 1984-08-11T21:00:00 (in the file /g/data/rt52/era5/single-levels/reanalysis/10v/1984/10v_era5_oper_sfc_19840801-19840831.nc). It reaches up to about 130 m/s, and is obvious in the following image, setting off clear radiating waves in the wind field.

[image: v10 field on 1984-08-11, showing the spike and radiating waves south of Papua New Guinea]

I guess the 1-degree model survived it, but the 1/4-degree can't.

This must be an upstream ERA-5 problem. @aekiss, any suggestions here? Maybe using the scaling system to scale down the winds for this 6-hour period? Do you have an example you can point me to?
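For the record, pinning down the spike's exact time and location is straightforward with xarray (variable and coordinate names as in the rt52 ERA5 files):

```python
# Sketch: locate the extreme v10 values around the crash time.
import xarray as xr

v10 = xr.open_dataset(
    "/g/data/rt52/era5/single-levels/reanalysis/10v/1984/"
    "10v_era5_oper_sfc_19840801-19840831.nc")["v10"]

window = abs(v10.sel(time=slice("1984-08-11T15:00", "1984-08-11T21:00")))

# Hourly global extreme of |v10|; anything much above ~60 m/s is unphysical.
print(window.max(dim=["latitude", "longitude"]))

# Coordinates of the single worst value in the window.
bad = window.where(window == window.max(), drop=True)
print(bad.coords)
```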

aekiss commented 1 year ago

Ah, well spotted @rmholmes! Looks like a bad observation slipped through the quality control and was incorporated into the reanalysis.

Yes, scaling would be the way to go. Here's what I did last time we had a problem like this: I scaled the winds down using a spatiotemporal Gaussian to ensure smoothness in space and time.

https://github.com/COSIMA/access-om2/wiki/Tutorials#Scaling-the-forcing-fields
https://github.com/aekiss/notebooks/blob/master/make-jra55-scaling.ipynb
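A rough sketch of that approach (the centre coordinates, widths and amplitude below are illustrative placeholders, not the values used for this event):

```python
# Sketch: build a spatiotemporal Gaussian scaling field that dips below 1 around
# the bad point and equals 1 everywhere else, then save it for libaccessom2.
import numpy as np
import xarray as xr

src = xr.open_dataset("/g/data/rt52/era5/single-levels/reanalysis/10v/1984/"
                      "10v_era5_oper_sfc_19840801-19840831.nc")

lon0, lat0 = 146.0, -12.0               # assumed centre of the bad point
t0 = np.datetime64("1984-08-11T18:00")  # assumed centre time
sigma_deg, sigma_hr = 2.0, 3.0          # spatial / temporal Gaussian widths
amp = 0.9                               # scale winds to 10% at the very centre

r2 = (src["longitude"] - lon0) ** 2 + (src["latitude"] - lat0) ** 2
dt_hr = (src["time"] - t0) / np.timedelta64(1, "h")

# 1 far from the event, smoothly dipping to 1 - amp at its centre, so spatial
# and temporal gradients of the scaled forcing stay well behaved.
scaling = 1.0 - (amp * np.exp(-r2 / (2 * sigma_deg**2))
                     * np.exp(-dt_hr**2 / (2 * sigma_hr**2)))

scaling.rename("scaling").to_netcdf("scaling_v10_19840811.nc")
```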

In your case the other fields (not just wind) might be worth checking/fixing, as they will be dynamically linked to the bad wind via the reanalysis.
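For completeness, here's roughly how a scaling file then gets hooked into libaccessom2 via a perturbation entry in atmosphere/forcing.json. Treat this as a sketch following the tutorial linked above; the filename, fieldname and cname here are illustrative placeholders:

```json
{
  "filename": "/g/data/rt52/era5/single-levels/reanalysis/10v/1984/10v_era5_oper_sfc_19840801-19840831.nc",
  "fieldname": "v10",
  "cname": "vwnd_ai",
  "perturbations": [
    {
      "type": "scaling",
      "dimension": "spatiotemporal",
      "value": "scaling_v10_19840811.nc",
      "calendar": "forcing"
    }
  ]
}
```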

rmholmes commented 1 year ago

Thanks @aekiss. I'll give it a burn.

Weird, it doesn't appear in u10:

[image: u10 field at the same time, with no spike]

rmholmes commented 1 year ago

This fix worked fine, so I'm closing this issue (my scaling notebook is at https://github.com/rmholmes/cosima-scripts/blob/master/ERA-5/025deg_era5_iaf_v10_blowup_scaling.ipynb). I'll continue running this, hopefully up to near real time, and will report back.

rmholmes commented 1 year ago

I found another bad point, this time at 1986-07-24T09:00:00, near the Ross Sea. It causes another blow-up:

[image: wind spike near the Ross Sea]

I'll do the same thing.

aekiss commented 1 year ago

Wow, that's a doozy! Looks like they should tweak the QC in data ingestion...

rmholmes commented 1 year ago

Yeah. This one's funny because it actually completely disappears in the hour following the one I'm showing there.

Now that I know what to look for, hopefully I can get it through to the end of the cycle!

aekiss commented 1 year ago

Thanks, it's really valuable having these landmines mapped out before we run at higher resolution or with ACCESS-OM3.

rmholmes commented 1 year ago

Found a third one, at 1992-11-13T19:00:00, in a similar location to the first one:

[image: v10 wind spike south of Papua New Guinea]

aekiss commented 1 year ago

Thanks @rmholmes. The points you're finding correspond to the known issues in Table 2 here: https://confluence.ecmwf.int/display/CKB/ERA5%3A+large+10m+winds

There are lots of spurious values, but evidently most aren't bad enough to trip up the model. The worst ones are up to 300 m/s (Mach 0.87)!

> A few times per year, the analysed low level winds, eg the 10m winds, become unrealistically large in a particular location, which varies amongst a few apparently preferred locations. The largest values seen so far are about 300 ms-1. This problem occurs towards the end of the data assimilation windows (9-21 UTC and 21-9 UTC) because of an instability in the analysis method.

I guess that explains why the Ross Sea doozy of 1986-07-24T09:00:00 suddenly vanishes (new assimilation cycle)?

> From 19 February 2020 onwards, the ERA5 system has examined the 10m wind components and if the magnitude of either component exceeds 50 ms-1, then the analysed parameters are replaced with the "4v" parameters.

Hopefully the 50 m/s cutoff post-19 February 2020 will help with model stability. A cutoff seems an odd approach though, especially if applied to only one component and one grid point: it will mess up convergence and curl. The method we're using tries to minimise this problem by scaling both components by a factor that is smooth in space and time. That seems reasonable for toning down an otherwise-reasonable storm, but maybe it's worse in the case of an isolated bad point, because it also scales data that is mostly OK (other than the gravity waves).
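Given that table, a brute-force pre-scan of the wind forcing would map any remaining landmines before the model hits them. A sketch, assuming the rt52 file layout above and xarray with dask installed:

```python
# Sketch: flag every time at which ERA5 10 m wind components exceed ECMWF's
# 50 m/s threshold, one monthly file at a time.
import glob
import xarray as xr

for comp, var in [("10u", "u10"), ("10v", "v10")]:
    pattern = (f"/g/data/rt52/era5/single-levels/reanalysis/"
               f"{comp}/*/{comp}_era5_oper_sfc_*.nc")
    for path in sorted(glob.glob(pattern)):
        ds = xr.open_dataset(path, chunks={"time": 24})
        hourly_max = abs(ds[var]).max(dim=["latitude", "longitude"]).compute()
        bad = hourly_max.where(hourly_max > 50.0, drop=True)
        if bad.size > 0:
            print(path, bad["time"].values)
```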

rmholmes commented 1 year ago

Thanks for finding that @aekiss. Makes sense that someone has found these before. I'll continue on as I'm doing, checking these tables if I run into any other problems.

For now, this seems a better approach than replacing with the "4v" parameters?

I am continually impressed by what the model can cope with without blowing up.