COSIMA / mom6-eac

MOM6-SIS2 model configuration for the East Australian Current region.

Create open boundary conditions and initial config. #1

Open AndyHoggANU opened 2 years ago

AndyHoggANU commented 2 years ago

Use same "cookie-cutter" strategy as mom6-panan to create open boundary conditions from a single RYF year of ACCESS-OM2.

For topography, we should begin with ACCESS-OM2-01 topography, before pushing to 1/20° using etopo data.

angus-g commented 2 years ago

Thanks for getting this started! Do you have any suggestions for the forcing dataset? I guess we can start with daily boundary forcing like we have already, is there a year of daily T, S, U, V (and maybe eta)?

AndyHoggANU commented 2 years ago

I think for the first iteration we should do exactly as we did for mom6-panan. That is, the surface forcing is JRA-55-RYF9091 and the boundary forcing is taken from year ~186 of the ACCESS-OM2-01 spinup. Do you still have the old scripts, and a record of which dataset you derived the boundary forcing from for those runs? If not, ping me and I will figure out a good run to use. If we don't have one, I can run for another year with daily output.

angus-g commented 2 years ago

I do have the scripts (they're in https://github.com/COSIMA/mom6-panan), but I thought there was something about the input data disappearing — but it's just that the non-bit-groomed output isn't available. I don't think that's an issue, so I can go ahead and cut out the forcing.

ChrisC28 commented 2 years ago

Hi all, Thanks for getting this started. As a first step, RYF (both surface and boundary) is fine. I'm completely agnostic about the year to use. However, I'd like to play around with the conditions myself at some point. I'm assuming the scripts in https://github.com/COSIMA/mom6-panan/scripts are where I can see how this is done?

angus-g commented 2 years ago

That's right. I've got a bit of a description of the process in this document too: MOM6 Panantarctic.md

ChrisC28 commented 2 years ago

Do you make use of Raphael Dussin's "brushcutter" stuff? I ask only because he has a number of examples, although nothing seems to have been updated since 2017.

angus-g commented 2 years ago

I originally tried using it, but it wasn't suitable for our input data, and I think maybe the interpolation didn't work for scaling up to 1/20°

ChrisC28 commented 2 years ago

Hi all, Has there been any progress? Anything I can do to help out at this stage?

AndyHoggANU commented 2 years ago

We discussed this last week - I think the consensus was that @angus-g would shortly get into his code and produce some open BCs for us ... but I don't think we have moved too far on it yet. Angus, let us know ...

angus-g commented 2 years ago

This has taken me a little longer than anticipated: my previous scripts didn't scale up too well to 4 boundaries, and there was a lot of manual data massaging for panan that I didn't reincorporate to the scripts. That's all fixed now, and I have a set of boundaries and a configuration generated, which I'll commit soon. At the moment I'm running into a too-high SSH (at 294 locations):

WARNING from PE    52: Extreme surface sfc_state detected: i=  74 j= 306 lon=-208.650 lat= -23.740 x=-208.650 y= -23.740 D= 1.0433E+01 SSH= 2.0153E+01 SST= 2.4220E+01 SSS= 9.8536E-03 U-= 0.0000E+00 U+= 7.1118E-01 V-= 0.0000E+00 V+= 3.7707E-02

They seem to be located on the coast, but I can't see any inlets or anything that were causing problems for us before.

ChrisC28 commented 2 years ago

Thanks for getting this running. Seems odd that the dodgy SSH values are on the coast given the large-ish distance to the boundaries. I can't think of a reason straight away why that would be the case in a regional model and not in a global model.

ChrisC28 commented 2 years ago

Russ and I just had a quick chat about this issue. He thinks that the blow-up is likely occurring in the first couple of time-steps at a single location then propagating along the coast (a "nonsense Kelvin wave"). Any chance you could run the model forward for 20 time steps, producing SSH at every time step? Could be a way to identify where the problem is first occurring.
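
One way to act on that suggestion, once per-timestep SSH is available, is to scan the snapshots in order and report the first place the field exceeds a threshold. The sketch below is a minimal illustration on synthetic data: the function name `first_exceedance`, the nested-list layout and the 2 m threshold are all assumptions, not anything taken from the model output.

```python
from typing import List, Optional, Tuple

Field = List[List[float]]  # one 2-D SSH snapshot, indexed [j][i]


def first_exceedance(
    snapshots: List[Field], threshold: float
) -> Optional[Tuple[int, int, int, float]]:
    """Return (step, j, i, value) for the earliest snapshot in which |SSH|
    exceeds `threshold`, picking the largest |SSH| within that snapshot.
    Returns None if the threshold is never exceeded."""
    for step, field in enumerate(snapshots):
        worst = None
        for j, row in enumerate(field):
            for i, value in enumerate(row):
                if abs(value) > threshold and (
                    worst is None or abs(value) > abs(worst[3])
                ):
                    worst = (step, j, i, value)
        if worst is not None:
            return worst
    return None


# Synthetic 3-step example on a 2x3 grid (values in metres, made up).
demo = [
    [[0.0, 0.1, 0.0], [0.0, 0.2, 0.1]],
    [[0.0, 0.4, 0.1], [0.1, 2.5, 0.3]],
    [[0.5, 3.0, 0.9], [0.8, 9.0, 2.0]],
]
print(first_exceedance(demo, threshold=2.0))
```

With real output the snapshots would come from the diagnostic file rather than a hand-written list, but the search itself would be the same.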

angus-g commented 2 years ago

You two and Andy are on the same wavelength! Here's the first pass at the SSH, although it seems to only be hourly even though I've asked for it every timestep (currently DT = 300). Pretty clearly points to the region where the extreme values are being detected!

zos_series

ChrisC28 commented 2 years ago

Where are you running these (ie. path to output)?

angus-g commented 2 years ago

Outputs currently at /scratch/x77/ahg157/mom6/work/eac-01/

AndyHoggANU commented 2 years ago

you might need to be on x77 to see it. If yes, just submit the request.

ChrisC28 commented 2 years ago

Request sent

AndyHoggANU commented 2 years ago

Added both of you

aekiss commented 2 years ago

doesn't look much like a Kelvin wave to me...

ChrisC28 commented 2 years ago

Ringing

ChrisC28 commented 2 years ago

Looks kinda like ringing (3rd time step)

ChrisC28 commented 2 years ago

If you've ever wondered what the normal modes of the system look like....

russfiedler commented 2 years ago

@angus-g

Ok, looking at your ICs I see that you have values of -4.675 on land for your AVE_SSH and zero elsewhere. Make sure this hasn't snuck onto the wet points by mistake. Also I see you're starting from rest. I agree with Chris that this just looks like ringing at the start of the run and the time step needs to be lowered until you establish reasonable SSH and velocity fields. I've seen reversals of the Leeuwin and all sorts of other strange things due to the SSH heave at the start of runs.
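
The wet-point check could be sketched roughly as follows. This is only an illustration: the nested lists stand in for the AVE_SSH field and a wet mask, and the names `fill_on_wet_points`, `ssh` and `wet` are hypothetical. Only the -4.675 land value comes from the comment above.

```python
import math
from typing import List, Tuple


def fill_on_wet_points(
    ssh: List[List[float]],
    wet: List[List[bool]],
    fill_value: float,
    tol: float = 1e-6,
) -> List[Tuple[int, int]]:
    """Return the (j, i) indices of wet points whose SSH equals the land value."""
    hits = []
    for j, (ssh_row, wet_row) in enumerate(zip(ssh, wet)):
        for i, (value, is_wet) in enumerate(zip(ssh_row, wet_row)):
            if is_wet and math.isclose(value, fill_value, abs_tol=tol):
                hits.append((j, i))
    return hits


# Made-up 2x3 example: one wet point has picked up the land value.
ssh = [[-4.675, -4.675, 0.0], [-4.675, 0.0, 0.0]]
wet = [[False, True, True], [False, True, True]]
print(fill_on_wet_points(ssh, wet, fill_value=-4.675))
```

An empty list would mean the land value is confined to land, which is the outcome being hoped for here.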

ChrisC28 commented 2 years ago

I was under the impression that the ICs were taken directly from the global model? Ie. shouldn't be from rest but close to balanced (at least locally)

AndyHoggANU commented 2 years ago

Maybe that is the problem -- we might need initial \eta, as well as T and S, (can keep u,v=0) to start up the model? My impression is that the initial state only has T and S for now??

angus-g commented 2 years ago

That's right, usually I've been able to spin up with just T and S without issue. I've added initial η, but the same blow-up occurs with or without initial velocities! The SSH magnitude that develops even in the first timestep swamps that from the initial state anyway.

ChrisC28 commented 2 years ago

Looks like the baroclinic timestep is down to 50s but the tracer/thermodynamic timestep is 300s. Could that be an issue?

angus-g commented 2 years ago

DT_THERM is commented out there: they're both 50s (MOM_parameter_doc.all is probably the best reference for the parameters the model actually used)
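
To compare timesteps without reading the files by eye, a small parser along these lines could pull `NAME = value` pairs out of a parameter file. It is a rough sketch that assumes the usual layout where everything after `!` is a comment. The sample text and its values are made up for illustration and are not copied from this configuration.

```python
from typing import Dict


def parse_mom_params(text: str) -> Dict[str, str]:
    """Collect `NAME = value` pairs, ignoring anything after `!`."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        # Drop trailing comments; whole-line comments become empty strings.
        code = line.split("!", 1)[0].strip()
        if code.startswith("#override"):
            code = code[len("#override"):].strip()
        if "=" not in code:
            continue
        name, value = code.split("=", 1)
        params[name.strip()] = value.strip()
    return params


# Illustrative sample only (not the real file contents).
sample = """
DT = 50.0                       !   [s]
                                ! The (baroclinic) dynamics time step.
! DT_THERM = 300.0              ! commented out, so it is not picked up
"""
params = parse_mom_params(sample)
print(params.get("DT"))
print(params.get("DT_THERM", "not set in this file"))
```

A commented-out entry never reaches the dictionary, which is the same reason a commented-out DT_THERM has no effect on the run.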

ChrisC28 commented 2 years ago

SSH_blow_up SSH increase is linear in time just off the QLD coast

ChrisC28 commented 2 years ago

In fact, it seems like most of the domain shows some linear growth in the early part of the run (the exceptions are points near the open boundaries). Eventually, the growth saturates/asymptotes, but only after eta reaches ~O(several metres). Here's an example from mid-way between Tas and NZ. Seems to show linear growth for the first hour or so.

Linear_growth_ex

ChrisC28 commented 2 years ago

The effect is reduced in the south of the domain when compared with the north, but is still present. At first glance, it doesn't appear to be "coming from" anywhere. Just a steady, linear-ish increase in eta across the whole domain (save points near the open boundaries).
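
A growth rate for that linear phase can be estimated with an ordinary least-squares fit to an eta time series at one point. The sketch below uses a synthetic series: the 50 s sampling matches the timestep mentioned earlier in the thread, but the 5e-4 m/s rate and 0.2 m offset are arbitrary illustrative numbers, not values from the run.

```python
from typing import Sequence, Tuple


def linear_fit(t: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ~ slope * t + intercept."""
    n = len(t)
    if n < 2 or n != len(y):
        raise ValueError("need at least two points and equal-length inputs")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0.0:
        raise ValueError("all time values are identical")
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


# Synthetic series: eta sampled every 50 s for one hour, growing linearly.
times = [50.0 * k for k in range(1, 73)]    # seconds
eta = [0.2 + 5.0e-4 * s for s in times]     # metres (made-up rate)
slope, intercept = linear_fit(times, eta)
print(f"growth rate: {slope * 86400:.1f} m/day")
```

Expressing the slope in m/day makes it easy to compare against candidate sources or sinks of water, such as a surface flux.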

ChrisC28 commented 2 years ago

Compare the following: snapshots from timestep=1 and timestep=50 (i.e. after 2500 seconds, ~40 minutes). Colour scale limits are -5 to 5 m.

ChrisC28 commented 2 years ago

MOM6_EAC_ts_1 MOM__EAC_ts_50

angus-g commented 2 years ago

I think this should be good now, with https://github.com/COSIMA/mom6-eac/commit/5eab9bcd06ab58944b3e87e667b1d22fbb0a8096

ChrisC28 commented 2 years ago

Excellent. For the uninitiated, what has been changed? It's a little tricky to read the config files and to understand why the lack of a run-off field might cause a blow-up.

angus-g commented 2 years ago

@ChrisC28 summarised here: https://github.com/COSIMA/mom6-eac/issues/2#issuecomment-1076943640

I should note that I haven't run beyond the initial few days to make sure we're not hitting the same issue, time to put a proper run on the queue and see what goes wrong next!

AndyHoggANU commented 2 years ago

OK, so. Next question is -- who wants to run this? @ChrisC28 :: I am happy to have a crack and run for a couple of years. This might be useful as I've just been through this exercise with panant, so I might be able to spot errors more easily. But also happy for you to do the first proper run and keep us up to date. Let me know what suits best.

ChrisC28 commented 2 years ago

For reference, the only river in eastern Australia with significant run-off is the Burdekin in central north Queensland, and then only in the monsoon period. So, this is clearly a numerical issue, no?

russfiedler commented 2 years ago

@ChrisC28 for the runoff, the last entry in the data_table can serve two functions. If the field and file exist, it acts as a scaling factor. If they don't exist, it acts as a constant replacement value for the field (pretty common for mslp or if running an idealised model). In this case the file didn't exist and the value was 1, which is 1 kg m-2 s-1 or 1e-3 m s-1 or... wait for it... 86 m day-1, which matches your early linear growth estimate. Pretty impressive that the model ran for so long!
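
The unit conversion above can be checked in a few lines. This is just the arithmetic, assuming a freshwater density of 1000 kg m-3. The function name is made up for the example.

```python
RHO_FRESH = 1000.0        # kg m-3, assumed freshwater density
SECONDS_PER_DAY = 86400.0


def mass_flux_to_m_per_day(flux: float, density: float = RHO_FRESH) -> float:
    """Convert a mass flux in kg m-2 s-1 to a water-depth rate in m per day."""
    return flux / density * SECONDS_PER_DAY


rate = mass_flux_to_m_per_day(1.0)
print(f"{rate:.1f} m/day")
print(f"{rate / 24:.1f} m in the first hour")
```

A constant flux of 1 kg m-2 s-1 therefore adds a few metres of water per hour everywhere it is applied, which is the same order as the eta growth reported earlier in the thread.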

ChrisC28 commented 2 years ago

I'd be happy for @AndyHoggANU to run for a few years just to catch anything seriously problematic going on. However, I'd like to port the model over to the Pawsey system in the near future. For that, I might need some help with compiling and running (although I don't see too many problems, as I run MOM5 as part of the CSIRO coupled model on Pawsey with no issues).

Pawsey is Cray architecture and uses slurm as a workload manager. This (obviously) breaks the payu connection, which is sub-optimal but not a huge deal.

ChrisC28 commented 2 years ago

Thanks @russfiedler that makes perfect sense.

ChrisC28 commented 2 years ago

It probably makes sense to run a few years on a well known system (ie. Gadi) before porting to Pawsey.

AndyHoggANU commented 2 years ago

I will have a crack at it and let you know how it goes.

aidanheerdegen commented 2 years ago

Pawsey is Cray architecture and uses slurm as a workload manager. This (obviously) breaks the payu connection, which is sub-optimal but not a huge deal.

There was some enthusiasm for porting to other systems/schedulers see

https://github.com/payu-org/payu/issues/182

https://github.com/payu-org/payu/issues/258

but there was no compelling use case.

This could be the compelling use case.

I know @marshallward did get it running at GFDL using slurm.

ChrisC28 commented 2 years ago

There's a bit of CSIRO climate research happening on Pawsey at the moment. Richard Matear and Dougie Squire are both running models on it at present, as well as some of the group that runs the unstructured-mesh model based on MPAS. The current HPC machine (Magnus) is slowly being decommissioned in favour of a new HPC machine, Setonix: https://pawsey.org.au/about-us/capital-refresh/ We have it on good authority that it will have similar specs to Gadi.

Porting to Pawsey/Setonix could be really useful in the long term and I'd be happy to support this MOM6_EAC project being a guinea pig. Let me know if you need anything to support that.

marshallward commented 2 years ago

I was able to get Payu working on GFDL's machine which is also a Cray environment with slurm, though if I recall there was some very kludgy stuff.

But we did abstract out a lot of PBS-specific stuff and I think with a little more work we could get it running on other Slurm-based machines.

I would be up for working on this again. But we can speak more of this over at the payu repo.

ChrisC28 commented 2 years ago

I've opened an issue over on the Payu repo: https://github.com/payu-org/payu/issues/323

AndyHoggANU commented 2 years ago

Here is a snapshot of surface speed over the first 6 months. Just to say -- we have problems at the corners. Will stop for now, but let's pick this up next week!

https://user-images.githubusercontent.com/26753100/160056130-9a4d5081-49a5-43ce-bd5b-19722c8d22b8.mov

ChrisC28 commented 2 years ago

Doesn't look too bad if you ignore the corners!

AndyHoggANU commented 2 years ago

In addition to the corner problem seen above, I did see one more issue at the very end of year 1:

FATAL from PE    59: time_interp_external 2: time 726714 (19911231.001500 is after range of list 726350-726714(19910101.000000 - 19911231.000000),file=INPUT/forcing/forcing_obc_segment_003.nc,field=u_segment_003

Does this mean that it is missing the last day of the boundary forcing file?
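
One possible reading of the numbers in that message is sketched below. It assumes the time axis is in days since 0001-01-01 in a 365-day (noleap) calendar, which is inferred only from 726350 being exactly 1990 × 365. The helper name is hypothetical.

```python
from typing import Tuple

DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def noleap_days_to_date(days: float) -> Tuple[int, int, int, int, int]:
    """Convert days since 0001-01-01 00:00 (365-day calendar, assumed)
    to (year, month, day, hour, minute). Valid for non-negative input."""
    whole = int(days)
    year = whole // 365 + 1
    day_of_year = whole % 365
    month = 1
    for length in DAYS_IN_MONTH:
        if day_of_year < length:
            break
        day_of_year -= length
        month += 1
    minutes = round((days - whole) * 1440)
    return year, month, day_of_year + 1, minutes // 60, minutes % 60


first_record, last_record = 726350.0, 726714.0
model_time = 726714.0 + 15.0 / 1440.0   # 00:15 on the final day, as in the log

print(noleap_days_to_date(first_record))
print(noleap_days_to_date(last_record))
print(noleap_days_to_date(model_time))
print(first_record <= model_time <= last_record)
```

Under that assumption the file does contain a record for 31 December, but it is stamped at 00:00, so any model time later on that day falls after the end of the time axis.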