AndyHoggANU opened 2 years ago
Thanks for getting this started! Do you have any suggestions for the forcing dataset? I guess we can start with daily boundary forcing like we have already, is there a year of daily T, S, U, V (and maybe eta)?
I think for the first iteration we should do exactly as we did for mom6-panan. That is, the surface forcing is JRA-55-RYF9091 and the boundary forcing is taken from year ~186 of the ACCESS-OM2-01 spinup. Do you still have the old scripts you used to derive the boundary forcing for those runs? If not, ping me and I will figure out a good run to use. If we don't have one, I can run for another year with daily output.
I do have the scripts (they're in https://github.com/COSIMA/mom6-panan), but I thought there was something about the input data disappearing; it turns out it's just that the non-bit-groomed output isn't available. I don't think that's an issue, so I can go ahead and cut out the forcing.
Hi all, thanks for getting this started. As a first step, RYF (both surface and boundary) is fine. I'm completely agnostic about the year to use. However, I'd like to play around with the conditions myself at some point. I'm assuming the scripts in https://github.com/COSIMA/mom6-panan/scripts are where I can see how this is done?
That's right. I've got a bit of a description of the process in this document too: MOM6 Panantarctic.md
Do you make use of Raphael Dussin's "brushcutter" stuff? I ask only because he has a number of examples, although nothing seems to have been updated since 2017.
I originally tried using it, but it wasn't suitable for our input data, and I think the interpolation maybe didn't scale up well to 1/20°.
Hi all, Has there been any progress? Anything I can do to help out at this stage?
We discussed this last week - I think the consensus was that @angus-g would shortly get into his code and produce some open BCs for us ... but I don't think we have moved too far on it yet. Angus, let us know ...
This has taken me a little longer than anticipated: my previous scripts didn't scale up too well to 4 boundaries, and there was a lot of manual data massaging for panan that I hadn't reincorporated into the scripts. That's all fixed now, and I have a set of boundaries and a configuration generated, which I'll commit soon. At the moment I'm running into a too-high SSH (at 294 locations):
WARNING from PE 52: Extreme surface sfc_state detected: i= 74 j= 306 lon=-208.650 lat= -23.740
x=-208.650 y= -23.740 D= 1.0433E+01 SSH= 2.0153E+01 SST= 2.4220E+01 SSS= 9.8536E-03
U-= 0.0000E+00 U+= 7.1118E-01 V-= 0.0000E+00 V+= 3.7707E-02
They seem to be located on the coast, but I can't see any inlets or anything that were causing problems for us before.
Thanks for getting this running. Seems odd that the dodgy SSH values are on the coast given the large-ish distance to the boundaries. I can't think of a reason straight away why that would be the case in a regional model and not in a global model.
Russ and I just had a quick chat about this issue. He thinks that the blow-up is likely occurring in the first couple of time-steps at a single location then propagating along the coast (a "nonsense Kelvin wave"). Any chance you could run the model forward for 20 time steps, producing SSH at every time step? Could be a way to identify where the problem is first occurring.
You two and Andy are on the same wavelength! Here's the first pass at the SSH, although it seems to only be hourly even though I've asked for it every timestep (currently DT = 300). Pretty clearly points to the region where the extreme values are being detected!
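On the hourly-vs-per-timestep output: in FMS-based models like MOM6, per-timestep snapshots are normally requested in the diag_table by setting the output frequency to 0. A hedged sketch of what that entry might look like (the file and field names here are made up, not taken from this config):

```
# hypothetical diag_table fragment: output_freq = 0 means "every model timestep"
"ocean_ts",    0, "seconds", 1, "seconds", "time"
"ocean_model", "SSH", "SSH", "ocean_ts", "all", .false., "none", 2
```

If the file entry still has a positive frequency (e.g. 1, "hours"), the snapshots will be hourly regardless of what the field line says.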
Where are you running these (ie. path to output)?
Outputs currently at /scratch/x77/ahg157/mom6/work/eac-01/
you might need to be on x77 to see it. If yes, just submit the request.
Request sent
Added both of you
doesn't look much like a Kelvin wave to me...
Looks kinda like ringing (3rd time step)
If you've ever wondered what the normal modes of the system look like....
@angus-g
Ok, looking at your ICs I see that you have values of -4.675 on land for your AVE_SSH and zero elsewhere. Make sure this hasn't snuck onto the wet points by mistake. Also, I see you're starting from rest. I agree with Chris that this just looks like ringing at the start of the run, and the time step needs to be lowered until you establish reasonable SSH and velocity fields. I've seen reversals of the Leeuwin and all other strange things due to the SSH heave at the start of runs.
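The land-value check suggested above can be sketched as a simple mask comparison. This uses toy synthetic arrays rather than the actual IC files, just to illustrate the idea:

```python
import numpy as np

FILL = -4.675                                   # land fill value seen in AVE_SSH
wet = np.array([0, 1, 1, 0, 1], dtype=bool)     # toy wet mask (1 = ocean)
ssh = np.array([FILL, 0.0, 0.02, FILL, -0.01])  # toy AVE_SSH field

# flag any wet point still carrying the land fill value
leaked = wet & np.isclose(ssh, FILL)
print(leaked.any())  # → False: nothing snuck onto the wet points here
```

With the real files, the same two-line comparison against the model's wet mask would flag any leaked fill values immediately.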
I was under the impression that the ICs were taken directly from the global model? I.e. they shouldn't be from rest but close to balanced (at least locally).
Maybe that is the problem -- we might need initial η, as well as T and S (we can keep u,v = 0), to start up the model? My impression is that the initial state only has T and S for now??
That's right, usually I've been able to spin up with just T and S without issue. I've added initial η, but the same blow-up occurs with or without initial velocities! The SSH magnitude that develops even in the first timestep swamps that from the initial state anyway.
Looks like the baroclinic timestep is down to 50s but the tracer/thermodynamic timestep is 300s. Could that be an issue?
`DT_THERM` is commented out there: they're both 50 s (`MOM_parameter_doc.all` is probably the best reference for the parameters the model actually used).
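For reference, if a longer thermodynamic step were wanted, the relevant MOM_input lines would look something like this (the values here are illustrative, not the ones in this run):

```
DT = 50.0          ! [s] Baroclinic dynamics timestep.
DT_THERM = 300.0   ! [s] Thermodynamics/tracer timestep; defaults to DT when unset.
```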
SSH increase is linear in time just off the QLD coast
In fact, it seems like most of the domain shows some linear growth in the early part of the run (the exceptions are points near the open boundaries). Eventually the growth saturates/asymptotes, but only after eta reaches O(several metres). Here's an example from mid-way between Tas and NZ. It seems to show linear growth for the first hour or so.
The effect is reduced in the south of the domain when compared with the north, but is still present. At first glance, it doesn't appear to be "coming from" anywhere. Just a steady, linear-ish increase in eta across the whole domain (save points near the open boundaries).
Compare the following: snapshots from timestep=1 and timestep=50 (i.e. after 2500 seconds, ~40 minutes). Colourscale limits are -5 to 5 m.
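A quick way to quantify that linear growth from the snapshots is a least-squares slope over the early timesteps. This sketch uses a synthetic eta record standing in for the model output (the growth rate and saturation level are made up for illustration):

```python
import numpy as np

dt = 50.0                          # [s] timestep between snapshots
t = np.arange(50) * dt             # 50 snapshot times
eta = np.minimum(1e-3 * t, 5.0)    # toy eta: linear growth, then saturation

# fit only the early, still-linear part of the record
slope = np.polyfit(t[:12], eta[:12], 1)[0]
print(f"growth rate ~ {slope:.1e} m/s ({slope * 86400:.0f} m/day)")
```

Doing the same fit at a few points across the domain would show whether the growth rate is spatially uniform (as a domain-wide source would suggest) or localised.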
I think this should be good now, with https://github.com/COSIMA/mom6-eac/commit/5eab9bcd06ab58944b3e87e667b1d22fbb0a8096
Excellent. For the uninitiated, what has been changed? It's a little tricky to read the config files and understand why the lack of a run-off field might cause a blow-up.
@ChrisC28 summarised here: https://github.com/COSIMA/mom6-eac/issues/2#issuecomment-1076943640
I should note that I haven't run beyond the initial few days to make sure we're not hitting the same issue, time to put a proper run on the queue and see what goes wrong next!
OK, so. Next question is -- who wants to run this? @ChrisC28: I am happy to have a crack and run for a couple of years. This might be useful as I've just been through this exercise with panant, so I might be able to spot errors more easily. But also happy for you to do the first proper run and keep us up to date. Let me know what suits best.
For reference, the only river in eastern Australia with significant run-off is the Burdekin in central north Queensland, and then only in the monsoon period. So, this is clearly a numerical issue, no?
@ChrisC28 for the runoff, the last entry in the `data_table` can serve two functions. If the field and file exist, it acts as a scaling factor. If they don't exist, it acts as a constant field replacement value (pretty common for MSLP, or if running an idealised model). In this case the file didn't exist and the value was `1`, which is 1 kg m-2 s-1, or 1e-3 m s-1, or... wait for it... 86 m day-1, which matches your early linear growth estimate. Pretty impressive that the model ran for so long!
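The unit arithmetic above, spelled out (assuming a freshwater density of 1000 kg m⁻³):

```python
rho_fw = 1000.0                  # [kg m-3] freshwater density
runoff = 1.0                     # [kg m-2 s-1] the accidental constant runoff field

rate_m_per_s = runoff / rho_fw   # mass flux -> volume flux per unit area
rate_m_per_day = rate_m_per_s * 86400.0
print(rate_m_per_day)            # → 86.4, i.e. ~86 m/day of freshwater input
```

86 m/day of freshwater over the whole domain is indeed on the order of the linear eta growth seen in the early snapshots.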
I'd be happy for @AndyHoggANU to run for a few years just to catch anything seriously problematic going on. However, I'd like to port the model over to the Pawsey system in the near future. For that, I might need some help with compiling and running (although I don't see too many problems, as I run MOM5 as part of the CSIRO coupled model on Pawsey with no issues).
Pawsey is Cray architecture and uses slurm as a workload manager. This (obviously) breaks the payu connection, which is sub-optimal but not a huge deal.
Thanks @russfiedler that makes perfect sense.
It probably makes sense to run a few years on a well-known system (i.e. Gadi) before porting to Pawsey.
I will have a crack at it and let you know how it goes.
> Pawsey is Cray architecture and uses slurm as a workload manager. This (obviously) breaks the payu connection, which is sub-optimal but not a huge deal.
There was some enthusiasm for porting to other systems/schedulers (see https://github.com/payu-org/payu/issues/182 and https://github.com/payu-org/payu/issues/258), but there was no compelling use case.
This could be the compelling use case.
I know @marshallward did get it running at GFDL using slurm.
There's a bit of CSIRO climate research happening on Pawsey at the moment. Richard Matear and Dougie Squire are both running models on it at present, as well as some of the group that run the unstructured-mesh model based on MPAS. The current HPC machine (Magnus) is slowly being decommissioned in favour of a new machine, Setonix: https://pawsey.org.au/about-us/capital-refresh/ We have it on good authority that it will have similar specs to Gadi.
Porting to Pawsey/Setonix could be really useful in the long term, and I'd be happy to support this MOM6_EAC project being a guinea pig. Let me know if/what you need to support that.
I was able to get Payu working on GFDL's machine which is also a Cray environment with slurm, though if I recall there was some very kludgy stuff.
But we did abstract out a lot of PBS-specific stuff and I think with a little more work we could get it running on other Slurm-based machines.
I would be up for working on this again. But we can speak more of this over at the payu repo.
I've opened an issue over on the Payu repo: https://github.com/payu-org/payu/issues/323
Here is a snapshot of surface speed over the first 6 months. Just to say -- we have problems at the corners. Will stop for now, but let's pick this up next week!
Doesn't look too bad if you ignore the corners!
In addition to the corner problem seen above, I did see one more issue at the very end of year 1:
FATAL from PE 59: time_interp_external 2: time 726714 (19911231.001500 is after range of list 726350-726714(19910101.000000 - 19911231.000000),file=INPUT/forcing/forcing_obc_segment_003.nc,field=u_segment_003
Does this mean that it is missing the last day of the boundary forcing file?
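The day numbers in the error message are consistent with that reading: 726714 − 726350 = 364 days, which is exactly 1 Jan to 31 Dec of 1991, so the file's last record sits at 00:00 on 31 Dec and the model's request at 00:15 falls just past the end of the list. A quick sanity check (using Python's datetime rather than FMS's own calendar):

```python
from datetime import datetime

# span implied by the FMS day numbers in the error message
fms_span = 726714 - 726350

# span from 1991-01-01 00:00 to 1991-12-31 00:00 (1991 is not a leap year)
cal_span = (datetime(1991, 12, 31) - datetime(1991, 1, 1)).days

print(fms_span, cal_span)  # → 364 364
```

So the boundary file effectively ends at the *start* of 31 Dec; either the file needs one more record (or records timestamped mid-day), or the time axis needs to wrap.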
Use the same "cookie-cutter" strategy as mom6-panan to create open boundary conditions from a single RYF year of ACCESS-OM2. For topography, we should begin with ACCESS-OM2-01 topography, before pushing to 1/20° using ETOPO data.