Closed guillaumevernieres closed 2 years ago
OK. Let me know if you have a starting point in mind. Also, is the MOM6-CICE coupling intended to be two-way or just MOM6 -> CICE6? - Keston
@kestonsmith-noaa There was an attempt a few months ago, but it was abandoned. We should start from where they left off.
OK, sounds good. Is there a GitHub repository or link to that work?
I think so, but I forgot where. I'll organize a meeting with the person(s) who worked on this issue.
Edited the description, but just in case, here's the issue link: https://github.com/ufs-community/ufs-weather-model/issues/289
I have a sandbox on Orion here: /work/noaa/stmp/dworthen/stmp/dworthen/FV3_RT/cpld_c48/cpld_control_c48
I've added the following to the MOM_input:
! === module MOM ===
VERBOSITY = 9 ! default = 2
! Integer controlling level of messaging
! 0 = Only FATAL messages
! 2 = Only FATAL, WARNING, NOTE [default]
! 9 = All)
DEBUG = True ! [Boolean] default = False
! If true, write out verbose debugging data.
The error file (err) shows the following:
12: h-point: mean= 3.9560357017408208E+33 min= -1.8301177695709252E+00 max= 9.9692099683868690E+36 Post extract_sfc SST
12: h-point: c= 47514 Post extract_sfc SST
12: h-point: mean= 3.9560357017408208E+33 min= 0.0000000000000000E+00 max= 9.9692099683868690E+36 Post extract_sfc SSS
12: h-point: c= 36038 Post extract_sfc SSS
The model fails with
2: [Orion-01-42:453600:0:453600] Caught signal 8 (Floating point exception: floating-point overflow)
2: ==== backtrace (tid: 453600) ====
2: 0 0x0000000007efa676 nst_module_mp_cool_skin_() /work/noaa/marine/dworthen/ufs_c48/FV3/ccpp/physics/physics/module_nst_model.f90:863
which I believe is caused by the _FillValue (E+36) appearing in non-masked areas of the ocean.
To use a MOM6 restart, change the bottom of the input.nml (i.e., MOM_input_nml) to use input_filename = 'r' and add the MOM6 restart to the INPUT directory.
The sandbox can be copied and run from another location using sbatch job_card. Note that the executable (fv3.exe) and required module files (modules.fv3) are sym linked to my current build directory. If rebuilding is required, let me know and I can explain how to do it from your own UWM checkout.
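For anyone who wants to scan a longer err file for the same symptom, here is a minimal sketch that picks out the min/max statistics lines shown above and flags any whose max looks like a fill value. The regular expression is written against the four sample lines only, and the 1.0e30 threshold and the function name are assumptions for illustration, not anything MOM6 defines.

```python
import re

# Matches debug statistics lines of the form shown in the err file above.
STAT_LINE = re.compile(
    r"mean=\s*(?P<mean>\S+)\s+min=\s*(?P<min>\S+)\s+max=\s*(?P<max>\S+)\s+(?P<label>.+)$"
)

# Hypothetical threshold: anything this large is not a physical SST or SSS.
FILL_THRESHOLD = 1.0e30


def find_fill_values(log_text, threshold=FILL_THRESHOLD):
    """Return (label, max) pairs for statistics lines whose max looks like a fill value."""
    suspects = []
    for line in log_text.splitlines():
        match = STAT_LINE.search(line)
        if match is None:
            continue  # e.g. the "c= 47514" lines carry no min/max
        max_value = float(match.group("max"))
        if abs(max_value) >= threshold:
            suspects.append((match.group("label").strip(), max_value))
    return suspects


sample = """\
12: h-point: mean= 3.9560357017408208E+33 min= -1.8301177695709252E+00 max= 9.9692099683868690E+36 Post extract_sfc SST
12: h-point: c= 47514 Post extract_sfc SST
12: h-point: mean= 3.9560357017408208E+33 min= 0.0000000000000000E+00 max= 9.9692099683868690E+36 Post extract_sfc SSS
12: h-point: c= 36038 Post extract_sfc SSS
"""

for label, max_value in find_fill_values(sample):
    print(f"{label}: max={max_value:.3e} looks like a fill value")
```

On the sample lines this flags both the SST and SSS statistics, which is consistent with the fill value leaking into unmasked points before the overflow in cool_skin.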
Thanks a lot for the sandbox. I will check it and let you know if rebuilding is required.
I could reproduce the errors from my sandbox.
@kestonsmith-noaa: I tend to read 1/2 of the words and extrapolate the intent of a post, bad bad me, sorry! Yes, it's a two-way coupling between all components.
Very cool! Thanks @DeniseWorthen and @hyunchul386 .
I can't believe @hyunchul386 or @DeniseWorthen didn't click the boxes!!!! ... So I had to do it, sorry.
@guillaumevernieres That's what managers are for :-)
@hyunchul386 , are you planning to work on this today as well?
@guillaumevernieres Yes, the one-day run for 2021-03-22 is running in my sandbox with a new ocean IC.
@hyunchul386 That is totally crazy!!!!
@guillaumevernieres The one-day run completed. That does not mean the results are correct (?), only that the run finished. The remaining issues may be whether the results are reasonable and whether the memory needs re-tuning. Currently the run uses 20 tasks, and the wallclock time is more than 80 min for a one-day run.
@hyunchul386 The compilation was in debug mode; to test the timing you would need to recompile w/o debug=on. Do you want to try that?
@DeniseWorthen Thank you. I'll check it.
@hyunchul386 Do you want the instructions for re-compiling in debug mode, or do you know how to do that already?
@DeniseWorthen, I think I need to rebuild for the run. Would you let me know how to do this rebuild?
To compile the model you can do this:
git clone https://github.com/ufs-community/ufs-weather-model.git ufs-weather-model
cd ufs-weather-model
git submodule update --init --recursive
cd tests
To compile in debug mode:
./compile.sh orion.intel '-DAPP=S2S -DDEBUG=ON -DCCPP_SUITES=FV3_GFS_v17_coupled_p8' '' YES NO 2>&1 | tee compile.log
To compile in non-debug mode:
./compile.sh orion.intel '-DAPP=S2S -DCCPP_SUITES=FV3_GFS_v17_coupled_p8' '' YES NO 2>&1 | tee compile.log
The first "YES" means to build cleanly. You can change this to "NO" if you're making code changes and want to test a code change. It will then only rebuild what is needed.
The final "NO" is the option for cleaning up after the build. Generally leave this as NO so that you don't need to rebuild from scratch each time.
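If it helps to keep the two variants straight, here is a small sketch that assembles the same command line from a debug switch and the two YES/NO options. The function and parameter names are made up for illustration; the only things taken as given are the two command lines quoted above.

```python
def compile_command(debug, clean_before=True, clean_after=False,
                    machine="orion.intel", suite="FV3_GFS_v17_coupled_p8"):
    """Build the ./compile.sh line quoted above (helper names are hypothetical)."""
    flags = ["-DAPP=S2S"]
    if debug:
        flags.append("-DDEBUG=ON")  # the only difference between the two builds
    flags.append(f"-DCCPP_SUITES={suite}")
    cmake_flags = " ".join(flags)

    def yes_no(flag):
        return "YES" if flag else "NO"

    return (f"./compile.sh {machine} '{cmake_flags}' '' "
            f"{yes_no(clean_before)} {yes_no(clean_after)} 2>&1 | tee compile.log")


print(compile_command(debug=True))    # debug build, clean first
print(compile_command(debug=False))   # non-debug build, clean first
print(compile_command(debug=False, clean_before=False))  # incremental rebuild
```

The last call corresponds to changing the first "YES" to "NO" while iterating on a code change.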
@DeniseWorthen Thank you, I'll check it
... Doing my part and clicking one more box!
@hyunchul386 I forgot the final step. You'll have in the tests directory both fv3.exe and modules.fv3. You'll need to remove the sym-links to my build in the sandbox. Then either copy those two files from your build into the sandbox (or sym link them).
@DeniseWorthen Got it. Thanks a lot.
@hyunchul386 One more item. Turn off the verbosity and debug settings in the MOM_input.
@DeniseWorthen Okay, thanks. By the way, the results of the debug run look normal at a glance. Just FYI, attached is a quick-and-dirty check of the previous debug-mode run, made with Ferret.
Just an update on the non-debug run: as expected, the run time for a one-day run dropped drastically, from 6183 secs to 478 secs. @DeniseWorthen Thanks a lot.
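For scale, the two timings can be turned into minutes and a speedup factor with a couple of lines. This is just arithmetic on the 6183 s and 478 s figures quoted above; the function name is illustrative.

```python
def speedup(before_seconds, after_seconds):
    """Ratio of the old wallclock time to the new one."""
    return before_seconds / after_seconds


debug_seconds = 6183      # one-day run, debug build
nondebug_seconds = 478    # one-day run, non-debug build

print(f"debug build:     {debug_seconds / 60:.0f} min")
print(f"non-debug build: {nondebug_seconds / 60:.0f} min")
print(f"speedup:         {speedup(debug_seconds, nondebug_seconds):.1f}x")
```

That works out to roughly a factor of 13 between the debug and non-debug builds, on the same 20 tasks.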
How many nodes are you using @hyunchul386 ?
@guillaumevernieres The run uses 1 node with 20 tasks.
I'm going to ask for a few more features that we will need for the DA @hyunchul386:
1 - a diag/history collection for MOM6 and CICE on the native grid containing snapshots of the DA variables every hour. Maybe @kestonsmith-noaa or @DeniseWorthen can guide us on the CICE side.
2 - save intermittent restarts for all the components. It's just 1 flag, but it needs to be tested.
3 - split the restart into a MOM6 io_layout of 2,2 to simulate what we need to do for the 1/4 deg model
I'll add radio boxes in the description of course :)
I can help w/ the restart writing for the coupled system and the history writing for CICE. What are the needed DA variables for CICE?
It seems that the Ice DA variables are "hsnon, hicen, cicen".
By the "n" do you mean you want these variables by thickness category? Normally we write out the composite values (i.e., added up over all thickness categories). So snow thickness by category, ice thickness by category and ice concentration by category?
That's correct @DeniseWorthen, we need the sea-ice variables per category. We do currently aggregate these variables, but this is probably going to change soon-ish.
Do we want the other CICE state variables as well, i.e. sice001, sice002, ..., sice007, and qice00x?
Good point @kestonsmith-noaa. At this point, all we need from CICE is the intermittent restarts, then. No point writing a history file which would have 90% of the content of a restart.
So what you want is just hourly restart files.
Do you need the full instructions for restarting the coupled model, or do you already have that set up somehow?
Yes, would you give the instructions for restarting the coupled model?
There's 2 things here:
[...] restart_n?). I think it works OK in the coupled model, but it needs to be tested and made part of the reg test.
[...] soca-science repo @hyunchul386 & @kestonsmith-noaa
@guillaumevernieres So all the models can write checkpoint restarts; the components are controlled w/ restart_n, but FV3 is controlled with restart_interval in the model_configure. There is this reference here that should help.
I can work on adding a c48/5deg control and restart test to the RTs if that is what you need.
Yes, that is what we need @DeniseWorthen .
@hyunchul386 Could you provide me the location of the MOM6 restart you used?
@DeniseWorthen the location is /work/noaa/stmp/hlee/stmp/hlee/FV3_RT/cpld_c48/cpld_control_c48
@guillaumevernieres I'm setting up the control/restart tests now. I think there may be an issue writing the MOM6 restart on the first hour from a "cold start".
In this case, cold start means not having actual FV3 and CICE6 native ICs. For both of these components, the model starts up using ICs from other sources, so they're not complete; for example, most IC fields for CICE6 are filled w/ zero. Within the fast loop, FV3 calculates the missing CICE6 fluxes on the first time-step. For MOM6, though, we use a lagged startup. That means that MOM6 doesn't advance on the first coupling timestep. Instead, it advances two times on the second coupling timestep. After that point it advances normally.
Long story short, for the "cold start", MOM6 currently can't write a restart at the first hour, because the coupling timestep is also 1-hour and MOM6 doesn't advance until hour=2. MOM6 is currently failing when I try to write restarts on a one-hour interval.
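The lagged startup described above can be pictured with a toy model. This is only a sketch of the bookkeeping, assuming a 1-hour coupling timestep; it is not the actual driver or cap logic, and both function names are invented for illustration.

```python
def ocean_time_after_each_step(coupling_steps, lagged_start=True):
    """Return the ocean model time (hours) after each 1-hour coupling step."""
    ocean_time = 0
    history = []
    for step in range(1, coupling_steps + 1):
        if lagged_start and step == 1:
            advance = 0   # MOM6 sits out the first coupling timestep
        elif lagged_start and step == 2:
            advance = 2   # then advances twice on the second one
        else:
            advance = 1   # normal stepping afterwards
        ocean_time += advance
        history.append(ocean_time)
    return history


def ocean_has_reached(coupled_hour, lagged_start=True):
    """Toy check: has the ocean advanced to coupled_hour when that hour's restart is due?"""
    return ocean_time_after_each_step(coupled_hour, lagged_start)[-1] == coupled_hour


print(ocean_time_after_each_step(4))                      # lagged: [0, 2, 3, 4]
print(ocean_time_after_each_step(4, lagged_start=False))  # normal: [1, 2, 3, 4]
print(ocean_has_reached(1), ocean_has_reached(2))         # False True
```

In this picture the ocean is still at hour 0 when the hour-1 restart is requested, which matches the failure seen with a one-hour restart interval from a cold start.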
@DeniseWorthen: I'm not too concerned about cold starting. Is that a requirement for the RT? If so, I would suggest just making it work for the requirement of the RT and be done with it.
@DeniseWorthen Just FYI, my run gives 3 hourly restart files and one hourly MOM6 diag files: /work/noaa/stmp/hlee/stmp/hlee/FV3_RT/cpld_c48/cpld_control_c48
I am not sure about the CICE diag/history files, because the CICE diagnostics are controlled differently from MOM/FV3.
@hyunchul386 This is an issue w/ the restart file, not the diag file.
@guillaumevernieres The cold start is not a requirement for the RT per se. We do the lagged startup because we don't have the actual ICs for FV3 and CICE6. Either I need to fix MOM6 so it can write a restart (even if it doesn't advance), or we need to provide FV3 and CICE6 ICs, in which case our "control" run would actually be a restart.
@DeniseWorthen , couldn't we do a short forecast offline, dump fv3/cice/mom6 restarts and use that to build the RT?
@guillaumevernieres What length of forecast would you want for the warmstart+restart tests?
The warmstart test will use a staged IC for FV3, CICE, MOM6 and CMEPS. Currently I've set it up to run a 6 hr forecast and write restarts for FV3, MOM, CICE6 and CMEPS every hour. Does that setup work, or do you need something else (longer/shorter)?
The restart test will start from one of the warmstart test's checkpoint restarts. So, it could start from the first hour restarts, or the 5th hour restarts. It will run from whatever the restart hour is out to 6 hours.
The baseline will be compared between the warmstart and the warmstart+restart.
All components will also write final restarts.
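To make the test layout concrete, here is a short sketch that lists the checkpoint hours a restart test could start from and how long each such run would be. It assumes the 6 hr forecast and hourly restarts described above; the function names are illustrative and not part of the RT scripts.

```python
FORECAST_HOURS = 6            # length of the warmstart test
RESTART_INTERVAL_HOURS = 1    # checkpoint restart frequency


def restart_start_hours(forecast_hours=FORECAST_HOURS,
                        interval=RESTART_INTERVAL_HOURS):
    """Checkpoint hours that leave some forecast left to run."""
    return list(range(interval, forecast_hours, interval))


def restart_run_length(start_hour, forecast_hours=FORECAST_HOURS,
                       interval=RESTART_INTERVAL_HOURS):
    """Hours the restart test runs when started from a given checkpoint."""
    if start_hour not in restart_start_hours(forecast_hours, interval):
        raise ValueError(f"no usable checkpoint restart at hour {start_hour}")
    return forecast_hours - start_hour


for hour in restart_start_hours():
    print(f"restart from hour {hour}: runs {restart_run_length(hour)} h, "
          f"ends at hour {FORECAST_HOURS}")
```

Either way the restart run ends at hour 6, so its output can be compared against the warmstart baseline at the same valid time.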
6 hr forecast is perfect @DeniseWorthen .
Description
This issue is an attempt at finishing the work that @DeniseWorthen did a few years back (UFS issue #289). Here's what she can give us and what the issue is:
[...] soca but the model's initial conditions are based on T/S climatology (?).
What needs to be done
[...] sandbox that contains all the resource files (namelists, restarts, ...) and make a read-only copy on Orion and Hera