NOAA-EMC / RDASApp

Regional DAS
GNU Lesser General Public License v2.1

create conus_3km and NA_3km MPAS domains for testing some MPASJEDI functions #131

Open guoqing-noaa opened 3 months ago

guoqing-noaa commented 3 months ago

@ShunLiu-NOAA @TingLei-NOAA @SamuelDegelia-NOAA

Do we plan to run LETKF/GLETKF on these two domains?

ShunLiu-NOAA commented 3 months ago

@guoqing-noaa If a test domain is available, it will help us find potential MPAS-JEDI issues at an early stage. But please don't let it slow down the development of cycled MPAS-JEDI DA.

Junjun-NOAA commented 3 months ago

I have a 3km domain available. If you think it is big enough to test, we can use it.

[Screenshot (2024-08-28): the 3km domain]

chunhuazhou commented 3 months ago

For these new domains, do we want to use the new MPAS code (8.2.1) and use mpasout as the model output and DA background?

guoqing-noaa commented 3 months ago

Good question. Let's keep this for our next step. For now, let's repeat the previous steps using the previous MPAS code. Thanks!

TingLei-NOAA commented 3 months ago

@Junjun-NOAA That domain looks good to me for a CONUS domain run. What are your opinions, @ShunLiu-NOAA @guoqing-noaa?

ShunLiu-NOAA commented 3 months ago

@TingLei-NOAA For a sanity check of the performance of high-resolution JEDI analysis, it is worth trying. For cycled DA tests, we may start with a low-resolution test.

guoqing-noaa commented 3 months ago

@TingLei-NOAA Good with me as well. @Junjun-NOAA's case uses init.nc files for the ensemble, so it cannot be used to test LETKF.

Jake told me that he recently tested mpasjedi successfully with 10+ million cells, so it looks like we can run NA_3km mpasjedi cases now. @chunhuazhou is working on generating an NA_3km case.

TingLei-NOAA commented 3 months ago

@ShunLiu-NOAA @guoqing-noaa Thanks. So this 3km CONUS case would be good for my single-cycle variational test of MGBF. @Junjun-NOAA, could you please point me to your case so I can run it on Hera? Thanks.

Junjun-NOAA commented 3 months ago

@TingLei-NOAA Here is my run directory on Hera: /scratch1/BMC/wrfruc/jjhu/rundir/RDASApp/expr/mpas_2024052700_3km
Please let me know if you have any questions or comments. Thanks!

TingLei-NOAA commented 3 months ago

@Junjun-NOAA Thank you so much. I will keep you updated on how things are going.

guoqing-noaa commented 3 months ago

@chunhuazhou Could you post a progress update here? General information and any error messages would help. Thanks!

chunhuazhou commented 3 months ago

As discussed at the RRFS developers' meeting, here are some updates on my attempt to create an NA 3km domain.

Step 1: Use create_region to generate na3km.grid.nc by clipping from the global 3km grid. This can be done with the bigmem partition in a Slurm job and can take a few hours.

Step 2: Run init_atmosphere_model to generate na3km.static.nc. This requires a lot of memory for NA 3km. I tried 60 nodes on kjet and still got OOM failures: "slurmstepd: error: Detected 1 oom_kill event in StepId=9418345.0. Some of the step tasks have been OOM Killed."

TingLei-NOAA commented 3 months ago

@chunhuazhou Thanks for sharing! It seems the problem is currently with the MPAS preprocessing tool, right? Which machine are you using? Would simply increasing the number of nodes work?

chunhuazhou commented 3 months ago

@TingLei-NOAA The problem is that the NA3km domain has so many cells that running MPAS requires a lot of memory. My Step 2 only runs init_atmosphere_model, which needs far fewer resources than the MPAS model forecast itself. I am testing it on Jet. I will try increasing the number of nodes to see how many the NA3km domain needs.

guoqing-noaa commented 3 months ago

@chunhuazhou Could you try it on Gaea? It has much more memory.

chunhuazhou commented 3 months ago

@guoqing-noaa I haven't tried MPAS on Gaea yet. Have you tried it? Do we have all the required modules there?

guoqing-noaa commented 3 months ago

@chunhuazhou Let me create one module file for you.

guoqing-noaa commented 3 months ago

@chunhuazhou Use the following command to load the modules needed by the MPAS model: source /gpfs/f5/ufs-ard/world-shared/gge/c5_intel.sh
I successfully compiled MPAS.

chunhuazhou commented 3 months ago

@guoqing-noaa Thanks so much for the modules! Do you happen to have the WPS_GEOG files on Gaea? I am about to download them from the NCAR website, but if you already have them or know where I can find them, I can skip the download. Thanks!

guoqing-noaa commented 3 months ago

Does it take a very long time? If yes, I can transfer a copy for you.

chunhuazhou commented 3 months ago

@guoqing-noaa MPAS compiled successfully. Thanks for the modules! Downloading WPS_GEOG files should not take too long, I think. Thanks!

chunhuazhou commented 2 months ago

Some updates: after trying both Jet and Gaea, I found that the issue could not be fixed by increasing resources. Instead, adding the namelist entry config_gwd_cell_scaling=1.1 to &preproc_stages fixed the init_atmosphere_model failure when generating na3km.static.nc. The option config_gwd_cell_scaling is the scaling factor for the effective grid cell diameter used in computing the GWD (gravity wave drag) static fields; its default value is 1.0. Here is the NA3km domain that is working now:

[Image: the working NA3km domain]

Associated numbers for this domain are as follows:

dimensions:
    nCells = 10591561 ;
    nVertices = 21195314 ;
    nEdges = 31786874 ;

na3km.custom.pts:
Name: na3km
Type: custom
Point: 54.0, -106.0
75.0, 150.0
75.0, -30.0
5.0, -60
5.0, 180.0
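A quick sanity check on the dimensions above, as a sketch (not something run in this thread). Assuming the clipped mesh is one connected region without holes, Euler's formula for a planar subdivision gives nVertices - nEdges + nCells = 1 (the outer region is not counted as a cell). In a hexagon-dominated Voronoi mesh each vertex is shared by 3 cells and each edge by 2, so there are roughly 2 vertices and 3 edges per cell, slightly more near the lateral boundary. The Python below only does that arithmetic on the reported numbers; the class name and the 0.01 tolerance are arbitrary choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class MeshDims:
    n_cells: int
    n_vertices: int
    n_edges: int

    def euler_characteristic(self) -> int:
        # V - E + F with the Voronoi cells as faces (outer region not counted).
        return self.n_vertices - self.n_edges + self.n_cells

    def per_cell(self) -> tuple:
        # Average vertices and edges per cell; about 2 and 3 for hexagons.
        return self.n_vertices / self.n_cells, self.n_edges / self.n_cells


# Numbers reported for the NA3km mesh in this thread.
na3km = MeshDims(n_cells=10591561, n_vertices=21195314, n_edges=31786874)

chi = na3km.euler_characteristic()
v_per_cell, e_per_cell = na3km.per_cell()
print(f"V - E + F = {chi}")
print(f"vertices/cell = {v_per_cell:.4f}, edges/cell = {e_per_cell:.4f}")

# A single connected region without holes should give exactly 1.
assert chi == 1
assert abs(v_per_cell - 2.0) < 0.01 and abs(e_per_cell - 3.0) < 0.01
```

The reported NA3km numbers do satisfy the Euler check exactly, and the small excess over 2 and 3 per cell is what boundary cells (which share their outer vertices and edges with fewer neighbors) would produce.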

I have the mpasout file for the deterministic run, but generating the ensemble mpasout files will take more time; I will post an update once they are ready. Please let me know what you think: do we need to increase the domain size? I also have a separate create_region task running for a larger domain and can run MPAS on it if needed. Thanks!
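For anyone reproducing the config_gwd_cell_scaling=1.1 fix, here is a minimal Python sketch that adds (or overwrites) the entry in the &preproc_stages group of an init_atmosphere namelist. The helper add_namelist_entry is hypothetical: it does plain text editing rather than full Fortran namelist parsing, and it assumes the key appears at most once in the file. The sample namelist contents are illustrative only; editing the namelist by hand works just as well.

```python
import re


def add_namelist_entry(text: str, group: str, key: str, value: str) -> str:
    """Set `key = value` in namelist group `&group` (hypothetical helper).

    Plain text editing, not a full namelist parser. Assumes `key` appears
    at most once in the file; if present, its value is overwritten.
    """
    key_pattern = re.compile(rf"^([ \t]*){re.escape(key)}\s*=.*$", re.MULTILINE)
    if key_pattern.search(text):
        # Key already present: replace its value, keeping the indentation.
        return key_pattern.sub(lambda m: f"{m.group(1)}{key} = {value}", text, count=1)

    # Otherwise insert the entry right after the `&group` line.
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip().lower() == f"&{group}".lower():
            lines.insert(i + 1, f"    {key} = {value}\n")
            return "".join(lines)
    raise ValueError(f"namelist group &{group} not found")


if __name__ == "__main__":
    # Illustrative fragment of an init_atmosphere namelist.
    sample = "&preproc_stages\n    config_static_interp = true\n/\n"
    print(add_namelist_entry(sample, "preproc_stages", "config_gwd_cell_scaling", "1.1"))
```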

guoqing-noaa commented 2 months ago

Great progress! Thanks, @chunhuazhou!

When you generate ensembles, could you use mem001 instead of mem01? My PR #161 uses mem001.
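A minimal Python sketch of the three-digit, zero-padded member naming requested here (mem001 rather than mem01). The helper name and the ensemble size of 30 are hypothetical examples, not values from this thread.

```python
def member_dirs(n_members: int) -> list:
    """Return member names zero-padded to three digits: mem001, mem002, ..."""
    if not 1 <= n_members <= 999:
        raise ValueError("three-digit padding supports 1 to 999 members")
    return [f"mem{i:03d}" for i in range(1, n_members + 1)]


names = member_dirs(30)  # hypothetical ensemble size
print(names[0], names[-1])
```

Fixed-width padding keeps lexical order (for example, sorted file listings) identical to numeric member order.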

chunhuazhou commented 2 months ago

@guoqing-noaa I will! Thanks!

TingLei-NOAA commented 2 months ago

@chunhuazhou Great to know this moves forward. Thanks!

hu5970 commented 2 months ago

From what I see in this issue, I think the CONUS 3km grid is sufficient for our current development and testing. I would say we currently focus on the CONUS 3km grid only. Please let me know if I missed anything here on the NA3km tests.

chunhuazhou commented 2 months ago

@hu5970 Thanks, Ming, for your input! I want to add that for the NA3km tests, I have had trouble advancing the model forecast: the model produces unreasonably strong winds near the SE lateral boundary and then blows up very quickly.

guoqing-noaa commented 2 months ago

@chunhuazhou Thanks for your effort on this.

My suggestions:

  1. Prepare and document this crash case and send it to NCAR (or have it ready to send to NCAR).
  2. We don't need to test LETKF at this time; we only want to see whether the current MPASJEDI can handle 10+ million cells. So we don't need to run forecasts; we can just use init.nc files for this purpose.

chunhuazhou commented 2 months ago

@guoqing-noaa Regarding your suggestions: I have already posted a thread on the MPAS forum: https://forum.mmm.ucar.edu/threads/mpas-model-forecast-stopped-right-after-hour-0-segmentation-fault.19174/#post-46491
If anybody wants to try it out, I have one mpasout file valid at 23Z on 05/26/2024: /lfs5/BMC/wrfruc/Chunhua.Zhou/nco/stmp/na3km/1.0.1/rrfs.20240526/23/fcst/mpasout.2024-05-26_23.00.00.nc
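The mpasout file name above encodes its valid time as YYYY-MM-DD_hh.mm.ss. A small Python sketch for building and parsing such names follows; the format string is inferred only from this one example, and the helper names are hypothetical.

```python
from datetime import datetime

# Pattern inferred from mpasout.2024-05-26_23.00.00.nc (an assumption).
MPASOUT_FMT = "mpasout.%Y-%m-%d_%H.%M.%S.nc"


def mpasout_name(valid_time: datetime) -> str:
    """Build an mpasout file name for a given valid time."""
    return valid_time.strftime(MPASOUT_FMT)


def mpasout_time(filename: str) -> datetime:
    """Recover the valid time from an mpasout file name."""
    return datetime.strptime(filename, MPASOUT_FMT)


name = mpasout_name(datetime(2024, 5, 26, 23))
print(name)
print(mpasout_time(name).isoformat())
```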

guoqing-noaa commented 2 months ago

Thanks, @chunhuazhou !

hu5970 commented 2 months ago

It is good to try the big NA 3km domain, but let's focus on the more realistic CONUS 3km domain for initial tests and evaluations.