NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

MERRA2 aerosol options for UFS and the coupled model #200

Closed AnningCheng-NOAA closed 3 years ago

AnningCheng-NOAA commented 3 years ago

Add the 0.5-degree by 0.625-degree, 72-level, ten-year MERRA2 aerosol climatology as an option to replace the 5-degree by 5-degree OPAC aerosol data used to drive radiation and microphysics. Initial tests have been performed using the CCPP SCM. One-year C768L127 free-forecast runs are being performed on Dell, Hera, and Orion.
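For reference, the grid and level dimensions of one of the monthly climatology files can be confirmed with ncdump (a minimal sketch; the exact dimension names inside the files are an assumption):

ncdump -h merra2.aerclim.2003-2014.m01.nc | head -n 20
# expected dimensions, assuming the standard MERRA2 grid:
#   lat = 361   (0.5-degree spacing)
#   lon = 576   (0.625-degree spacing)
#   lev = 72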

AnningCheng-NOAA commented 3 years ago

New tests are needed for CCPP=NO, interval='24:00:00', and export FHMAX_GFS_00=384 on Dell, Orion, and Hera, plus a few more cases. Is cycled testing needed? Do waves need to be on?

KateFriedman-NOAA commented 3 years ago

@AnningCheng-NOAA For the Merra2 changes please run 2 tests on each platform:

Test 1 - combine several settings into a single 2.5 cycle cycled test:

  1. interval=24 (gfs_cyc=1), make sure one of the full cycles in the test is 00z (suggest starting with 18z half cycle), feel free to set gfs_cyc=4 when running setup scripts though if there are no reasons to not run the gfs for 06z, 12z, or 18z
  2. FHMAX_GFS_00=384 (to make sure you don't hit walltime)
  3. DO_WAVE=YES
  4. RUN_CCPP=YES

That should invoke the parts of the system I need to see tested for this.

Test 2 - a 1.5 cycle cycled test on each platform with RUN_CCPP=NO to make sure adding support for Merra2 does not break the remaining support for IPD (I will be dropping IPD support in coming months but not just yet)

Let me know if you run into any issues with either test that you need help with. Thanks!

KateFriedman-NOAA commented 3 years ago

I was just told that it's known that waves don't work with CCPP on. Therefore test 1 can be done with DO_WAVE=NO and I am ok with DO_WAVE=NO in config.base when RUN_CCPP=YES.

yangfanglin commented 3 years ago

If the changes Anning made to the model are based on the latest develop branch, IPD has been removed from the code, and he can only test with RUN_CCPP=YES.

The workflow supporting GFS.v16 and IPD should be tagged. It is probably time to remove all IPD-related definitions in the workflow and move to supporting only CCPP as soon as possible, since all developments (physics, coupled model, etc.) will be using CCPP.

KateFriedman-NOAA commented 3 years ago

I was hoping to get the final v16 changes from NCO into develop before removing the final IPD support, but with the implementation delay I see I can't do that...so yes, I guess it's time to remove all IPD-related definitions in the workflow.

I'll update my PR review comments for this. Thanks!

KateFriedman-NOAA commented 3 years ago

@AnningCheng-NOAA Ok, since waves don't work with CCPP right now, I'm adjusting my test request: please run a 2.5 cycle cycled test on each platform with RUN_CCPP=YES, DO_WAVE=NO, and FHMAX_GFS_00=384 so we can make sure it is ok in cycled mode. No test #2 anymore. Thanks! Sorry for the confusion on my end!
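For anyone reproducing this test, the requested settings map to experiment config variables along these lines (a hedged sketch; the exact placement in config.base is an assumption based on the names used in this thread):

# 2.5-cycle cycled test settings (sketch, not the authoritative config)
export gfs_cyc=1              # interval=24, gfs runs once per day at 00z
export FHMAX_GFS_00=384       # 16-day forecast at 00z
export DO_WAVE="NO"           # waves off; not working with CCPP yet
export RUN_CCPP="YES"         # CCPP physics on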

AnningCheng-NOAA commented 3 years ago

Hi, Kate:

Please let me know when you copy merra2 and aer_data to the main $FIX_DIR.

Thank you,

Anning

KateFriedman-NOAA commented 3 years ago

@AnningCheng-NOAA I've started pulling them into the WCOSS-Dell $FIX_DIR ($FIX_DIR/fix_aer and $FIX_DIR/fix_lut) from your WCOSS_Dell set. Then I will rsync them to the FIX_DIRs on the Crays, Hera, Jet, and Orion...and make a new HPSS tarball of $FIX_DIR. I see the fix_aer files are quite large so the copy/rsyncs will take a while. Will report back when done, thanks!

AnningCheng-NOAA commented 3 years ago

Hi, Kate:

I have just noticed that the MERRA2 data on Dell was dated back to 2017 and out of date. I have just updated the dataset; you can see the production date is December 24th. I am afraid you will need to re-pull the data. Sorry for any inconvenience.

Anning

KateFriedman-NOAA commented 3 years ago

@AnningCheng-NOAA I see your Mars set is now dated February 4th and the Venus set (what I pulled from) is December 24th. I should pull from your Mars set then? Please confirm, thanks!

AnningCheng-NOAA commented 3 years ago

Kate, I have just copied MERRA2 from Dell to Mars, so the two sets should be consistent: exactly the same data, although the dates differ.

KateFriedman-NOAA commented 3 years ago

@AnningCheng-NOAA The new fix files are now in all $FIX_DIRs on WCOSS-Dell, WCOSS-Cray, Hera, and rzdm. I'm copying them to Orion and Jet this morning. Below is the listing of them on Hera under $FIX_DIR/fix_aer and $FIX_DIR/fix_lut. I'm also putting a fresh copy of $FIX_DIR on HPSS for our archival. You can now remove the paths to FIX_AER and FIX_LUT in config.base.emc.dyn, thanks.

-bash-4.2$ ll /scratch1/NCEPDEV/global/glopara/fix/
total 108
-rwxr-xr-x  1 glopara global   160 Oct  3  2019 0readme
drwxr-sr-x  2 glopara global  4096 Feb  4 17:05 fix_aer
drwxr-sr-x  5 glopara global 61440 Dec  2 18:18 fix_am
drwxr-sr-x  5 glopara global  4096 Jun 10  2019 fix_chem
drwxr-sr-x 10 glopara global  4096 Jul 28  2017 fix_fv3
drwxr-sr-x 10 glopara global  4096 Dec 31  2017 fix_fv3_gmted2010
drwxr-xr-x  6 glopara global  4096 Dec 13  2019 fix_gldas
drwxr-sr-x  2 glopara global  4096 Feb  4 15:37 fix_lut
drwxr-sr-x  2 glopara global  4096 Aug 31 14:11 fix_orog
drwxr-sr-x  2 glopara global  4096 Sep 13  2019 fix_sfc_climo
drwxr-sr-x  4 glopara global  4096 May 11  2018 fix_verif
drwxr-sr-x  2 glopara global  4096 Oct 26 14:59 fix_wave_gfs
-bash-4.2$ ll /scratch1/NCEPDEV/global/glopara/fix/fix_aer/
total 12693072
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:49 merra2.aerclim.2003-2014.m01.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:50 merra2.aerclim.2003-2014.m02.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:43 merra2.aerclim.2003-2014.m03.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:53 merra2.aerclim.2003-2014.m04.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:41 merra2.aerclim.2003-2014.m05.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:54 merra2.aerclim.2003-2014.m06.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:47 merra2.aerclim.2003-2014.m07.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:50 merra2.aerclim.2003-2014.m08.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:42 merra2.aerclim.2003-2014.m09.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:52 merra2.aerclim.2003-2014.m10.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:46 merra2.aerclim.2003-2014.m11.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb  4 15:44 merra2.aerclim.2003-2014.m12.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:52 merra2C.aerclim.2003-2014.m01.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:52 merra2C.aerclim.2003-2014.m02.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:46 merra2C.aerclim.2003-2014.m03.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:45 merra2C.aerclim.2003-2014.m04.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:54 merra2C.aerclim.2003-2014.m05.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:54 merra2C.aerclim.2003-2014.m06.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:47 merra2C.aerclim.2003-2014.m07.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:52 merra2C.aerclim.2003-2014.m08.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:45 merra2C.aerclim.2003-2014.m09.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:46 merra2C.aerclim.2003-2014.m10.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:45 merra2C.aerclim.2003-2014.m11.nc
-rwxr-xr-x 1 glopara global   64218936 Feb  4 15:45 merra2C.aerclim.2003-2014.m12.nc
-bash-4.2$ ll /scratch1/NCEPDEV/global/glopara/fix/fix_lut/
total 73428
-rwxr-xr-x 1 glopara global   202000 Jun 24  2019 optics_BC.v1_3.dat
-rwxr-xr-x 1 glopara global   461637 Jun 24  2019 optics_DU.v15_3.dat
-rwxr-xr-x 1 glopara global 73711072 Jun 24  2019 optics_DU.v15_3.nc
-rwxr-xr-x 1 glopara global   202000 Jun 24  2019 optics_OC.v1_3.dat
-rwxr-xr-x 1 glopara global   502753 Jun 24  2019 optics_SS.v3_3.dat
-rwxr-xr-x 1 glopara global   101749 Jun 24  2019 optics_SU.v1_3.dat
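With the files staged centrally, the experiment config can point at the shared fix areas instead of personal paths. A sketch of the assumed form (the exact syntax in config.base.emc.dyn is an assumption; Hera paths shown):

# point the aerosol climatology and optics LUTs at the shared fix area
export FIX_DIR="/scratch1/NCEPDEV/global/glopara/fix"
export FIX_AER="${FIX_DIR}/fix_aer"
export FIX_LUT="${FIX_DIR}/fix_lut"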
AnningCheng-NOAA commented 3 years ago

Kate, thanks! I submitted a cycling test (C768L127 for gfs and C384L127 for the ensemble) on Orion yesterday. It has been waiting in the queue for a very long time. Could you take a look to see whether too many resources have been requested? expdir: /work/noaa/global/acheng/para_gfs/mcyc1, rotdir: /work/noaa/stmp/acheng/ROTDIRS/mcyc1

KateFriedman-NOAA commented 3 years ago

Anning, I took a look and your jobs have appropriate resource requests. My most recent C768C384L127 test on Orion used more resources since I ran with waves on, so your requests are smaller and should be fine. It's possible the queues are very busy and/or the compute account allocation you're using is nearing its limit. I see you're using fv3-cpu; it looks like it's close to its allocation (via the saccount_params command):

        Project: fv3-cpu
                LevelFairshare=0.705    Core Hours Used (30 days)=2634324.7,30-day Allocation=2812246
                Partition Access: ALL
                Available QOSes: batch,debug,novel,urgent,windfall

You could try another compute account if you have access to one (check via the saccount_params command), but if the queues are busy you'll keep waiting.
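A quick sequence for checking and, where possible, switching projects (a sketch; the ACCOUNT variable name and the alternate project shown are assumptions):

saccount_params                 # list the projects you belong to and their usage
# if another project has headroom, point the experiment at it in config.base:
export ACCOUNT="nggps_emc"      # hypothetical alternate project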

KateFriedman-NOAA commented 3 years ago

@AnningCheng-NOAA @yangfanglin FYI after discussing the upcoming commit plan for develop with the other global-workflow code managers we have decided we are going to hold this work and PR #254 for a bit (~2-3 weeks). Since this PR moves the ufs-weather-model version forward to one that supports hpc-stack we want to get the other hpc-stack changes into develop before this. Please complete current testing and keep your branch synced with develop changes. You may leave the PR open. Thanks!

AnningCheng-NOAA commented 3 years ago

Hi, Kate:

I am running the cycling test on Hera. There is an error, "aircftbias_in not found", in /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/logs/2020020406/gdaseobs.log. Exp dir: /scratch2/NCEPDEV/climate/Anning.Cheng/para_gfs/mcyc

Do you have any idea how to fix the problem? Thanks!

Anning

KateFriedman-NOAA commented 3 years ago

You need some IC files for the analysis; the missing file is one of them. This is missing:

/scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/gdas.20200204/00/atmos/gdas.t00z.abias_air

Where did you get your ICs? You'll need to pull out the following files that came from the same or a companion tarball:

  • gdas.t00z.abias
  • gdas.t00z.abias_air
  • gdas.t00z.abias_pc
  • gdas.t00z.radstat

Point me to your IC source and I'll see where those four files are. Thanks!
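If the ICs came from an HPSS tarball, the missing analysis files can usually be extracted individually with htar; a sketch, in which the tarball path and its internal layout are assumptions:

cd /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/gdas.20200204/00/atmos
htar -xvf /NCEPDEV/hypothetical/path/gdas.tar \
    ./gdas.t00z.abias ./gdas.t00z.abias_air \
    ./gdas.t00z.abias_pc ./gdas.t00z.radstat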

AnningCheng-NOAA commented 3 years ago

Hi, Kate:

My ICs are at /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/. I used UFS_UTILS to produce the cold-start ICs. Has the documentation for producing the ICs changed?

Anning

AnningCheng-NOAA commented 3 years ago

I have just found those files at /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/gdas.20200204/00, but I do not know whether they are in the right place.

KateFriedman-NOAA commented 3 years ago

Ah, I see you have the files there; they are just not in the atmos folder:

-bash-4.2$ ll /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/gdas.20200204/00/
total 3286744
drwxr-xr-x 5 Anning.Cheng climate      20480 Feb  9 16:31 atmos
-rw-r--r-- 1 Anning.Cheng stmp        859665 Feb  9 07:24 gdas.t00z.abias
-rw-r--r-- 1 Anning.Cheng stmp       1082939 Feb  9 07:24 gdas.t00z.abias_air
-rw-r--r-- 1 Anning.Cheng stmp        859665 Feb  9 07:24 gdas.t00z.abias_int
-rw-r--r-- 1 Anning.Cheng stmp        917490 Feb  9 07:24 gdas.t00z.abias_pc
-rw-r--r-- 1 Anning.Cheng stmp             0 Feb  9 07:24 gdas.t00z.loginc.txt
-rwxr-x--- 1 Anning.Cheng stmp    3361832960 Feb  9 07:32 gdas.t00z.radstat

Move all of those files (abias, abias_air, abias_int, abias_pc, loginc.txt, radstat) down into that atmos folder. Then retry your failed jobs.
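That fix amounts to something like the following (a sketch; the rocoto XML/database names and the task to reboot are illustrative):

cd /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/gdas.20200204/00
mv gdas.t00z.abias* gdas.t00z.loginc.txt gdas.t00z.radstat atmos/
# then rewind and reboot the failed task from the expdir, e.g.:
rocotorewind -w mcyc.xml -d mcyc.db -c 202002040600 -t gdaseobs
rocotoboot   -w mcyc.xml -d mcyc.db -c 202002040600 -t gdaseobs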

AnningCheng-NOAA commented 3 years ago

Hi, Kate:

The cycled workflow test runs well on Hera, but I encountered an MPI error from GSI on Orion: /work/noaa/stmp/acheng/ROTDIRS/mcyc/logs/2020020406/gdasanal.log. The exp dir: /work/noaa/global/acheng/para_gfs/mcyc.

Could you take a look and see if anything is missing?

Thank you!

Anning

KateFriedman-NOAA commented 3 years ago

@AnningCheng-NOAA The cause of the error isn't jumping out at me; we usually see that error in the forecast jobs, not the analysis. @CatherineThomas-NOAA @CoryMartin-NOAA would you mind taking a look at Anning's failed analysis job on Orion? See log below. He is testing the system after adding support for MERRA2. Thanks!

/work/noaa/stmp/acheng/ROTDIRS/mcyc/logs/2020020406/gdasanal.log

CoryMartin-NOAA commented 3 years ago

I took a look; I'm not totally sure, but it seems like there is a problem reading the netCDF surface forecast files. Is there anything different about the sfcfNNN.nc files in this run versus a standard version? Are you able to rerun the gdasfcst from the previous cycle and try again? This looks like the error we were having before, where the model would write out 'bad' netCDF files that were then unreadable by GSI.

CatherineThomas-NOAA commented 3 years ago

I was just getting ready to say the same thing. The values of tref in the sfcfNNN.nc files look reasonable at least. @KateFriedman-NOAA does Orion have similar netCDF problems as Hera?

KateFriedman-NOAA commented 3 years ago

does Orion have similar netCDF problems as Hera?

@CatherineThomas-NOAA Not as frequently as Hera, but yes. I looked back at my Orion runs since last May and found HDF errors in the efcs jobs of a CCPP run (last November) and in analysis jobs while I was testing port2orion last June. No HDF errors in any of the short cycled runs I've done since then. I'm starting to test the full system using hpc-stack, so I'm keeping my eye out for these errors on both machines.

AnningCheng-NOAA commented 3 years ago

Hi, Kate, this error does not occur on Hera. I am rerunning the forecast to see if the error persists, as suggested by Cory and Catherine.

AnningCheng-NOAA commented 3 years ago

All, the error is still there after rerunning the forecasts. I made a run without MERRA2 and still got the same error: /work/noaa/stmp/acheng/ROTDIRS/mcyco/logs/2020020406/gdasanal.log. The rundir is /work/noaa/global/acheng/para_gfs/mcyco. Kate, how is your test going?

KateFriedman-NOAA commented 3 years ago

My run of the system (feature/hpc-stack) on Hera, with all components building with hpc-stack, was successful. I did not see any HDF5 errors...but I have only run 2.5 cycles so far. The GSI master doesn't yet support hpc-stack on other machines, so I can't perform the same test on Orion yet.

@CatherineThomas-NOAA @CoryMartin-NOAA Is there a GSI branch with stack support for Orion that I can try? Thanks!

RussTreadon-NOAA commented 3 years ago

Please see NOAA-EMC/GSI issue #110 https://github.com/NOAA-EMC/GSI/issues/110 for the status of hpc-stack on non-Hera platforms. In addition to Hera, hpc-stack builds now exist for WCOSS_D, Orion, and Jet. This is beta development. Use at your own risk. No support is provided if you encounter problems.

KateFriedman-NOAA commented 3 years ago

Thanks @RussTreadon-NOAA ! I'll try that branch on Orion and WCOSS-Dell to test global-workflow feature/hpc-stack.

RussTreadon-NOAA commented 3 years ago

Ran a stand-alone GSI script in 3DVar mode on Orion using Anning's files for the 2020020406 gdas case. The global_gsi.x built from the NOAA-EMC/GSI master and the forked hpc-stack branch ran to completion. Given this, I submitted the stand-alone GSI script using the NOAA-EMC/GSI master in 4DEnVar mode; the job is pending in the batch queue. Anning's run uses NOAA-EMC/GSI tag gfsda.v16.0.0. You could try checking out master in /work/noaa/global/acheng/gfsv16_ccpp/sorc/gsi.fd and rebuilding DA using /work/noaa/global/acheng/gfsv16_ccpp/sorc/build_gsi.sh.

RussTreadon-NOAA commented 3 years ago

Note that /work/noaa/global/acheng/gfsv16_ccpp/modulefiles/module_base.orion loads

module use /apps/contrib/NCEPLIBS/orion/modulefiles
module load hdf5_parallel/1.10.6
module use /apps/contrib/NCEPLIBS/lib/modulefiles
module load netcdfp/4.7.4

when it executes gdasanal.

In contrast, /work/noaa/global/acheng/gfsv16_ccpp/sorc/gsi.fd builds DA with

module use /apps/contrib/NCEPLIBS/lib/modulefiles
module load netcdfp/4.7.4.release

The NOAA-EMC/GSI master also builds DA on Orion using netcdfp/4.7.4.release.

Might the difference between the workflow build and run modules cause problems?
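One way to check is to compare the netCDF/HDF5 modules recorded at build time against what the job loads at run time (a diagnostic sketch; the GSI-side modulefile path is an assumption based on the names mentioned in this thread):

# modules the workflow loads when it runs the analysis
grep -E 'netcdf|hdf5' /work/noaa/global/acheng/gfsv16_ccpp/modulefiles/module_base.orion
# modules the DA build loads (per the GSI build modulefile)
grep -E 'netcdf|hdf5' /work/noaa/global/acheng/gfsv16_ccpp/sorc/gsi.fd/modulefiles/modulefile.ProdGSI.orion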

RussTreadon-NOAA commented 3 years ago

FYI, a stand-alone GSI run script successfully ran the 2020020406 case on Orion using a global_gsi.x built from NOAA-EMC/GSI tag release/gfsda.v16.0.0. The run script loads modulefile.ProdGSI.orion found in gsi.fd/modulefiles.

AnningCheng-NOAA commented 3 years ago

Russ, glad to hear it. I will rebuild the GSI and give it a try. Where are your run dir and submit dir?

RussTreadon-NOAA commented 3 years ago

I used a stand-alone script, rungsi1534L127_debug.sh, not the rocoto workflow. The script is in /work/noaa/da/Russ.Treadon/git/gsi/scripts. File gsi1534.o1343532 in the same directory is the job log file. The job ran in /work/noaa/stmp/rtreadon/tmp766/gfsda.v16.0.0.2020020406. I submitted the script again using your global_gsi.x and your fix files. This job is waiting in the queue due to today's (2/23) maintenance. If this job runs OK, the investigation shifts to the workflow side of things. I wonder about the module mismatch: the modules the workflow loads to run the gfsda.v16.0.0 global_gsi.x are not the same modules used to build global_gsi.x. I could mimic this mismatch in rungsi1534L127_debug.sh and see whether global_gsi.x then fails.

AnningCheng-NOAA commented 3 years ago

Russ, I did rebuild global_gsi.x using netcdfp/4.7.4 (per gfsv16_ccpp/modulefiles/module_base.orion) and tried it yesterday, but it was not successful.

RussTreadon-NOAA commented 3 years ago

I don't think this is the correct approach. The tests I ran indicate that the gfsda.v16.0.0 build is OK; the workflow appears to be the problem, not gfsda.v16.0.0. The modules the workflow loads when it runs anal differ from the modules used to build global_gsi.x. You should try a test in which you build global_gsi.x with the gfsda.v16.0.0 modules as-is and then modify the workflow modules. This is only a test, since changing the workflow modules may break other apps.

RussTreadon-NOAA commented 3 years ago

The following test has been run on Orion:

  • copy /work/noaa/global/acheng/para_gfs/mcyco to /work/noaa/da/Russ.Treadon/para_gfs/mcyco and update it to run under my PTMP using acheng's HOMEgfs
  • populate /work/noaa/stmp/rtreadon/ROTDIRS/mcyco with files from /work/noaa/stmp/acheng/ROTDIRS/mcyco
  • rocotorewind and rocotoboot the 2020020406 gdasanal; the job requested 125 nodes with a lengthy estimated queue wait, so scancel it, reduce the analysis job to 50 nodes, and resubmit

The job successfully ran up to the specified one-hour wall clock limit. The global_gsi.x was 2/3 of the way through the second outer loop when the system killed the job. There were no netCDF or HDF5 errors in the job log file.

Anning's run used 125 nodes for gdasanal. I reverted back to this, regenerated the XML, and resubmitted the 2020020406 gdasanal. The job is waiting in the queue.

AnningCheng-NOAA commented 3 years ago

Russ, and all: I merged the workflow with the latest version, recompiled the code, and submitted mcyc, but have not resubmitted mcyco. The workflow is now running well for both MERRA2 (mcyc) and OPAC (mcyco). I guess the fix came from the merge. Thank you! Your comments are welcome.

RussTreadon-NOAA commented 3 years ago

My rerun of the mcyco gdasanal for 2020020406 using 125 nodes ran overnight without any errors. The previous 50-node job was terminated after hitting the one-hour wall clock; based on the minimization stats in the log file, it was reproducing the output from the 125-node job. This makes sense: GSI results do not vary with task count. The queue wait time for a 50-node job is less than for a 125-node job, so you should examine the resource settings in your parallel. You might get better throughput if you reduce the node (task) count and appropriately increase the wall clock limit.

Based on your comments, Anning, it seems the gdasanal problem was not DA but something in the workflow or compilation. Is this correct?
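In workflow terms, that tuning would look something like the following in config.anal (a sketch; the npe_anal/wtime_anal variable names follow the workflow's usual convention but are assumptions here, and the values are illustrative):

# config.anal: trade node count for wall clock to cut queue wait
export npe_anal=2000             # hypothetical reduced task count (~50 Orion nodes)
export wtime_anal="02:00:00"     # raise the limit so the job can still finish
# regenerate the rocoto XML afterwards so the new resources take effect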

AnningCheng-NOAA commented 3 years ago

Yes, that is correct.

RussTreadon-NOAA commented 3 years ago

Thanks for the confirmation. I'll stand down on this issue.

KateFriedman-NOAA commented 3 years ago

PR #254 has been submitted and has closed this issue. Thank you @AnningCheng-NOAA for this addition, and thank you @lgannoaa for testing/reviewing! I will send an announcement to the glopara listserv shortly.