NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0

Update WRF-DART interface to support WRF4 #673

Closed braczka closed 4 months ago

braczka commented 5 months ago

Description:

Makes WRF-DART scripting compatible with WRF4 as outlined in Issue #661. Also updates the documentation in the WRF tutorial and main model page to advise of this change and to describe the limits of backward compatibility with WRFv3.9 and earlier.

Fixes issue

Fixes #661 fixes #672 fixes #680

Types of changes

Documentation changes needed?

Tests

Performed testing against the WRF-DART Tutorial example as outlined in PR #650

Checklist for merging

Checklist for release

Testing Datasets

hkershaw-brown commented 5 months ago

nice stuff @braczka

braczka commented 5 months ago

Just remembered -- I still need to make minor updates to the tutorial tar.gz file, and documentation:

cd $BASE_DIR
wget http://www.image.ucar.edu/wrfdart/tutorial/wrf_dart_tutorial_23May2018_v3.tar.gz
tar -xzvf wrf_dart_tutorial_23May2018_v3.tar.gz

The perturbations, surface file, and initial conditions were generated with a pre-WRF4 version, but the same data can still be used to run the tutorial example. I just need to add a few more lines of documentation so the user is not confused. I will also update the README files in the tar.gz file.

braczka commented 5 months ago

Just to document our conversation from DART standup today ... my testing of these changes has been limited to the tutorial test case example outlined in PR https://github.com/NCAR/DART/pull/650, where the results looked similar to the original WRFv3.9 test case. A thorough evaluation of the impact of these changes on forward operators (i.e., radio occultation, radar, etc.) has not been done. We may need to rely on our user base to test these impacts.

hkershaw-brown commented 4 months ago

ok one last comment I promise, do you want to fix #672 in this pull request? (update the wrf/work/input.nml)

braczka commented 4 months ago

ok one last comment I promise, do you want to fix #672 in this pull request? (update the wrf/work/input.nml)

Yes, I should include #672, and probably #660 as well, to clean up those two issues.

braczka commented 4 months ago

I seem to be getting a new error since the last shutdown of Derecho. The following error occurs when executing the step ./driver.csh 2017042706 param.csh >& run.out &. The error is related to adding perturbations to the wrfinput file:

host is  dec2323
assim_advance.csh is running in /glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir
new_advance_model.csh is running in /glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir
/glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir
use wrfvar set
stuff var  U
wrf.info is read
1
/glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir/WRF/wrfbdy_d01_152057_43200_mean
Error! Non-zero status returned from add_bank_perts.ncl. Check /glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir/advance_temp1/add_perts.err.
warning:_NclOpenFile: Can not open file <wrfvar_output>; file format not supported or file is corrupted
fatal:file (wrf_in) isn't defined
fatal:["Execute.c":8637]:Execute: Error occurred at or near line 55 in file /glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir/advance_temp1/add_bank_perts.ncl

duration = 13

The source of the issue seems to be that two calls are made to assim_advance.csh for ensemble member 1, as diagnosed from the two log files that are created:

-rw-r--r-- 1 bmraczka ncar 858 May  9 16:22 assim_advance_1.o4413480
-rw-r--r-- 1 bmraczka ncar 858 May  9 16:22 assim_advance_1.o4413479

The scripts, running simultaneously, appear to compete for access to the wrfvar_output file, leading to the error. The solution seems to be adding some sleep commands within the driver.csh script, so that it can detect that assim_advance.csh has already been started before submitting a second one. I will include this commit within this PR.
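To illustrate the race-avoidance idea, here is a minimal sketch of a polling helper. The actual driver.csh is written in csh; this sketch is in POSIX sh, and the function name, marker-file convention, and timeout are hypothetical, not taken from the DART scripts:

```shell
#!/bin/sh
# Hypothetical sketch: before submitting another assim_advance job for an
# ensemble member, poll for a marker file that the job would write (touch)
# as its first action, instead of submitting a duplicate blindly.
wait_for_member_start() {
    member=$1
    marker="assim_advance_${member}.started"   # hypothetical marker file
    tries=0
    while [ ! -f "$marker" ]; do
        tries=$((tries + 1))
        if [ "$tries" -gt 60 ]; then
            echo "member $member did not start within 60s" >&2
            return 1
        fi
        sleep 1                                # the "sleep commands" above
    done
    return 0
}
```

With this approach the driver would call, e.g., `wait_for_member_start 1` and only proceed once the first job has signaled startup, so no second assim_advance.csh instance is launched for the same member.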

mgharamti commented 4 months ago

Brett, could this be due to the CPU binding issue we experienced in WRF-Hydro? If you would like to test our fix, here is the environment command: export PALS_CPU_BIND=none (or setenv in csh)

You can add this in your submission script right after the PBS preamble.
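For reference, the placement described above might look like this in a PBS submission script; the directives, job name, and account string here are illustrative placeholders, not taken from the actual DART scripts:

```csh
#!/bin/csh
#PBS -N assim_advance          ## hypothetical job name
#PBS -A PROJECT_CODE           ## placeholder account
#PBS -q main
#PBS -l select=1:ncpus=128

## Disable PALS CPU binding right after the PBS preamble:
setenv PALS_CPU_BIND none      ## csh form; in sh/bash: export PALS_CPU_BIND=none
```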

braczka commented 4 months ago

Brett, could this be due to the CPU binding issue we experienced in WRF-Hydro? If you would like to test our fix, here is the environment command: export PALS_CPU_BIND=none (or setenv in csh)

You can add this in your submission script right after the PBS preamble.

Hmmm -- not sure, but I will test that too !

braczka commented 4 months ago

@hkershaw-brown I think I have addressed all of your concerns. Some of the additional changes we discussed at standup are not optimal (adding scripting pauses to the csh scripts); a more robust fix would likely require a more substantial refactor. I will meet with WRF users from EOL and RAL to better scope that refactor. However, this PR's fixes for the hybrid coordinate system and the T to THM switch are important to get to the community.

braczka commented 4 months ago

Brett, could this be due to the CPU binding issue we experienced in WRF-Hydro? If you would like to test our fix, here is the environment command: export PALS_CPU_BIND=none (or setenv in csh) You can add this in your submission script right after the PBS preamble.

Hmmm -- not sure, but I will test that too !

@mgharamti The PALS_CPU_BIND variable did not seem to influence the WRF simulation. I was also going to test this for the slow performance with CLM-DART, but that appears to be related to the compression of large data files from the campaign storage migration.