Open emilyhcliu opened 8 months ago
@CoryMartin-NOAA @RussTreadon-NOAA
The GSI branch (see above) in this PR will be used with the end-to-end code sprint.
The CRTM 2.4.1-jedi.1 which is consistent with the GDASApp is used in the GSI.
Please see the description above for more details.
Thank you @emilyhcliu for creating a GSI fork which can use CRTM-2.4.1-jedi.1.
I updated a working copy of feature/gdasapp-sprint to clone the forked GSI-crtm_v2.4.1-jedi.1. CRTM_FIX has been defined to point at /work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix/Little_Endian in gdas_config/config.anal and gdas_config/config.atmanl.
I am currently working through the sequence of steps to clone, build, setup, and run jobs.
@emilyhcliu, where may I find the run script you used to test the gsi.x which you built with CRTM-2.4.1-jedi.1? I'm encountering crtm library errors when I execute gsi.x from g-w.
@RussTreadon-NOAA is the issue that a shared library is missing? I think, if so, then $LD_LIBRARY_PATH needs to be modified at runtime.
@CoryMartin-NOAA, yes the initial problem was the shared library. Thank you for the pointer. I added the crtm path to LD_LIBRARY_PATH. The updated config.anal now has two additions:
export CRTM_FIX=/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix/Little_Endian
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-intel2022/build/lib
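A minimal sketch of one way to add the CRTM library directory without creating duplicate entries if config.anal is sourced more than once. The helper name prepend_ld_path is hypothetical and is not part of g-w or GSI; it prepends the directory (the exports above append it), so the custom build takes precedence over any other libcrtm on the search path.

```shell
#!/usr/bin/env bash
# prepend_ld_path is a hypothetical helper (an assumption, not in g-w or GSI).
# It puts a directory at the front of LD_LIBRARY_PATH unless already listed.
prepend_ld_path() {
  local dir="$1"
  case ":${LD_LIBRARY_PATH:-}:" in
    *":${dir}:"*) ;;  # already present: leave the path unchanged
    *) export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
  esac
}

# Library directory taken from the config.anal addition above.
crtm_lib=/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-intel2022/build/lib
prepend_ld_path "${crtm_lib}"

# To see which libcrtm the executable resolves at runtime (path is site specific):
#   ldd /path/to/gsi.x | grep -i crtm
```

Checking the executable with ldd before submitting a job is a quick way to confirm that the shared library can be found.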
With the addition of LD_LIBRARY_PATH, executable gsi.x starts running. Execution, however, aborts with
55: SpcCoeff_ReadFile(Binary)(FAILURE) : Error reading channel data. input statement requires too much data, unit 10, file /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_GSI/anal.101081/./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
55: CRTM_SpcCoeff_Load(FAILURE) : Error reading SpcCoeff file #1, ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
55: READ_BUFRTOVS: ***ERROR*** crtm_spccoeff_load error_status= 3
55: despite file ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
55: existing, TERMINATE PROGRAM EXECUTION
A check of ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin shows that this local file is correctly linked to @emilyhcliu's little endian fix.
Orion-login-4:/work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_GSI/anal.101081$ ls -l ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
lrwxrwxrwx 1 rtreadon stmp 92 Oct 19 16:42 ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin -> /work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix/Little_Endian/amsua_metop-b.SpcCoeff.bin
Orion-login-4:/work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_GSI/anal.101081$ ls -lL ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
-rw-r----- 1 eliu da 12196 Oct 13 13:34 ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
The log file contains 690927 lines of AntCorr printout. Of these, 663809 lines are Apply_AntCorr(FAILURE) : Input iFOV inconsistent with AC data. 27118 lines are Remove_AntCorr(FAILURE) : Input iFOV inconsistent with AC data. This printout is not present when I build and run gsi.x with crtm/2.4.0.
I built gsi.x with BUILD_VERBOSE=YES. I see Emily's library modules being included. This is good. Something potentially not good is -convert big_endian in the compiler options. Building with -convert big_endian & trying to read little endian seems problematic.
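One way to check the byte order of a coefficient file independently of gsi.x is to look at its first four bytes. The sketch below assumes the file is a Fortran sequential unformatted file whose first record holds a single 4-byte integer, so the leading record marker has the value 4. That layout is an assumption about the CRTM binary files, not something confirmed in this thread, and guess_endian is a hypothetical helper.

```shell
#!/usr/bin/env bash
# guess_endian is a hypothetical helper. It reads the first 4 bytes of a file
# and interprets them as a Fortran record-length marker with value 4
# (assumption: the first record is one 4-byte integer).
guess_endian() {
  local marker
  marker=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  case "${marker}" in
    04000000) echo little ;;   # least significant byte first
    00000004) echo big ;;      # most significant byte first
    *)        echo unknown ;;  # layout differs from the assumption
  esac
}

# Example, using the file from the listing above:
#   guess_endian ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
```

An "unknown" result means only that the file does not match the assumed layout; it says nothing about whether the file is valid.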
It would be helpful to examine a gsi.x build with the new CRTM along with a run script.
I found the following in the EMC JEDI Discussions google space
(2) need to remove HIRS4 from GSI obs namelist. There is a problem reading the HIRS4 coefficients. We do not use HIRS4, so I removed the HIRS4 from the obs namelist
This is a g-w change since EID version controls exglobal_atmos_analysis.sh. I removed HIRS4 from my working copy of the script and reran 2021080100 gdasanal. gsi.x still aborts.
@RussTreadon-NOAA I was out this morning to see the dentist.
You can find the scripts I used to run GSI in the following directory on ORION:
/work/noaa/da/eliu/git/GSI-emilyhcliu/GSI/scripts/gsi
There are three scripts (provided by Cory for our previous code sprint): gsi_observer.sh, iodaconv.sh, and submit_gsi_observer.sh.
You just need to modify the path to GSI in submit_gsi_observer.sh and submit the script. It will trigger gsi_observer.sh.
P.S. I already turned off iodaconv.sh.
Thank you @emilyhcliu for pointing me at the scripts you use to run gsi.x. As a first step let me try my gsi.x with your scripts.
@emilyhcliu, this is very odd. Using my gsi.x in your script fails in the same way as running it from g-w. I took a step back and used your gsi.x. Same failure. I recopied your gsi_observer.sh and submit_gsi_observer.sh to my space and resubmitted. Same failure. This suggests that something in my Orion environment differs from your environment.
@CoryMartin-NOAA, have you run gsi.x built with CRTM-2.4.1-jedi.1 using little endian coefficients?
@RussTreadon-NOAA no I have not, I thought you had to use big endian, since presumably the GSI is compiled with big endian and then the BERROR_STATS file, and others, will also be big endian.
Agreed! The gsi code is compiled with big endian compiler flags. The GSI static-B file is big endian. The CRTM coefficients being provided to gsi.x in the above runs are little endian. The endianness mismatch seems to be the problem.
The EMC JEDI Discussions note and comments above indicate that we need to use little endian coefficients
Cory Martin - NOAA Federal Andrew Collard - NOAA Federal Good news. We made the crtm v2.4.1-jedi.1 compiled with GSI develop with Cory's fix in CMakeList for GSI. I tested a single cycle (2021080100) and found the following: (1) need to use Little Endian coefficients (2) need to remove HIRS4 from GSI obs namelist. There is a problem reading the HIRS4 coefficients. We do not use HIRS4, so I removed the HIRS4 from the obs namelist
Change CRTM_FIX from /work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix/Little_Endian to /work2/noaa/da/cmartin/GDASApp/fix/crtm/2.4.0.
With this change gsi.x ran to completion ... but it took a long time (1531.006869 seconds) with a huge gdasatmanal.log (82 Mb, 1719951 lines). Many AntCorr(FAILURE) and "Using 5 OpenMP threads = 1 for profiles and" lines are written to the log file.
@RussTreadon-NOAA @emilyhcliu can we use /work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix/Big_Endian?
Good suggestion. I tried. gsi.x aborted with a byte-swapped error message
63: Check_Binary_File(FAILURE) : Data file needs to be byte-swapped.
59: Check_Binary_File(FAILURE) : Data file needs to be byte-swapped.
63: Open_Binary_File(FAILURE) : Error checking ./crtm_coeffs/cris-fsr_n20.SpcCoeff.bin file byte order
50: Check_Binary_File(FAILURE) : Data file needs to be byte-swapped.
@RussTreadon-NOAA @CoryMartin-NOAA
I added the Big_Endian files for crtm-v2.4.0_emc.3. So, we have big and little endian coefficient files for both crtm-v2.4.0_emc.3 and crtm_v2.4.1-jedi.1:
CRTM_FIX=/work/noaa/da/eliu/JEDI-GDAS/crtm-v2.4.0_emc.3-fix/Big_Endian
CRTM_FIX=/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix/Big_Endian
The crtm_v2.4.1-jedi.1 fix files have j2 in the filename for NOAA-21 instruments.
The crtm-v2.4.0_emc.3-fix has n21 in the filename for NOAA-21 instruments.
We want n21 in the filename. We also use amsua_metop-a_v2.SpcCoeff.bin in our operational GFS. The n21 and amsua_metop-a_v2 files are in crtm-v2.4.0_emc.3-fix only.
I suggest we use the coefficients from crtm-v2.4.0_emc.3-fix.
@emilyhcliu, unfortunately, setting export CRTM_FIX=/work/noaa/da/eliu/JEDI-GDAS/crtm-v2.4.0_emc.3-fix/Big_Endian in config.anal did not result in a successful gsi.x run.
The executable aborted with the previously mentioned byte-swapped error. Here's the error message from a representative task, 36:
36: Check_Binary_File(FAILURE) : Data file needs to be byte-swapped.
36: Open_Binary_File(FAILURE) : Error checking ./crtm_coeffs/cris-fsr_npp.SpcCoeff.bin file byte order
36: SpcCoeff_ReadFile(Binary)(FAILURE) : Error opening ./crtm_coeffs/cris-fsr_npp.SpcCoeff.bin
36: CRTM_SpcCoeff_Load(FAILURE) : Error reading SpcCoeff file #1, ./crtm_coeffs/cris-fsr_npp.SpcCoeff.bin
36: READ_CRIS: ***ERROR*** crtm_spccoeff_load error_status= 3
36: TERMINATE PROGRAM EXECUTION
36: Abort(71) on node 36 (rank 36 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 71) - process 36
36: In: PMI_Abort(71, application called MPI_Abort(MPI_COMM_WORLD, 71) - process 36)
Reverting gsi.x back to NOAA-EMC/GSI develop at f76d8728 runs to completion when using CRTM_FIX=/work/noaa/da/eliu/JEDI-GDAS/crtm-v2.4.0_emc.3-fix/Big_Endian.
Perhaps the issue is with the gsi.x executable built from feature/GSI-crtm_v2.4.1-jedi.1.
Can we ask the library team to install CRTM 2.4.1-jedi.1 on Orion?
Then we could directly load crtm/2.4.1-jedi.1 from gsi_orion.lua. Additionally, the crtm/2.4.1-jedi.1 module would define CRTM_FIX, thereby removing the need to redefine CRTM_FIX in config.anal.
Just a thought.
@emilyhcliu and @CoryMartin-NOAA , I will pause work on this issue until we have a clear path forward.
This is very strange. Why would it work for @emilyhcliu but not for you, @RussTreadon-NOAA? Am I correct in understanding that we seem to get the same error regardless of big or little endian coefficients?
Yes, @CoryMartin-NOAA , your understanding is correct.
Given your comment, I did the following this morning
1. Rebuild feature/GSI-crtm_v2.4.1-jedi.1 in /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gsi_enkf.fd/ using ush/build.sh. I set BUILD_VERBOSE=YES prior to executing build.sh. File build.log in the ush directory captured the build. gsi.x was built using modules from /work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-intel2022/build/module/crtm/Intel/2021.5.0.20211109. Library /work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-intel2022/build/lib/libcrtm.so was linked.
2. export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-intel2022/build/lib was added to config.anal in /work2/noaa/da/rtreadon/gdas-validation/expdir/gdas_eval_satwind_GSI/.
3. ./run_job.sh -c config_gsi.sh -t gdasanal was executed for the following CRTM_FIX (toggled in config.anal) with the indicated results
With CRTM_FIX=/work/noaa/da/eliu/JEDI-GDAS/crtm-v2.4.0_emc.3-fix/Little_Endian the GSI aborts with
101: SpcCoeff_ReadFile(Binary)(FAILURE) : Error reading channel data. input statement requires too much data, unit 10, file /work/noaa/stmp/rtreadon/RUNDIRS/gdas_eval_satwind_GSI/anal.34500/./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
101: CRTM_SpcCoeff_Load(FAILURE) : Error reading SpcCoeff file #1, ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
101: READ_BUFRTOVS: ***ERROR*** crtm_spccoeff_load error_status= 3
101: despite file ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
101: existing, TERMINATE PROGRAM EXECUTION
101: Abort(71) on node 101 (rank 101 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 71) - process 101
101: In: PMI_Abort(71, application called MPI_Abort(MPI_COMM_WORLD, 71) - process 101)
With CRTM_FIX=/work/noaa/da/eliu/JEDI-GDAS/crtm-v2.4.0_emc.3-fix/Big_Endian the GSI aborts with
13: Check_Binary_File(FAILURE) : Data file needs to be byte-swapped.
13: Open_Binary_File(FAILURE) : Error checking ./crtm_coeffs/iasi_metop-a.SpcCoeff.bin file byte order
13: SpcCoeff_ReadFile(Binary)(FAILURE) : Error opening ./crtm_coeffs/iasi_metop-a.SpcCoeff.bin
13: CRTM_SpcCoeff_Load(FAILURE) : Error reading SpcCoeff file #1, ./crtm_coeffs/iasi_metop-a.SpcCoeff.bin
13: READ_IASI: ***ERROR*** crtm_spccoeff_load error_status= 3
13: TERMINATE PROGRAM EXECUTION
13: Abort(71) on node 13 (rank 13 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 71) - process 13
13: In: PMI_Abort(71, application called MPI_Abort(MPI_COMM_WORLD, 71) - process 13)
With CRTM_FIX=/work2/noaa/da/cmartin/GDASApp/fix/crtm/2.4.0 the GSI runs to completion with the following caveats
Apply_AntCorr(FAILURE) : Input iFOV inconsistent with AC data and Remove_AntCorr(FAILURE) : Input iFOV inconsistent with AC data printout
Using 5 OpenMP threads = 1 for profiles and printout
gdas.t00z.gsistat has
o-g 01 rad metop-c amsua 1379610 0 0 0.0000 0.0000 0.0000 0.0000
o-g 01 rad metop-c mhs 2859175 0 0 0.0000 0.0000 0.0000 0.0000
A run using gsi.x built from crtm/2.4.0 has
o-g 01 rad metop-c amsua 1379610 133621 94992 0.11806E+06 0.11806E+06 1.2429 1.2429
o-g 01 rad metop-c mhs 2859175 46345 17200 3566.9 3566.9 0.20738 0.20738
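The zero counts above are easy to miss in a long gsistat file. A minimal sketch for flagging them follows. It assumes the whitespace-separated layout shown in the lines above, with the satellite and instrument in columns 4 and 5 and the two counts that dropped to zero in columns 7 and 8. The thread does not label these columns, so treat their meaning as an assumption. zero_use_rad is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# zero_use_rad is a hypothetical helper. It prints "satellite instrument" for
# every radiance o-g line whose 7th and 8th columns are both zero
# (assumption: those columns are the counts shown as 0 in the failing run).
zero_use_rad() {
  awk '$1 == "o-g" && $3 == "rad" && $7 == 0 && $8 == 0 { print $4, $5 }' "$1"
}

# Example:
#   zero_use_rad gdas.t00z.gsistat
```

Running this against the gsistat files from both builds gives a compact list of the sensors affected by the coefficient mismatch.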
As an additional test, do the following
1. Recopy gsi_observer.sh and submit_gsi_observer.sh from /work/noaa/da/eliu/git/GSI-emilyhcliu/GSI/scripts/gsi to /work2/noaa/da/rtreadon/gdas-validation/expdir/gdas_eval_satwind_GSI
2. Execute ./submit_gsi_observer.sh (no change to copied file)
3. Job 15488640 submitted. Job log file is /work2/noaa/da/rtreadon/GSI-develop2/GSIobserver/2021080100/GSIobserver.o15488640. According to log file, CRTM coefficients were copied from /work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix/Little_Endian/
4. Check gsi.stdout in /work2/noaa/da/rtreadon/GSI-develop2/GSIobserver/2021080100/gsi/. GSI aborted with
SpcCoeff_ReadFile(Binary)(FAILURE) : Error reading channel data. input statement requires too much data, unit 10, file /work2/noaa/da/rtreadon/GSI-develop2/GSIobserver/2021080100/gsi/./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
CRTM_SpcCoeff_Load(FAILURE) : Error reading SpcCoeff file #1, ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
READ_BUFRTOVS: ***ERROR*** crtm_spccoeff_load error_status= 3
despite file ./crtm_coeffs/amsua_metop-b.SpcCoeff.bin
existing, TERMINATE PROGRAM EXECUTION
5. Go back to gsi_observer.sh and add CRTM_FIX=/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix/Big_Endian. Edit local copy of submit_gsi_observer.sh to use my modified copy of gsi_observer.sh. Execute ./submit_gsi_observer.sh. GSI aborts with
Check_Binary_File(FAILURE) : Data file needs to be byte-swapped.
Open_Binary_File(FAILURE) : Error checking ./crtm_coeffs/cris-fsr_n20.SpcCoeff.bin file byte order
SpcCoeff_ReadFile(Binary)(FAILURE) : Error opening ./crtm_coeffs/cris-fsr_n20.SpcCoeff.bin
CRTM_SpcCoeff_Load(FAILURE) : Error reading SpcCoeff file #1, ./crtm_coeffs/cris-fsr_n20.SpcCoeff.bin
SpcCoeff_ReadFile(Binary)(FAILURE) : Error opening ./crtm_coeffs/cris-fsr_n20.SpcCoeff.bin
CRTM_SpcCoeff_Load(FAILURE) : Error reading SpcCoeff file #1, ./crtm_coeffs/cris-fsr_n20.SpcCoeff.bin
READ_CRIS: ***ERROR*** crtm_spccoeff_load error_status= 3
TERMINATE PROGRAM EXECUTION
6. Change to CRTM_FIX=/work/noaa/da/eliu/JEDI-GDAS/crtm-v2.4.0_emc.3-fix/Big_Endian. GSI fails with the same error message as in step 5.
7. Change to CRTM_FIX=/work2/noaa/da/cmartin/GDASApp/fix/crtm/2.4.0. GSI runs to completion. Many AntCorr(Failure) and "Using 1 OpenMP threads = 1 for profiles and" messages are written to gsi.stdout
Summary: g-w and stand-alone script behavior is consistent when run from my Orion account.
Hmm, ugh, I suggest we wait for @emilyhcliu before digging further as she apparently has the magic touch to get this working
Agreed!
Issues sorted out in GSI gdas-validation test. I was using the wrong CRTM_FIX.
gsi.x built from @emilyhcliu feature/GSI-crtm_v2.4.1-jedi.1 runs when the following are added to config.anal:
export CRTM_FIX=/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix_gdasapp/fix
export LD_LIBRARY_PATH=/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-intel2022/build/lib:${LD_LIBRARY_PATH}
gsi.x was processing radiances based on log file printout when it seg faulted. I'm guessing that the seg fault is related to memory. The failed gdasanal was running gsi.x with 84 tasks on 11 nodes (ppn=8) with 5 threads per task. The GSIObserver tests run gsi.x with 200 tasks, ppn=8, threads=1. Resubmit job with GSIObserver configuration. Job immediately died with oom kill. However, in looking at the log file the problem may be a system issue and not a job issue. Will resubmit later to see what happens.
@emilyhcliu, your working copy feature/GSI-crtm_v2.4.1-jedi.1 in Orion /work/noaa/da/eliu/git/GSI-emilyhcliu/GSI contains two modified files
modified: modulefiles/gsi_common.lua
modified: modulefiles/gsi_orion.lua
I recommend that we do not modify gsi_common.lua. This file is used for GSI builds on all platforms. Instead of commenting out the crtm load in gsi_common.lua, we can unload crtm in gsi_orion.lua. I did so in the above mentioned test.
Look in Orion /work2/noaa/da/rtreadon/gdas-validation-test/global-workflow/sorc/gsi_enkf.fd. This is your feature/GSI-crtm_v2.4.1-jedi.1 branch with only one modified file
modified: modulefiles/gsi_orion.lua
I retained your local modification to CRTM_FIX and added an unload for crtm
load("gsi_common")
+unload("crtm/2.4.0")
setenv("crtm_ROOT","/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-intel2022/build")
setenv("crtm_VERSION","2.4.1-jedi.1")
setenv("CRTM_INC","/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-intel2022/build/module")
setenv("CRTM_LIB","/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-intel2022/build/lib/libcrtm_static.a")
-setenv("CRTM_FIX","/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix/Little_Endian")
+setenv("CRTM_FIX","/work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix_gdasapp/fix")
whatis("Name: crtm")
whatis("Version: 2.4.1-jedi.1")
whatis("Category: library")
@RussTreadon-NOAA I updated the branch with your suggestion and did a single-cycle run (observer only). It ran to completion without issues.
Thank you @emilyhcliu. Let me keep debugging gsi.x inside the workflow. I can successfully run gsi.x using your scripts. Something odd is going on in the workflow.
Debugging found differences in some fix files, gsi namelist settings, and processing of HIRS dump files.
The issue with HIRS dump files and CRTM-2.4.1-jedi.1 was noted above. So as to not touch g-w exglobal_atmos_analysis.sh, add the following to the expdir config.anal
export B1HRS2=/dev/null
export B1HRS3=/dev/null
export B1HRS4=/dev/null
Emily's stand-alone test uses fix files from GSIFIX=/work2/noaa/da/cmartin/UFO_eval/geovals/GSI/fix. The g-w test takes GSI fix from FIXgsi=/work2/noaa/da/rtreadon/gdas-validation-test/global-workflow/fix/gsi. The following fix files differ between these two directories: ANAVINFO, CONVINFO, and OZINFO. Given this, add the following to expdir config.anal
export GSIFIX=/work2/noaa/da/cmartin/UFO_eval/geovals/GSI/fix
export ANAVINFO=$GSIFIX/global_anavinfo.l127.txt
export CONVINFO=$GSIFIX/global_convinfo.txt
export OZINFO=$GSIFIX/global_ozinfo.txt
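A comparison like the one that identified the ANAVINFO, CONVINFO, and OZINFO differences can be sketched as follows. This is not the procedure used in the thread, only one possible way to list files that exist in both fix directories but differ in content. differing_files is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# differing_files is a hypothetical helper. It prints the name of every
# regular file present in both directories whose contents differ.
differing_files() {
  local a="$1" b="$2" f name
  for f in "${a}"/*; do
    [ -f "${f}" ] || continue              # skip subdirectories and empty globs
    name=$(basename "${f}")
    [ -f "${b}/${name}" ] || continue      # only compare files present in both
    cmp -s "${f}" "${b}/${name}" || echo "${name}"
  done
}

# Example, using the two directories named above:
#   differing_files /work2/noaa/da/cmartin/UFO_eval/geovals/GSI/fix \
#                   /work2/noaa/da/rtreadon/gdas-validation-test/global-workflow/fix/gsi
```

Files present in only one directory are ignored here; diff -rq on the two directories would report those as well.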
Add the following to the expdir config.anal to address gsi namelist differences
export SETUP="gpstop=55.,${SETUP:-}"
export imp_physics=11
export cao_check=".false."
export ta2tb=".false."
export GRIDOPTS="nlayers(63)=1,nlayers(64)=1,${GRIDOPTS:-}"
export NST_GSI=0
With the above additions to expdir config.anal, the 2021080100 gdasanal job successfully ran to completion.
@emilyhcliu explained that the HIRS change is required when using CRTM-2.4.1-jedi.1. What about the fix file and gsi namelist changes? Which of these changes do we need or want to include in gdas-validation?
@emilyhcliu , @ADCollard , & @CoryMartin-NOAA
Additional differences between the run using Emily's stand-alone script and the gdas-validation g-w gdasanal job:
Rcov section: the stand-alone run does not copy the Rcov files to the run directory. The gdas-validation job copies the Rcov files to the run directory and gsi.x uses them.
thin4d=.true. in the stand-alone job. thin4d=.false. in the gdasanal job. This changes cpen for several observation types.
dsfcalc=0 for all obs types. The gdasanal run sets dsfcalc=1 for numerous radiance datasets.
If someone can point me at the GSI configuration we want to use for gdas-validation, I can create a JEDI-T2O branch in which config.anal is updated to replicate the target configuration.
We are using the focus cycle (2021080100) for our evaluation and should be using the focus cycle for the code sprint. So, people can use the 2021080100 geoval and obs files from the UFO evaluation as a reference. These files contain GSI output (e.g. HofX and some derived variables).
The GSI workflow for the code sprint provides a way for people to re-run the GFS processing for the focus cycle from the prep step to the observer part of the first outer loop. People can also configure it to run with different configurations or cycles. For example, since I work on radiances, I can change the all-sky related namelist and configuration files (anavinfo, satinfo, etc.) to match the latest update in the operational system (ta2tb is true, updated anavinfo, etc.).
@RussTreadon-NOAA I think we should keep the namelist settings and configuration the same as those we used to run 2021080100. For the code sprint, we are not seeking bit-identical results since we will be checking the end-to-end comparison between GDAS and JEDI. There are some fundamental differences between the two in their current status. So, bottom line, we need to have the following setup in the GSI workflow:
nvqc = .false.
FGAT should be off.
We should also turn off the Hilbert curve for aircraft data. @CoryMartin-NOAA and @ADCollard found that the switch for the Hilbert curve is hardwired in GSI. So, we need to turn it off in the code, not from the script. @CoryMartin-NOAA and @ADCollard, could you give guidance for this?
@emilyhcliu @RussTreadon-NOAA The Hilbert Curve code starts at line 3007 of read_prepbufr.f90.
! the following is gettin the types which will be applied hilbert curve to
! estimate the density
if(obstype == 'uv') then
vmin=-10.00_r_kind
vmax=18000.00_r_kind
nor=0
...
The entire if-block starting if(obstype == 'uv') then should be commented out for now.
Namelist OBSQC contains logical variable hilbert_curve. gsi.x defaults this variable to .false. What if we allow gsi.x to default hilbert_curve to .false. and change line 3010 to read
! the following is gettin the types which will be applied hilbert curve to
! estimate the density
if(obstype == 'uv' .and. hilbert_curve) then
Would this suffice?
Not only would it suffice, but we should probably add this to the develop branch....
One question, one suggestion, and one request
gsi.x to reproduce the operational GFS v16.3.x or NOAA-EMC/GSI develop (looking ahead to GFS v17)?
gsi.x from the forked GSI-crtm_v2.4.1-jedi.1. It seems preferable to merge this forked branch into a NOAA-EMC/GSI gdas-validation branch, tag it, and update JEDI-T2O to clone and build the tag. What do you think?
Not only would it suffice, but we should probably add this to the develop branch....
OK, we can open an issue and get it into develop
I tested this change in GSI tag gfsda.v16.3.10 using the operational 2023103100 gdas cycle. Adding hilbert_curve to the logical test increased the initial (obs-ges) uv penalty by 39.7%. With only if (obstype == 'uv') then the o-g uv penalty is 0.252290097411315393E+06. After adding .and. hilbert_curve to the logical test the uv penalty increased to 0.352427974894839805E+06.
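The quoted percentage follows directly from the two penalty values. A minimal sketch of the arithmetic, with pct_increase as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# pct_increase is a hypothetical helper. It prints the relative increase from
# the first value to the second, in percent, rounded to one decimal place.
pct_increase() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * (b - a) / a }'
}

# uv penalties without and with hilbert_curve in the logical test:
pct_increase 0.252290097411315393E+06 0.352427974894839805E+06   # prints 39.7
```

The same helper can be reused to compare penalties for other observation types between two runs.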
Operations run the global GSI with logical hilbert_curve=.false. Thus, by adding hilbert_curve to the logical test, the uv block in question is not entered.
I'm confused. The original code in read_prepbufr.f90 reads
! the following is gettin the types which will be applied hilbert curve to
! estimate the density
if(obstype == 'uv') then
The comment in the code along with @ADCollard's guidance suggest that this block should only be executed when hilbert_curve=.true. I assumed this is how we run gsi.x in the GFS. This isn't the case. We execute this block in operations for all uv observations processed by read_prepbufr.f90. Is this what we want to happen?
Seems my understanding of logical hilbert_curve is not correct. gsimod.F90 contains the comment
! hilbert_curve - option for hilbert-curve based cross-validation. works only
! with twodvar_regional=.true.
Logical hilbert_curve is for cross-validation in 2DVar regional mode. It's not a variable for global GSI runs. Given this, my suggestion to add hilbert_curve to the logical in read_prepbufr.f90 is wrong. We must do as @ADCollard said. The entire uv block needs to be commented out. Alternatively, we could add a new logical to bypass the block. Is there any benefit from adding a new logical apart from ease during gdas-validation?
The GDAS-validation sprint begins Monday, 11/13. Next week (6-10 Nov) is a short work week (Friday, 11/10 is the Veterans day holiday). I'm not available 11/10 through 11/12.
Work remains to prepare GDAS-validation for easy use by developers. Some of this work involves others (e.g., how to lower gsi.x wall times on Orion following the 10/23 PM). Other work is ours.
Here's a partial listing of our work items:
fv3jedi_var.x to use the same CRTM coefficients as gsi.x. Seems we should also build fv3jedi_var.x with same CRTM module as gsi.x, right?
What other pre-sprint work should be added to the above list?
Given that Orion is slow, do we move to Hera? I think the savings in runtime will be offset by the longer job queues though...
Thanks @RussTreadon-NOAA I think this is a good list. The biggest one is item 3. @ADCollard and @emilyhcliu, what all should we turn off in GSI? I know we need to turn off FGAT, the time thinning error inflation, VarQC. But what else?
I'm wrestling with the move to Hera, too. The 11/13 sprint is step one of gdas validation in that it will focus on the observer (ufo), right? If true, we will have a step two gdas validation at a later date where we compare gsi.x & fv3jedi_var.x minimization (solver) including varbc. Not all this work will be done on Orion (or Hercules). So long term I think we want to extend setup_workspace.sh to Hera.
The gdas validation sprint isn't the only thing DAD staff are working on. Are we using Hera for GFS v17 tests? GFS v17 includes JEDI based marine, land, and aerosol DA. Maybe we reserve Hera for GFS v17 & related JEDI work and keep gdas validation on Orion for the time being.
Thoughts? Comments?
I think the extension to Hera is fairly straightforward, the only real sticking point would be mirroring the input data to Hera from Orion. We would need to stage FMS restarts, Gaussian history files, bias correction files, (and observations should be in the glopara space already). This isn't difficult, but it does take up space. Hera space is at a premium compared to Orion.
I also agree that Hera is probably better spent on the GFS T2O specific tasks and use Orion for this lower readiness level testing. GSI may be running slow on Orion, but it still runs.
Agreed. The GSI still runs on Orion ... it's just slow. One suggestion from the Orion helpdesk is to recompile the stack we use to build GSI. I asked in g-w issue #1996 about getting this done. GSI slowness on Orion will be addressed. We also have the possibility of using Hercules in the future, though it seems some executables also run slow on Hercules.
Conduct the following test on Orion.
feature/GSI-crtm_v2.4.1-jedi.1 from https://github.com/emilyhcliu/GSI.git
modulefiles/gsi_orion.lua to replicate as much as possible GDASApp modulefiles/GDAS/orion.lua. Look at /work2/noaa/da/rtreadon/gdas-validation-test/global-workflow/sorc/gsi_enkf.fd_jedi/modulefiles/gsi_orion.lua to see the modified gsi_orion.lua.
src/gsi/CMakeLists.txt and src/gsi/read_prepbufr.f90
ush/build.sh. Both gsi.x and enkf.x were built
GSIDIR in submit_gsi_observer.sh to point at the above mentioned gsi_enkf.fd_jedi
LD_LIBRARY_PATH in gsi_observer.sh which pointed at Emily's crtm_v2.4.1-jedi.1
./submit_gsi_observer.sh
The job ran to completion. The run directory is /work2/noaa/da/rtreadon/ufoeval/GSIobserver/2021080100/gsi_spack_build_cory_crtmfix. Also run with Emily's original configuration. The run directory for this run is /work2/noaa/da/rtreadon/ufoeval/GSIobserver/2021080100/gsi_hpc_build_emily_crtm
The fort.2* stats are identical between the two runs with the exception of fort.207. The total radiance penalties differ in the 14th printed digit. There are no differences in the counts for assimilated radiance observations.
With the above changes gsi.x is built using the same spack-stack and crtm library as fv3jedi_var.x. The CRTM_FIX used for the spack-stack gsi.x run above was /work2/noaa/da/cmartin/GDASApp/fix/crtm/2.4.0. This is the same CRTM_FIX used by fv3jedi_var.x when run by g-w.
I am also using feature/GSI-crtm_v2.4.1-jedi.1, crtm_v2.4.1-jedi.1, and the CRTM coefficients from GDASApp: /work2/noaa/da/cmartin/GDASApp/fix/crtm/2.4.0 to generate test data for UFO Evaluation.
Great! Shall we update gsi_orion.lua in a gdas-validation specific branch for the purpose of the upcoming sprint?
The following branches have been created in the following repositories for possible use in the GDAS validation sprint:
JEDI-T2O:feature/gdas-validation. To see differences with respect to develop, click here.
config.anal to configure GSI for GDAS validation
config.atmanl to be consistent with recent changes to g-w develop
config.prep to be consistent with recent changes to g-w develop
config.resources to be consistent with recent changes to g-w develop
config_jedi.yaml to be consistent with order used in config_gsi.yaml
setup_workspace to checkout feature/gdas-validation, add error trapping to GSI and GDASApp builds
GSI:/feature/gdas-validation. To see differences with respect to develop, click here.
crtm_v2.4.1-jedi.1
It is not clear if all the changes in JEDI-T2O branch feature/gdas-validation need to be present for GDAS validation. How do we want gsi.x configured for GSI validation? The current settings in config.anal may not be correct or complete.
Two additional considerations
feature/gdas-validation branch for GDAS validation.
feature/gdas-validation branch. For example, g-w exglobal_atmos_analysis.sh copies (links) CRTM fix file CloudCoeff.GFDLFV3.-109z-1.bin to local file CloudCoeff.bin. In contrast, g-w parm/gdas/atm_crtm_coeff.yaml copies CRTM fix file CloudCoeff.bin to local file CloudCoeff.bin. These are different cloud coefficient files. If we want the same cloud coefficient file used by gsi.x and fv3jedi_var.x, we need to edit g-w exglobal_atmos_analysis.sh or atm_crtm_coeff.yaml.
Lines 3007 to 3165 have been commented out in the snapshot of read_prepbufr.f90 in feature/gdas-validation. Done at 7ef942c3.
For the upcoming end-to-end code sprint, we would like to have a GSI branch with a configuration to use CRTM 2.4.1-jedi.1, which is consistent with the CRTM used in GDASApp.
GSI
The GSI branch created for this PR is in the following GSI forked repository: GSI-crtm_v2.4.1-jedi.1
These are changes to GSI.
CRTM
The CRTM version 2.4.1-jedi.1 is built on ORION in the following location: /work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-intel2022
The associated coefficients files are in the following location: /work/noaa/da/eliu/JEDI-GDAS/crtm_v2.4.1-jedi.1-fix/Little_Endian
There are three sets of CRTM coefficients we can:
I tested three sets, and they all worked fine.
The first one contains the coefficients we use in operations plus the N21 coefficients. The second one is the one linked to GDASApp from run_ufo_hofx_test.sh. The third one is the coefficients packaged with the crtm_v2.4.1-jedi.1 tag.
The first set of coefficients is good for our purpose for UFO evaluation with GSI.