Closed emilyhcliu closed 7 months ago
@emilyhcliu , thank you for reporting this error. Would you please add to this issue the paths to your EXPDIR and COMROT? It's hard to debug without the log files and config files. Thank you.
EXPDIR: /work2/noaa/da/eliu/gdas-validation/expdir/gdas_eval_iasi_JEDI
COMROT: /work2/noaa/da/eliu/gdas-validation/comrot/gdas_eval_iasi_JEDI/
The log file: /work2/noaa/da/eliu/gdas-validation/comrot/gdas_eval_iasi_JEDI/logs/2021080100/gdasatmanlrun.log
While /work2/noaa/da/eliu/gdas-validation/comrot/gdas_eval_iasi_JEDI/logs/2021080100/gdasatmanlrun.log is for an iasi run, the run directory it points at, /work/noaa/stmp/eliu/RUNDIRS/gdas_eval_iasi_JEDI/gdasatmanl_00, is missing the fv3jedi yaml and only contains an atms obs file. It seems you are working on atms.
@RussTreadon-NOAA I am re-running the IASI case. /work/noaa/stmp/eliu/RUNDIRS/gdas_eval_iasi_JEDI/gdasatmanl_00 will be updated soon.
I am about to open a draft PR for IASI to add the YAML files.
@RussTreadon-NOAA I re-ran the iasi case but ran into an OSError: [Errno 122] Disk quota exceeded issue. The STMP is full. I cleaned up my STMP and am waiting for others to clean up theirs. I will try to run again later.
@emilyhcliu , g-w assumes that we fully pack compute nodes. This is a known limitation of the g-w xml generator. When we specify 1 thread for atmanlrun, g-w sets the number of tasks per node on Orion to 40. This is too many tasks per node when processing iasi. I modified config.resources and hand-edited my xml file to run fv3jedi_var.x with 1 task per node. fv3jedi_var.x ran up to an ioda exception:
0: QC iasi_metop-a brightnessTemperature_138: 10120 passed out of 323980 observations.
24: Exception: Reason: An exception occurred inside ioda while opening a variable.
24: name: MetaData/sensorCentralWavenumber
24: source_column: 0
24: source_filename: /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ioda/src/engines/ioda/src/ioda/Has_Variables.cpp
24: source_function: ioda::Variable ioda::detail::Has_Variables_Base::open(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> &) const
24: source_line: 108
24:
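For reference, the hand edits described above amount to something like the following in config.resources. This is a hypothetical sketch, not the actual committed change: the variable names follow the npe_*/nth_*/npe_node_* convention used by global-workflow's config.resources, and the values here are illustrative.

```shell
# Hypothetical sketch of config.resources edits for the atmanlrun step
# (names follow global-workflow's npe_*/nth_* convention; values illustrative).
export npe_atmanlrun=40        # total MPI tasks for fv3jedi_var.x
export nth_atmanlrun=1         # OpenMP threads per task
export npe_node_atmanlrun=1    # force 1 task per node instead of the packed 40
```

Spreading tasks across nodes this way trades node count for per-task memory, which is the workaround when a fully packed node runs out of memory.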
The IASI yaml is outdated in the current gdas-validation. I am preparing an IASI PR to add the latest updates.
Where may I find the updated iasi yaml? I'd like to test it in my gdas-validation workspace.
They can be found in the following PR https://github.com/NOAA-EMC/GDASApp/pull/769
A pairing UFO PR (feature/satrad) is required to test IASI.
Thank you @emilyhcliu . Testing underway.
Thanks so much! @RussTreadon-NOAA
Able to process metop-a and metop-b iasi when fv3jedi_var.x is run with
@RussTreadon-NOAA and @CoryMartin-NOAA I ran the end-to-end testing for IASI and got the Out-Of-Memory (OOM) error message:
The current resource configuration for the atmanlrun process is the following:

Here is the node configuration in XML:
I tried a few things:
- Changed 10:ppn=40:tpp=1 to 40:ppn=40:tpp=1 ----> still got OOM
- export memory_atmanrun="4000GB" ----> still got OOM
- export memory_atmanlrun="0" (use max) ----> still got OOM

Do you have suggestions for resolving the OOM problem?
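For context, the rocoto resource tags being changed in the attempts above look roughly like this. This is a sketch assuming standard rocoto task syntax, using the values quoted above; it is not the actual XML from the experiment.

```xml
<!-- Sketch of the relevant rocoto resource tags (values from the attempts above). -->
<nodes>40:ppn=40:tpp=1</nodes>  <!-- 40 nodes, 40 MPI tasks per node, 1 thread per task -->
<memory>4000GB</memory>         <!-- explicit memory request; removing it or setting it
                                     empty falls back to the scheduler default -->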
Note: I have the IASI YAML and Python script ready for end-to-end in the workflow.
The test in the workflow has the OOM problem described above. So, I tested the YAML and python script in a separate configuration using the fv3 no-model executable. The run completed successfully. The layout is 5x5x6.
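For reference, a 5x5x6 decomposition corresponds to a processor layout like the following in the fv3-jedi geometry YAML. This is a sketch assuming the usual fv3-jedi conventions (a `layout` list of ranks per tile direction, across the 6 cube-sphere tiles); the key names are not taken from the actual experiment files.

```yaml
# Sketch of the relevant part of an fv3-jedi geometry block (illustrative;
# key names assume current fv3-jedi conventions).
geometry:
  layout: [5, 5]   # 5x5 MPI ranks per cube-sphere tile
  # 6 tiles x 5 x 5 = 150 MPI tasks total for the executable
```

With this layout the executable must be launched with exactly 150 MPI tasks, so the workflow's npe setting for the step has to match the layout.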