GEOS-ESM / swell

Workflow system for coupled data assimilation applications
https://geos-esm.github.io/swell/
Apache License 2.0

YAML Configuration | ATMS #63

Closed: danholdaway closed this issue 1 year ago

danholdaway commented 2 years ago

Agreement with GSI using GeoVaLs in UFO (Hofx no bias correction)

Agreement with GSI using GeoVaLs in UFO (Bias correction)

Agreement with GSI using GeoVaLs in UFO (Quality control)

Agreement with GSI using GeoVaLs in UFO (Hofx with bias correction)

Agreement with GSI using Swell

Waiting on JEDI PRs:

Current issues:

gmao-jjin3 commented 2 years ago

Here is an old pull request that fixes the segmentation fault in ATMS emissivity over snow surfaces: https://github.com/JCSDA-internal/crtm-old/pull/74

gmao-jjin3 commented 2 years ago

Here is the issue https://github.com/JCSDA-internal/ufo-old/issues/1100

gmao-jjin3 commented 2 years ago

The fix is in UFO/CRTM, and my previous test with this fix also passed in Sep 2020.

gmao-jjin3 commented 2 years ago

Emily reported an issue about ATMS sea ice emissivity in Jan 2021. https://github.com/JCSDA-internal/ufo/issues/621

gmao-yzhu commented 2 years ago

We encountered an issue with error message "Caught SIGFPE: floating-point invalid operation" when calculating simulated ATMS radiances.

Similar to what Emily did, when I added a bunch of print statements, the run completed successfully for ATMS_n20, including bias correction. The result is below:

    27: Test : Vector difference between reference and computed: atms_n20 nobs= 232417 Min=-7.5294327245956083e-05, Max=7.5534442942171154e-05, RMS=9.6457825732815980e-06

However, it still failed for atms_npp at the first step (the radiance simulation step) with the error message "floating-point invalid operation". The detailed error message is copied below:

    26: Caught SIGFPE: floating-point invalid operation
    26: 0# sigfpe_handler(int, siginfo_t, void) in /gpfsm/dnb32/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/../lib64/liboops.so
    26: 1# 0x00002AAAB7DC6C10 in /lib64/libpthread.so.0
    26: 2# 0x00000000004E8E03 in /discover/nobackup/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/test_ObsOperator.x
    26: 3# __svml_log2_l9 in /discover/nobackup/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/test_ObsOperator.x
    26: 4# nesdis_atms_seaice_module_mp_atms_seaice_bytbtsd in /gpfsm/dnb32/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/../lib64/libcrtm.so
    26: 5# nesdis_atms_seaice_module_mp_nesdis_atmsseaice in /gpfsm/dnb32/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/../lib64/libcrtm.so
    26: 6# crtm_mw_ice_sfcoptics_mp_compute_mw_icesfcoptics in /gpfsm/dnb32/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/../lib64/libcrtm.so
    26: 7# crtm_sfcoptics_mp_crtm_computesfcoptics in /gpfsm/dnb32/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/../lib64/libcrtm.so
    26: 8# common_rtsolution_mp_assign_commoninput in /gpfsm/dnb32/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/../lib64/libcrtm.so
    26: 9# crtm_rtsolution_mp_crtm_computertsolution in /gpfsm/dnb32/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/../lib64/libcrtm.so
    26: 10# 0x00002AAAB8880695 in /gpfsm/dnb32/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/../lib64/libcrtm.so
    26: 11# crtm_forward_module_mp_crtmforward in /gpfsm/dnb32/yzhu/JEDI/fv3-bundle/build-intel-impi-release-fv3/bin/../lib64/libcrtm.so

Reported the issue to Ben Johnson at JCSDA.
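The atms_n20 test line above passes because the difference between the reference and computed values is tiny compared with typical brightness temperatures (RMS of about 1e-5). A minimal sketch of how such a Min/Max/RMS summary can be computed is below; the function name is hypothetical, it is not the oops/ufo test code, and the sign convention (reference minus computed) is an assumption.

```python
import math
from typing import Dict, Sequence


def vector_difference_stats(reference: Sequence[float],
                            computed: Sequence[float]) -> Dict[str, float]:
    """Summarize the element-wise difference reference - computed.

    Hypothetical helper: it reports the same quantities as the test log
    line (nobs, Min, Max, RMS) but is not the oops/ufo implementation,
    and the sign convention (reference - computed) is an assumption.
    """
    if len(reference) != len(computed):
        raise ValueError("reference and computed must have the same length")
    if len(reference) == 0:
        raise ValueError("need at least one observation")
    diffs = [r - c for r, c in zip(reference, computed)]
    # Root-mean-square of the differences.
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"nobs": len(diffs), "Min": min(diffs), "Max": max(diffs), "RMS": rms}


if __name__ == "__main__":
    # Toy brightness temperatures, not real ATMS data.
    stats = vector_difference_stats([250.0, 251.0, 252.0], [250.0, 250.5, 252.0])
    print(stats)
```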

gmao-jjin3 commented 2 years ago

@gmao-yzhu New testing files are in /discover/nobackup/jjin3/jedi/For_others/x0044.jj_20220619/

danholdaway commented 2 years ago

@gmao-yzhu the crash for NPP seems to be caused by missing data in the files. There are several locations where not all channels have a valid brightness temperature. For example, here are all the brightness temperatures for the crashing profile, as written out by the UFO (before the CRTM is called):

 -3.334767057904812E+038 -3.334767057904812E+038 -3.334767057904812E+038
 -3.334767057904812E+038 -3.334767057904812E+038 -3.334767057904812E+038
 -3.334767057904812E+038 -3.334767057904812E+038 -3.334767057904812E+038
 -3.334767057904812E+038 -3.334767057904812E+038 -3.334767057904812E+038
 -3.334767057904812E+038 -3.334767057904812E+038 -3.334767057904812E+038
 -3.334767057904812E+038   239.110000610352        254.279998779297     
   251.399993896484        247.970001220703        243.190002441406     
   237.839996337891

Then there is a function that takes the log of these very large negative values (see the __svml_log2_l9 frame in the traceback above). I strongly suspect that this is a big problem. It's not a bug, which makes me wonder why it works with the extra print statements you put in. So there might be other issues, but this would be the first thing to resolve.

It's not obvious to me how to solve this. Perhaps in the GSI these missing values are represented by very large positive values, which prevents the log from failing, and the points are then filtered out later. Changing the value used for missing data might have unpredictable consequences. This might need to be discussed with the JCSDA folks in order to come up with a workable solution.
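The failure mode described above can be illustrated in miniature: the sketch below rejects a profile when any channel carries the missing-value sentinel, before any log is taken. It is a hypothetical guard for illustration only (MISSING_TB, is_valid_tb and log_tbs_or_none are not CRTM or UFO names), and it is not necessarily what the CRTM fix discussed in this thread does; the alternative raised above, a large positive missing value that is filtered out later, would also avoid the failing log.

```python
import math
from typing import List, Optional, Sequence

# Missing-value sentinel seen in the UFO output for the crashing atms_npp profile.
MISSING_TB = -3.334767057904812e38


def is_valid_tb(tb: float) -> bool:
    # Physical brightness temperatures are finite and positive (kelvin);
    # this rejects the sentinel above as well as NaN/inf.
    return math.isfinite(tb) and tb > 0.0


def log_tbs_or_none(tbs: Sequence[float]) -> Optional[List[float]]:
    """Return log(Tb) for every channel, or None if any channel is missing.

    Python's math.log raises ValueError for non-positive input. In compiled
    code the same operation is an IEEE invalid operation that returns NaN,
    or raises SIGFPE when floating-point traps are enabled.
    """
    if not all(is_valid_tb(tb) for tb in tbs):
        return None  # skip (or flag) this profile instead of taking the log
    return [math.log(tb) for tb in tbs]


if __name__ == "__main__":
    # The crashing profile (values rounded): 16 missing channels, then 6 valid ones.
    profile = [MISSING_TB] * 16 + [239.11, 254.28, 251.40, 247.97, 243.19, 237.84]
    print(log_tbs_or_none(profile))       # None: profile is skipped
    print(log_tbs_or_none(profile[16:]))  # six finite log values
```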

danholdaway commented 2 years ago

You can run with this branch: https://github.com/JCSDA-internal/crtm/pull/371

gmao-yzhu commented 2 years ago

In addition to the AMSU-A-like channels, ATMS has six additional higher-frequency channels, including water vapor channels. I have made QC code changes for these higher-frequency channels, and they are already merged into fv3-bundle.

I found and fixed a bug in the fv3-bundle hydrometeor code for ATMS and will submit a PR to fv3-bundle.

With this bug fix, the UFO results are consistent with the GSI results for ATMS NOAA20. The comparisons of OmF with bias correction, the histogram of OmF for data passing QC, and the final observation errors are shown below.
[Figures (atms_n20, channel 17): UFO ObsValue-hofx vs. GSI ObsValue-GsiHofXBc after bias correction and QC; UFO_Obs_Number histogram of O-F after bias correction and QC; UFO EffectiveError vs. GsiFinalObsError after QC]

gmao-yzhu commented 2 years ago

JCSDA-internal/crtm#371

Thanks Dan for the crtm branch. I will try it for atms npp.

danholdaway commented 2 years ago

Great that everything works now for N20!

gmao-jjin3 commented 2 years ago

@gmao-yzhu Can you check the observation errors in GSI and UFO as well? Just to double-check, even though the histograms suggest they match.

gmao-yzhu commented 2 years ago

@gmao-jjin3 Yes, I had already checked the ATMS N20 observation errors in GSI and UFO; see the third figure above. They matched very well.

For ATMS NPP, I tested UFO with Dan's CRTM branch because of the CRTM sea ice issue, while the GSI results were still generated with the original CRTM. The results look good in terms of the histogram of OmF for data passing QC, OmF with bias correction, and final observation errors. The results for channels 5 and 18 are given below. For channel 5, the number of observations passing QC in UFO (5068) is one fewer than in GSI, but I think that is acceptable given that a different CRTM was used. Channel 18 has the same number of observations passing QC.

[Figures (atms_npp, channels 5 and 18): UFO_Obs_Number histograms of O-F after bias correction and QC; UFO ObsValue-hofx vs. GSI ObsValue-GsiHofXBc after bias correction and QC; UFO EffectiveError vs. GsiFinalObsError after QC]

So the UFO tests of ATMS are now complete.
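The per-channel checks described above (number of observations passing QC, and UFO EffectiveError vs. GsiFinalObsError) can be scripted. A minimal sketch, assuming both systems' outputs have already been read into lists aligned on the same observations; the function and field names are hypothetical, and reading the actual output files is not shown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ChannelComparison:
    channel: int
    ufo_pass: int              # observations passing QC in UFO
    gsi_pass: int              # observations passing QC in GSI
    max_abs_error_diff: float  # largest |UFO error - GSI error| where both pass


def compare_channel(channel: int,
                    ufo_passed: Sequence[bool], gsi_passed: Sequence[bool],
                    ufo_error: Sequence[float], gsi_error: Sequence[float]) -> ChannelComparison:
    """Compare QC pass counts and final observation errors for one channel.

    Assumes all four inputs are aligned on the same observations (hypothetical).
    """
    n = len(ufo_passed)
    if not (len(gsi_passed) == len(ufo_error) == len(gsi_error) == n):
        raise ValueError("all inputs must be aligned on the same observations")
    # Only compare errors where both systems accept the observation.
    both = [i for i in range(n) if ufo_passed[i] and gsi_passed[i]]
    max_diff = max((abs(ufo_error[i] - gsi_error[i]) for i in both), default=0.0)
    return ChannelComparison(channel, sum(ufo_passed), sum(gsi_passed), max_diff)


if __name__ == "__main__":
    # Toy data for three observations of one channel, not real results.
    result = compare_channel(
        channel=5,
        ufo_passed=[True, True, False],
        gsi_passed=[True, True, True],
        ufo_error=[0.30, 0.50, 9.00],
        gsi_error=[0.30, 0.45, 1.00],
    )
    print(result)
```

A one-observation difference in the pass counts, as seen for channel 5, would show up directly as ufo_pass and gsi_pass differing by one.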

danholdaway commented 2 years ago

Great! Glad that the crtm changes are working out. Hopefully they will be merged soon.

danholdaway commented 2 years ago

@gmao-yzhu: the CRTM PR was merged, so this should now work with the develop version of JEDI. Tomorrow I will work on a new build of JEDI and then update the swell tag to point to the new build.