JCSDA-internal / ioda-converters

Various converters for getting obs data in and out of IODA
9 stars 2 forks source link

Set a consistent, non-zero tolerance for comparing floating point values #1392

Closed srherbener closed 10 months ago

srherbener commented 10 months ago

Description

This PR changes the tolerance value for comparing floating point numbers to a consistent, non-zero value. This is coded in a single variable, IODA_CONV_COMP_TOL, and used everywhere the iodaconv_comp.sh script is used. Currently IODA_CONV_COMP_TOL is set to 1.0e-3 (relative tolerance of 0.001 percent).

Issue(s) addressed

Partially addresses #1378

Dependencies

List the other PRs that this PR is dependent on: None

Impact

Expected impact on downstream repositories: None - no functional changes

Checklist

srherbener commented 10 months ago

@PatNichols thanks for investigating into this issue. I am seeing test failures on Mac M1 (arm64) of which I'm working on fixing. See https://github.com/JCSDA/spack-stack/issues/792 and #1378.

When I run on my Mac (arm64) using this feature branch, and a tolerance of 1e-3, I get the following errors:

98% tests passed, 6 tests failed out of 326

Label Time Summary:
executable           =   0.57 sec*proc (1 test)
iodaconv             = 133.53 sec*proc (325 tests)
iodaconv_validate    =  40.19 sec*proc (174 tests)
script               = 132.96 sec*proc (324 tests)

Total Test time (real) = 133.99 sec

The following tests FAILED:
    705 - test_iodaconv_prepbufr_ncep_api_adpsfc2ioda (Failed)
    706 - test_iodaconv_prepbufr_ncep_api_sfcshp2ioda (Failed)
    725 - test_iodaconv_bufr_ncep_prepbufr_adpupa_api (Failed)
    748 - test_iodaconv_prepbufr_conv (Failed)
    749 - test_iodaconv_mhs_conv (Failed)
    750 - test_iodaconv_amsua_conv (Failed)

These are due to other issues besides the tolerance setting.

When I try a tolerance value of 5.0e-5, I see the following additional failures:

> 718:test_iodaconv_bufr_ncep_mtiasi
> 719:test_iodaconv_bufr_ncep_atms

Here is the nccmp output with these two tests:

PRE-MAIN-INFO BufrParser: Parsing file ./testinput/gdas.t12z.mtiasi.tm00.bufr_d
PRE-MAIN-INFO Executing Queries
PRE-MAIN-INFO Building Bufr Data
PRE-MAIN-INFO Exporting Data
PRE-MAIN-INFO Finished [0.328s]
PRE-MAIN-WARNING   Skipped category metop-a
DIFFER : VARIABLE : sensorViewAngle : POSITION : [146] : VALUES : -1.029 <> -1.029 : PERCENT : 9.26797e-05
DIFFER : VARIABLE : sensorViewAngle : POSITION : [147] : VALUES : -1.029 <> -1.029 : PERCENT : 9.26797e-05
DIFFER : VARIABLE : sensorViewAngle : POSITION : [148] : VALUES : -1.029 <> -1.029 : PERCENT : 9.26797e-05
DIFFER : VARIABLE : sensorViewAngle : POSITION : [149] : VALUES : -1.029 <> -1.029 : PERCENT : 9.26797e-05
DIFFER : VARIABLE : sensorViewAngle : POSITION : [176] : VALUES : -1.029 <> -1.029 : PERCENT : 9.26797e-05
DIFFER : VARIABLE : sensorViewAngle : POSITION : [177] : VALUES : -1.029 <> -1.029 : PERCENT : 9.26797e-05
DIFFER : VARIABLE : sensorViewAngle : POSITION : [178] : VALUES : -1.029 <> -1.029 : PERCENT : 9.26797e-05
DIFFER : VARIABLE : sensorViewAngle : POSITION : [179] : VALUES : -1.029 <> -1.029 : PERCENT : 9.26797e-05
Variable        Group     Count          Sum      AbsSum          Min          Max Range         Mean StdDev
sensorViewAngle /MetaData     8 -7.62939e-06 7.62939e-06 -9.53674e-07 -9.53674e-07     0 -9.53674e-07      0
<end of output>
Test time =   0.70 sec
...
PRE-MAIN-INFO BufrParser: Parsing file ./testinput/gdas.t00z.atms.tm00.bufr_d
PRE-MAIN-INFO Executing Queries
PRE-MAIN-INFO Building Bufr Data
PRE-MAIN-INFO Exporting Data
PRE-MAIN-INFO Finished [0.118s]
DIFFER : VARIABLE : sensorViewAngle : POSITION : [41] : VALUES : -0.554998 <> -0.554996 : PERCENT : 0.000236272
DIFFER : VARIABLE : sensorViewAngle : POSITION : [42] : VALUES : 0.555002 <> 0.555 : PERCENT : 0.000343665
DIFFER : VARIABLE : sensorViewAngle : POSITION : [43] : VALUES : 1.665 <> 1.665 : PERCENT : 7.87568e-05
DIFFER : VARIABLE : sensorViewAngle : POSITION : [137] : VALUES : -0.554998 <> -0.554996 : PERCENT : 0.000236272
DIFFER : VARIABLE : sensorViewAngle : POSITION : [138] : VALUES : 0.555002 <> 0.555 : PERCENT : 0.000343665
DIFFER : VARIABLE : sensorViewAngle : POSITION : [139] : VALUES : 1.665 <> 1.665 : PERCENT : 7.87568e-05
DIFFER : VARIABLE : sensorViewAngle : POSITION : [237] : VALUES : -0.554998 <> -0.554996 : PERCENT : 0.000236272
DIFFER : VARIABLE : sensorViewAngle : POSITION : [238] : VALUES : 0.555002 <> 0.555 : PERCENT : 0.000343665
DIFFER : VARIABLE : sensorViewAngle : POSITION : [239] : VALUES : 1.665 <> 1.665 : PERCENT : 7.87568e-05
DIFFER : VARIABLE : sensorViewAngle : POSITION : [333] : VALUES : -0.554998 <> -0.554996 : PERCENT : 0.000236272
DIFFER : VARIABLE : sensorViewAngle : POSITION : [334] : VALUES : 0.555002 <> 0.555 : PERCENT : 0.000343665
DIFFER : VARIABLE : sensorViewAngle : POSITION : [335] : VALUES : 1.665 <> 1.665 : PERCENT : 7.87568e-05
...

Looking at these two lists, perhaps a value of 5e-4 would work, so I tried that. The two tests that failed using 5e-5 pass, but a new test failure shows up (test_iodaconv_obserror):

"test_iodaconv_obserror" start time: Oct 10 15:04 MDT
Output:
----------------------------------------------------------
At line 36 of file /Users/stephenh/projects/TECH_DEBT/ioda-bundle/iodaconv/src/obserror/obserror2ioda.f90
Fortran runtime error: Bad unit number in OPEN statement

Error termination. Backtrace:
#0  0x10155f187
#1  0x10155fd37
#2  0x101560613
#3  0x10164676f
#4  0x100b9317f
#5  0x100b93a0b
<end of output>
Test time =   0.12 sec
----------------------------------------------------------
Test Failed.
"test_iodaconv_obserror" end time: Oct 10 15:04 MDT
"test_iodaconv_obserror" time elapsed: 00:00:00
----------------------------------------------------------

which seems unrelated to the tolerance value. Note the test_iodaconv_obserror failure is intermittent - if you run it a bunch of times it passes sometimes and fails other times.

So it looks like 5e-4 instead of 1e-3 still works. I'll update the PR with that value.

PatNichols commented 10 months ago

@srherbener @BenjaminRuston @CoryMartin-NOAA Do we meed to include input from in kinds on this PR?

srherbener commented 10 months ago

I think it's a good idea to get a review from one of the operational centers. In the spirit of keeping things moving along I think it would be sufficient to get @CoryMartin-NOAA's review and call it good at that point. Does this sound agreeable? Thanks!