JCSDA-internal / ioda-converters

Various converters for getting obs data in and out of IODA
8 stars 2 forks source link

YAML, input prepBUFR, and output IODA file for [3D]-RTMA ADPUPA prepbufr #1453

Closed PraveenKumar-NOAA closed 4 months ago

PraveenKumar-NOAA commented 5 months ago

Description

This PR adds an ioda-converters YAML for the [3D]-RTMA ADPUPA prepbufr data.

The following files are new/ modified:

The ioda-validation test is performed and following results were obtained:

Final results: errors: 0 warnings: 0

The validation of the output ioda obs is also performed in: https://github.com/JCSDA-internal/ioda-converters/issues/1451

Issue(s) addressed

Resolves #<1450> partially.

Dependencies

None

Impact

None

Checklist

fcvdb commented 5 months ago

@PraveenKumar-NOAA The Netcdf output file size is a little bit large (2Mb), is there a way you could subsample the input BUFR file to reduce the size of the output? Thanks.

PraveenKumar-NOAA commented 5 months ago

@fcvdb @PatNichols I fixed the file sizes and also ran the unit test, thanks.

PraveenKumar-NOAA commented 5 months ago

$ ctest -R test_iodaconv_bufr_ncep_rtma_prepbufr_adpupa Test project /scratch1/NCEPDEV/da/Praveen.Singh/JEDI/Development-Tests/ioda-bundle_prepbufr/build Start 654: test_iodaconv_bufr_ncep_rtma_prepbufr_adpupa 1/1 Test #654: test_iodaconv_bufr_ncep_rtma_prepbufr_adpupa ... Passed 0.99 sec

100% tests passed, 0 tests failed out of 1

Label Time Summary: iodaconv = 0.99 secproc (1 test) script = 0.99 secproc (1 test)

Total Test time (real) = 1.13 sec

BenjaminRuston commented 5 months ago

@CoryMartin-NOAA and @emilyhcliu and @PraveenKumar-NOAA this appears to have pressure in the MetaData as expected

this is what I would call normal

I tried decoding a prepBUFR directly and this segfaults, but it runs fine when I use the ADPUPA from a split_by_subset from a prepBUFR. Assume this is expected behavior?

>>> f['MetaData'].keys()
<KeysViewHDF5 ['dateTime', 'height', 'latitude', 'longitude', 'pressure', 'stationElevation', 'stationIdentification']>

and quick dump of some information from a file from a split_by_subset from a file like prepbufr.gdas.20220216.t00z.nr.48h this are available via the NCAR RDA.

file: 20220215T210000Z_PT6H_raob_nesdis_ADPUPA_prepbufr.nc4
range(/MetaData/dateTime): [2022-02-15 20:31:48, 2022-02-16 02:54:21]
range(/MetaData/height): [-260.0, 36534.0]
range(/MetaData/latitude): [-90.0, 82.89002227783203]
range(/MetaData/longitude): [0.0, 359.99365234375]
range(/MetaData/pressure): [310.0, 104200.0]
range(/MetaData/stationElevation): [-22.0, 4508.0]
range(/MetaData/stationIdentification): [02527, ZVQEQCM]
range(/ObsError/airTemperature): [0.800000011920929, 1.5]
range(/ObsError/pressure): [110.0, 120.00000762939453]
range(/ObsError/relativeHumidity): [2.0, 2.0]
range(/ObsError/windEastward): [1.399999976158142, 3.200000047683716]
range(/ObsValue/airTemperature): [185.84999084472656, 309.3500061035156]
range(/ObsValue/dewPointTemperature): [163.25, 300.1499938964844]
range(/ObsValue/specificHumidity): [0.0, 0.022273000329732895]
range(/ObsValue/virtualTemperature): [185.84999084472656, 309.3500061035156]
range(/ObsValue/windEastward): [-77.19999694824219, 100.30000305175781]
range(/ObsValue/windNorthward): [-119.5, 133.6999969482422]
range(/QualityMarker/airTemperature): [1, 15]
range(/QualityMarker/height): [1, 15]
range(/QualityMarker/pressure): [1, 15]
range(/QualityMarker/specificHumidity): [2, 15]
range(/QualityMarker/windEastward): [2, 15]

noted that the reference time is also a requirement to change in the yaml

BenjaminRuston commented 5 months ago

was able to plot the resulting airTemperature from the IODA observation file: image

PraveenKumar-NOAA commented 5 months ago

@BenjaminRuston @PatNichols is there anything I need to take care for this PR?

PatNichols commented 5 months ago

@PraveenKumar-NOAA It's failing the relevant ctest after merging you last one. Looking into it.

BenjaminRuston commented 5 months ago

@PraveenKumar-NOAA , @CoryMartin-NOAA and @rmclaren not sure what happened with this PR after:

https://github.com/JCSDA-internal/ioda-converters/pull/1454

was merged. Please let us know if you see anything

BenjaminRuston commented 5 months ago

@PraveenKumar-NOAA I did have to resolve some conflicts in the branch after the merge or PR1454 very well may have not done that correctly

PatNichols commented 5 months ago

@PraveenKumar-NOAA @BenjaminRuston it's failing test_iodaconv_bufr_ncep_rtma_prepbufr_adpupa Lots of error that have the form (zero or near zero ) != fill value for float

BenjaminRuston commented 5 months ago

can someone also explain to me what 3D RTMA is besides regional?

why would a separate decoder be needed versus a standard ADPUPA from split_by_subset from a prepBUFR file?

CoryMartin-NOAA commented 5 months ago

The RTMA is our analysis of record. It is used for verification, not for forecasting. However, I don't know why they have ADPUPA specific things compared to RRFS.

PatNichols commented 5 months ago

@PraveenKumar-NOAA I am still seeing a single ctest failure: Start 128: test_iodaconv_bufr_ncep_rtma_prepbufr_adpupa

658 | 128/334 Test #128: test_iodaconv_bufr_ncep_rtma_prepbufr_adpupa ....................................***Failed 0.36 sec

PraveenKumar-NOAA commented 5 months ago

@PatNichols I don't know why it is failing. Let us talk tomorrow, if you are available.

PatNichols commented 5 months ago

@PraveenKumar-NOAA These are the errors:PRE-MAIN-INFO BufrParser: Parsing file testinput/rtma_ADPUPA.prepbufr PRE-MAIN-INFO Executing Queries PRE-MAIN-INFO Building Bufr Data PRE-MAIN-INFO Exporting Data PRE-MAIN-INFO Finished [0.039s] DIFFER : VARIABLE : airTemperature : POSITION : [139] : VALUES : 0 <> 3.40282e+38 DIFFER : VARIABLE : airTemperature : POSITION : [140] : VALUES : 0 <> 3.40282e+38 DIFFER : VARIABLE : airTemperature : POSITION : [141] : VALUES : 0 <> 3.40282e+38 DIFFER : VARIABLE : airTemperature : POSITION : [142] : VALUES : 1.80768e-43 <> 3.40282e+38 DIFFER : VARIABLE : airTemperature : POSITION : [143] : VALUES : 0 <> 3.40282e+38 DIFFER : VARIABLE : airTemperature : POSITION : [144] : VALUES : 1.68156e-44 <> 3.40282e+38

This may be bug somewhere else since it looks like missing values are stored as zero instead of the missing value defined.

rmclaren commented 5 months ago

@PatNichols I might need direct access to your CI servers as I have not been able to replicate this issue anywhere else, Tried HERA, ORION, local mac, etc....

BenjaminRuston commented 5 months ago

there's still something odd here. There are 3 moisture variables in the file, and then a 4th moisture variable has an error specified without any values:

  Variable ObsValue/dewPointTemperature
  Variable ObsValue/virtualTemperature
  Variable ObsValue/specificHumidity
...
  Variable ObsError/relativeHumidity
PatNichols commented 5 months ago

@PatNichols I might need direct access to your CI servers as I have not been able to replicate this issue anywhere else, Tried HERA, ORION, local mac, etc....

@rmclaren Do you have access to AWS instances?

rmclaren commented 5 months ago

@PatNichols Unfortunately no.

PatNichols commented 5 months ago

@PraveenKumar-NOAA I have run the ctests for your PR on a ubuntu AWS instance and I see the same failure so it looks like it is real failure. When I get some breathing room, I will try it on my mac to check further.

PraveenKumar-NOAA commented 5 months ago

@PatNichols thanks, I did a few more checks which include tests using the global prepbufr yaml, original (large version) and trimmed BUFR file sizes. For all of these cases, test_iodaconv_bufr_ncep_rtma_prepbufr_adpupa passed on Hera. It will be better if you also check using your MAC.

PatNichols commented 5 months ago

@PraveenKumar-NOAA Checked on my mac and it failed. This is with spack-stack-1.6.0 and an intel chip.

PraveenKumar-NOAA commented 5 months ago

@PatNichols thanks, let us talk tomorrow if you are available.

PatNichols commented 5 months ago

@PraveenKumar-NOAA Are you using the gnu compiler on Hera?

PraveenKumar-NOAA commented 5 months ago

@PraveenKumar-NOAA Are you using the gnu compiler on Hera?

@PatNichols intel compiler.

PatNichols commented 5 months ago

@PraveenKumar-NOAA Are you using the gnu compiler on Hera?

@PatNichols intel compiler.

It's the gnu and clang compilers that are failing.

rmclaren commented 5 months ago

Wondering if the fix in PR# 1467 might help with this problem ?? Or is it unrelated...

PatNichols commented 5 months ago

Wondering if the fix in PR# 1467 might help with this problem ?? Or is it unrelated...

@PraveenKumar-NOAA @rmclaren Look likes that PR has fixed this PR too!

BenjaminRuston commented 5 months ago

@PraveenKumar-NOAA and @CoryMartin-NOAA and @ADCollard

there's something odd here and didn't initially notice... there are 3 moisture obsValues and then a 4th moisture value in the obsError this seems very strange:

  Variable ObsValue/dewPointTemperature
  Variable ObsValue/virtualTemperature
  Variable ObsValue/specificHumidity
...
  Variable ObsError/relativeHumidity
PraveenKumar-NOAA commented 4 months ago

@PatNichols @PatNichols all checks have passed now. I will use the smaller file

ObsError/relativeHumidity

@BenjaminRuston thanks. I removed ObsError/relativeHumidity from YAML. But I think it is OK to keep rest of the three moisture ObsValue variables, provide more flexibility to downstream users.

BenjaminRuston commented 4 months ago

OK,,, yes that makes it a bit clearer as the use will select their moisture variable and then will be responsible for defining the error.

is their any reason this would be different from the ADPUPA file coming from the split_by_subset of a prepBUFR file... we likely don't want too many redundant pathways to decode this file... and there is the python version of a routine to do this:

https://github.com/JCSDA-internal/ioda-converters/blob/develop/src/ncepbufr/bufr_ncep_prepbufr_adpupa.py

it makes sense to me somewhat to have a python API version and a direct yaml version but seems like they should be very close and produce very similar IODA output files.

And to the point. @CoryMartin-NOAA , @ADCollard and @PraveenKumar-NOAA are you opposed to us making any changes to this to allow this converter pathway to produce a file usable by a GDAS proxy app for radiosonde

CoryMartin-NOAA commented 4 months ago

@BenjaminRuston what sort of changes would you have in mind (I'll be honest I'm only tangentially involved in this converter work)

BenjaminRuston commented 4 months ago

would like to have a consistent set of ObsValues for the ADPUPA file from prepBUFR, or even from the [3D]-RTMA APUPA.

We also would like to have a first set of ObsErrors currently when we create a radiosonde file we typically have these values:

range(/MetaData/dateTime): [2024-01-10 02:45:00, 2024-01-10 08:57:24]
range(/MetaData/height): [6.0, 32400.0]
range(/MetaData/latitude): [-9.671710014343262, 71.40470886230469]
range(/MetaData/longitude): [2.13782000541687, 358.20001220703125]
range(/MetaData/pressure): [490.0, 103000.0]
range(/MetaData/releaseTime): [2024-01-10 03:00:00, 2024-01-10 08:00:00]
range(/MetaData/stationElevation): [1.0, 1494.0]
range(/MetaData/stationIdentification): [01001, 97980]
range(/MetaData/variable_names): [air_temperature, virtual_temperature]
range(/ObsError/airTemperature): [0.800000011920929, 1.5]
range(/ObsError/specificHumidity): [1.6827051467771525e-06, 0.0034307106398046017]
range(/ObsError/virtualTemperature): [--, --]
range(/ObsError/windEastward): [1.399999976158142, 3.200000047683716]
range(/ObsError/windNorthward): [1.399999976158142, 3.200000047683716]
range(/ObsType/airTemperature): [120, 221]
range(/ObsType/specificHumidity): [120, 221]
range(/ObsType/virtualTemperature): [120, 221]
range(/ObsType/windEastward): [120, 221]
range(/ObsType/windNorthward): [120, 221]
range(/ObsValue/airTemperature): [186.4499969482422, 289.16314697265625]
range(/ObsValue/specificHumidity): [9.999999974752427e-07, 0.004261000081896782]
range(/ObsValue/windEastward): [-26.200000762939453, 77.0]
range(/ObsValue/windNorthward): [-45.900001525878906, 35.900001525878906]
range(/PreQC/airTemperature): [2, 3]
range(/PreQC/specificHumidity): [2, 115]
range(/PreQC/windEastward): [2, 114]
range(/PreQC/windNorthward): [2, 114]

a notable MetaData record that is not in this PR is:

/MetaData/releaseTime

also for ObsError there is a value for windEastward but not a corresponding value in this PR for:

/ObsError/windNorthward

also we typically use PreQC as a group while this PR seems to be using QualityMarker:

/QualityMarker/airTemperature
/QualityMarker/height
/QualityMarker/pressure
/QualityMarker/specificHumidity
/QualityMarker/windEastward

@PraveenKumar-NOAA who is currently using this output file?

@CoryMartin-NOAA that is what I'm thinking... should we either fix these now, or in a subsequent PR with someone on the NOAA side confirming the output file is still something that they find useful and carries what they want

also would prefer fewer moisture variables, specificHumidity being the top choice which is absent from this PR

BenjaminRuston commented 4 months ago

@PraveenKumar-NOAA ,,,, OK,,, one last item talked this over with group and there are still some error, or questions rather about the groups and keys in the file.

If you go to the CDASH output for this PR, and look at the iodaconv_validate_testoutput_gdas.t00z.rtma_adpupa.prepbufr.nc which is here:

https://cdash.jcsda.org/test/6248795

You will find that the output file has these entries that do not follow the current conventions:

   ERROR: Variable 'MetaData/obsTimeMinusCycleTime' is not listed in the YAML conventions file.
   ERROR: Variable 'MetaData/prepbufrReportType' is not listed in the YAML conventions file.
   ERROR: Variable 'MetaData/dumpReportType' is not listed in the YAML conventions file.
   ERROR: Variable 'MetaData/prepbufrDataLevelCategory' is not listed in the YAML conventions file.
   ERROR: Variable 'QualityMarker/wind' is not listed in the YAML conventions file.
   ERROR: Variable 'ObsError/wind' is not listed in the YAML conventions file.

so now understand this may be a specific file used for validation and verification and not for assimilation. Further, you will likely be creating a decoder yaml for the RDAS app separately. As @CoryMartin-NOAA mentioned yesterday we would like what is in here to be generic.

In short, if we want to keep this would make these recommendation:

if this does move this too far away from usefulness we may want to just close this. This is as this year we do want to enforce the validation standard which will involve cleaning up all the violations currently existing. And we would like to not add additional ctests which add to the technical debt (increase the number of validation errors).

Does it sound OK to clean these for now, and again we can augment as needed, potentially using existing keys such as MetaData/instrumentIdentifier and others for items if needed.

PraveenKumar-NOAA commented 4 months ago

@PraveenKumar-NOAA ,,,, OK,,, one last item talked this over with group and there are still some error, or questions rather about the groups and keys in the file.

If you go to the CDASH output for this PR, and look at the iodaconv_validate_testoutput_gdas.t00z.rtma_adpupa.prepbufr.nc which is here:

https://cdash.jcsda.org/test/6248795

You will find that the output file has these entries that do not follow the current conventions:

   ERROR: Variable 'MetaData/obsTimeMinusCycleTime' is not listed in the YAML conventions file.
   ERROR: Variable 'MetaData/prepbufrReportType' is not listed in the YAML conventions file.
   ERROR: Variable 'MetaData/dumpReportType' is not listed in the YAML conventions file.
   ERROR: Variable 'MetaData/prepbufrDataLevelCategory' is not listed in the YAML conventions file.
   ERROR: Variable 'QualityMarker/wind' is not listed in the YAML conventions file.
   ERROR: Variable 'ObsError/wind' is not listed in the YAML conventions file.

so now understand this may be a specific file used for validation and verification and not for assimilation. Further, you will likely be creating a decoder yaml for the RDAS app separately. As @CoryMartin-NOAA mentioned yesterday we would like what is in here to be generic.

In short, if we want to keep this would make these recommendation:

  • believe the MetaData/obsTimeMinusCycleTime could be converted to /MetaData/releaseTime
  • the QualityMarker/wind and ObsError/wind could be replicated into ObsError and QualityMarker for both windEastward and windNorthward so those 2 entries into 4 entries
  • for now remove the 2 ReportTypes and the DataLevelCategory again for this PR into ioda-converter

if this does move this too far away from usefulness we may want to just close this. This is as this year we do want to enforce the validation standard which will involve cleaning up all the violations currently existing. And we would like to not add additional ctests which add to the technical debt (increase the number of validation errors).

Does it sound OK to clean these for now, and again we can augment as needed, potentially using existing keys such as MetaData/instrumentIdentifier and others for items if needed.

@BenjaminRuston thanks for your comments. I consider all of your suggestions and applied all of them, except MetaData/obsTimeMinusCycleTime has been removed from the YAML. The validation tests results are:

Final results: errors: 0 warnings: 0

BenjaminRuston commented 4 months ago

awesome thanks Praveen, see that all the validate tests ran 2x am going to retrigger CI but believe this is in good shape many thanks