Speed up bufr_dupupr.f code and generation of uprair bufr dump files

ilianagenkova commented 6 months ago

bufr_dupupr.f is a code developed by Dennis Keiser and updated by Chris Hill, and it was added to the obsproc v1.2.0 release. It is the slowest step in the generation of uprair bufr_d files and leads to long gdas and gfs dump run times, which NCO finds unacceptable in their current ecf configuration (kick-off times).

Two changes were tested and made to speed up the code:

1) Latitude presence check: reduced the number of pressure levels per profile that are checked for missing latitudes from 25 to 5, saving ~150s of processing time.
2) Sorting the array of all profiles by receipt time stamp is skipped (sorting by obs time stamp and station id remains).
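
To illustrate the two changes, here is a minimal Python sketch (not the Fortran in dupupr.f). The function names, the dict keys `obs_time` and `station_id`, and the use of `None` for a missing latitude are all hypothetical; the exact semantics of the latitude check in the real code may differ.

```python
from typing import Optional, Sequence

def has_latitude(level_lats: Sequence[Optional[float]], max_levels: int = 5) -> bool:
    """Return True if any of the first `max_levels` levels carries a latitude.

    None stands in for the BUFR missing value (an assumption of this sketch).
    """
    return any(lat is not None for lat in level_lats[:max_levels])

def sort_profiles(profiles: list) -> list:
    """Sort by obs time, then station id; receipt time is not a sort key."""
    return sorted(profiles, key=lambda p: (p["obs_time"], p["station_id"]))
```

Scanning at most 5 levels instead of 25 bounds the per-profile work, at the cost of missing a latitude that first appears deeper in the profile. Because Python's `sorted` is stable, profiles with the same obs time and station id keep their input order once receipt time is dropped from the key.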

One more change was considered but not implemented: turning off the CORN mnemonic check (CORN indicates whether a profile correction was made by the data provider). While this mnemonic is good to have in the bufr file, and it's in the prepbufr layout, it's not actually written to the prepbufr file, and it's not read by GSI. Turning the check off would have saved ~150s. This could be done when the next batch of TAC profiles is replaced by BUFR high res profiles, as an easy "speed up" solution.

What else to explore? (scope of this task)

1) Investigate the use of UFBTAB in bufr_dupupr and consider replacing it with UFBTAM or UFBMEM, which are faster according to J. Woollen. This may speed up the reading/writing process, but may require opening the file for read with a different bufrlib command.
2) Evaluate the ratio of TAC-only / TAC&BUFR / BUFR-only profiles in the tanks and consider turning off the check for TAC&BUFR stations (if that saves us time).
3) Study the code and test replacing the UFBTAB calls; if help is needed, turn to Jack Woollen and Ron McLaren.
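
The TAC/BUFR ratio evaluation could start from a simple tally like the Python sketch below. It assumes the station id and report format have already been extracted from the tanks into `(station_id, fmt)` pairs; that input shape and the function names are hypothetical, and no bufrlib calls are shown.

```python
from collections import Counter

def classify_stations(reports):
    """Count stations reporting TAC only, BUFR only, or both.

    reports: iterable of (station_id, fmt) pairs, fmt being "TAC" or "BUFR".
    """
    formats = {}
    for station_id, fmt in reports:
        if fmt not in ("TAC", "BUFR"):
            raise ValueError(f"unknown report format: {fmt!r}")
        formats.setdefault(station_id, set()).add(fmt)

    counts = Counter()
    for fmts in formats.values():
        if fmts == {"TAC"}:
            counts["TAC only"] += 1
        elif fmts == {"BUFR"}:
            counts["BUFR only"] += 1
        else:
            counts["TAC & BUFR"] += 1
    return counts

def shares(counts):
    """Convert station counts into fractions of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```

If the "TAC & BUFR" share turns out to be small, that would suggest the check for such stations does little work per run and may not be worth disabling.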

At the current time, we are working with the SPAs to start the global dump steps earlier and are evaluating the impact on data loss, as a temporary solution that lets us start using bufr profiles asap.

rmclaren commented 6 months ago

Can't find bufr_dupupr.f..

ilianagenkova commented 6 months ago

I should have provided a path- [https://github.com/NOAA-EMC/bufr-dump/blob/release/bufr_dump.v1.2.0/sorc/bufr_dupupr.fd/dupupr.f]

rmclaren commented 6 months ago

@ilianagenkova What type of input file does this program take? Just FYI, this code does not seem to be compatible with the gfortran compiler, as the formatting specifier "Q" is nonstandard and unsupported. Are there any special compiler flags or procedures to compile on this platform? I guess I should look at the build.sh script..

rmclaren commented 6 months ago

Think I found my answer in CMakeLists.txt:

# Compiler check.
if(NOT CMAKE_Fortran_COMPILER_ID MATCHES "^(Intel)$")
  message(WARNING "Compiler not officially supported: ${CMAKE_Fortran_COMPILER_ID}")
endif()

ilianagenkova commented 3 months ago

@rmclaren, we cd to /bufr-dump and run ./ush/build.sh

I am happy to test run your code changes, b/c it's not trivial to run just the executable bufr_dupupr

rmclaren commented 3 months ago

@ilianagenkova So from what I can tell this code reads and combines a selection of subtypes out of the \b002 tank (one of them being xx101) to create the uprair dump file. Basically it reads the data, combines (ordered by message timestamp) and cleans the data, then writes the new data file. Is this correct?

Some questions:
1) Could you list all the subtypes that are being read (xx101... any others (?))?
2) Can you have files from different days if the time window specifies this?
3) Could you list the things it's doing during the cleaning phase (at a high level)? Is the following complete?
a) Throws away message subsets with out-of-bounds WMO block number (CRPID ranges 0 to 99)
b) Find file with valid LAT,LON coords (and warn)???
c) Mark corrected report fields (CORN ???)
d) Set missing minutes field to 0
e) Order message subsets by timestamp
f) Remove duplicate message subsets
g) Trim data to exact start/end times
4) What do you get for the timestamps you print out?
5) Who uses the output file and what format do they ultimately use (netcdf, bufr...??)?
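
To make steps d) through g) in the list above concrete, here is a rough Python sketch of that reading of the cleaning phase. It is only a guess at the flow pending confirmation, not the logic in dupupr.f: the dict keys, the function name, and the definition of a duplicate (same timestamp and station id) are all assumptions.

```python
from datetime import datetime

def clean_subsets(subsets, start, end):
    """Apply steps d) to g) to a list of subset dicts.

    Each dict has year, month, day, hour, minute (may be None), station_id.
    Returns sorted, de-duplicated (timestamp, station_id) tuples in [start, end].
    """
    stamped = []
    for s in subsets:
        # d) a missing minutes field is treated as 0
        minute = 0 if s["minute"] is None else s["minute"]
        t = datetime(s["year"], s["month"], s["day"], s["hour"], minute)
        stamped.append((t, s["station_id"]))

    # e) order by timestamp (station id breaks ties)
    stamped.sort()

    # f) drop adjacent duplicates; after sorting, equal entries are neighbours
    deduped = []
    for item in stamped:
        if not deduped or deduped[-1] != item:
            deduped.append(item)

    # g) trim to the requested time window, inclusive at both ends
    return [item for item in deduped if start <= item[0] <= end]
```

Sorting before de-duplicating means duplicates can be found with a single pass over neighbours rather than an all-pairs comparison.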

NOAA-EMC / bufr-dump

Speed up bufr_dupupr.f code and generation of uprair bufr dump files #18