NOAA-EMC / NCEPLIBS-bufr

The NCEPLIBS-bufr library contains routines and utilities for working with the WMO BUFR format.

look into eliminating RPSEQ### duplication in stseq function #612

Closed jbathegit closed 2 months ago

jbathegit commented 3 months ago

Currently, when the stseq function runs recursively to store internal table information for a standard Table D descriptor, it doesn't keep track of which RPSEQ### sequences it has already generated. As a result, duplication can occur, as in the following snippet from a run of the debufr utility on test/testfiles/IN_4:

| MSTTB001 | GSRADSEQ  222000  236000  "RPSEQ002"185  GCLONG  GNAP             |
| MSTTB001 | "RPSEQ003"36  222000  237000  GCLONG  GNAP  MDPC  "RPSEQ004"36    |
| MSTTB001 | 222000  237000  GCLONG  GNAP  MDPC  "RPSEQ005"36  222000  237000  |
| MSTTB001 | GCLONG  GNAP  MDPC  "RPSEQ006"36  222000  237000  GCLONG  GNAP    |
| MSTTB001 | MDPC  "RPSEQ007"36  224000  237000  GCLONG  GNAP  FOST            |
| MSTTB001 | "RPSEQ008"36  224000  237000  GCLONG  GNAP  FOST  "RPSEQ009"36    |
|          |                                                                   |
| GSRADSEQ | SIDENSEQ  NPPR  NPPC  LSQL  SAZA  SOZA  HITE  "CLFRASEQ"12        |
| GSRADSEQ | "RPSEQ001"2  "CSRADSEQ"12                                         |
|          |                                                                   |
| SIDENSEQ | SIDGRSEQ  YYMMDD  HHMMSS  LTLONH                                  |
|          |                                                                   |
| SIDGRSEQ | SAID  GCLONG  SCLF  SSNX  SSNY                                    |
|          |                                                                   |
| YYMMDD   | YEAR  MNTH  DAYS                                                  |
|          |                                                                   |
| HHMMSS   | HOUR  MINU  SECO                                                  |
|          |                                                                   |
| LTLONH   | CLATH  CLONH                                                      |
|          |                                                                   |
| CLFRASEQ | SCCF  SCBW  CLDMNT  NCLDMNT  CLTP                                 |
|          |                                                                   |
| RPSEQ001 | SIDP  IMHC  PRLC  PRLC  REHU                                      |
|          |                                                                   |
| CSRADSEQ | SIDP  RDTP  RDCM  SCCF  SCBW  SPRD  RDNE  TMBRST                  |
|          |                                                                   |
| RPSEQ002 | DPRI                                                              |
|          |                                                                   |
| RPSEQ003 | PCCF                                                              |
|          |                                                                   |
| RPSEQ004 | PCCF                                                              |
|          |                                                                   |
| RPSEQ005 | PCCF                                                              |
|          |                                                                   |
| RPSEQ006 | PCCF                                                              |
|          |                                                                   |
| RPSEQ007 | PCCF                                                              |
|          |                                                                   |
| RPSEQ008 | 224255                                                            |
|          |                                                                   |
| RPSEQ009 | 224255                                                            |

So, for example, it might be useful to be able to replace the last 14 lines above with just:

|          |                                                                   |
| RPSEQ003 | PCCF                                                              |
|          |                                                                   |
| RPSEQ004 | 224255                                                            |

and then replace all occurrences of RPSEQ00[3-7] and RPSEQ00[89] with, respectively, RPSEQ003 and RPSEQ004 in the prior MSTTB001 definition. Doing this would likely just require stseq to keep its own internal sequence cache between recursive calls. Then, before each instance where it currently calls igettdi and stseq to store a new RPSEQxxx sequence with xxx = irepct+1, it would first check its internal cache of existing RPSEQyyy sequences, where yyy = 0, 1, 2, ..., irepct, to see whether an identical sequence has already been stored. If so, it would just re-use that same RPSEQyyy sequence instead of calling igettdi and stseq to store a new one.

Off the top of my head, I'm thinking this cache of already-stored RPSEQyyy mnemonics and corresponding rpidn (i.e. FXY) numbers could be maintained in a separate new C function that's only ever called from within stseq. Of course, we'd also have to include a way to reset the cache back to yyy = 0 whenever stseq itself is called with irepct = 0.
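
To make the cache idea a bit more concrete, here is a minimal C sketch of what such a helper might look like. Everything in it is an assumption for illustration rather than existing library code: the function names (rpseq_cache_reset, rpseq_cache_find, rpseq_cache_store, rpseq_cache_mnemonic, rpseq_cache_rpidn), the fixed array limits, and the choice to treat two sequences as identical when their lists of child descriptor numbers match exactly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RPSEQ    1000  /* hypothetical limit: RPSEQ### has three digits */
#define MAX_CHILDREN 64    /* hypothetical cap on descriptors per sequence  */

/* One already-stored RPSEQ sequence (hypothetical layout). */
typedef struct {
    int  rpidn;                 /* FXY number assigned to the sequence */
    char mnemonic[16];          /* e.g. "RPSEQ003"                     */
    int  nchild;                /* number of child descriptors         */
    int  child[MAX_CHILDREN];   /* child descriptors as integers       */
} rpseq_entry;

static rpseq_entry cache[MAX_RPSEQ];
static int ncache = 0;

/* Forget everything; intended for when stseq is entered with irepct = 0. */
void rpseq_cache_reset(void)
{
    ncache = 0;
}

/* Return the cache index of a sequence with identical children, or -1. */
int rpseq_cache_find(const int *child, int nchild)
{
    if (child == NULL || nchild <= 0)
        return -1;
    for (int i = 0; i < ncache; i++) {
        if (cache[i].nchild == nchild &&
            memcmp(cache[i].child, child, (size_t)nchild * sizeof(int)) == 0)
            return i;
    }
    return -1;
}

/* Record a newly stored sequence; return its cache index, or -1 if it
 * does not fit within the hypothetical limits above. */
int rpseq_cache_store(int seqno, int rpidn, const int *child, int nchild)
{
    assert(child != NULL && nchild > 0);
    if (ncache >= MAX_RPSEQ || nchild > MAX_CHILDREN ||
        seqno < 0 || seqno > 999)
        return -1;
    rpseq_entry *e = &cache[ncache];
    e->rpidn = rpidn;
    snprintf(e->mnemonic, sizeof e->mnemonic, "RPSEQ%03d", seqno);
    e->nchild = nchild;
    memcpy(e->child, child, (size_t)nchild * sizeof(int));
    return ncache++;
}

/* Accessors for a cache hit; return NULL or -1 for an invalid index. */
const char *rpseq_cache_mnemonic(int idx)
{
    return (idx >= 0 && idx < ncache) ? cache[idx].mnemonic : NULL;
}

int rpseq_cache_rpidn(int idx)
{
    return (idx >= 0 && idx < ncache) ? cache[idx].rpidn : -1;
}
```

With something along these lines, stseq would call rpseq_cache_reset() when entered with irepct = 0, call rpseq_cache_find() before generating a new RPSEQ sequence, and only fall through to igettdi and stseq, followed by rpseq_cache_store(), on a miss. On a hit it would re-use the mnemonic and rpidn returned by the accessors. A linear scan seems adequate here since the number of RPSEQ sequences per message is small.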