ecmwf / pdbufr

High-level BUFR interface for ecCodes
Apache License 2.0

Message structure not identified correctly #49

Closed: sandorkertesz closed this issue 1 year ago

sandorkertesz commented 1 year ago

pdbufr uses an in-memory cache to identify and reuse the message structure while it processes the messages in a BUFR file. Each cache entry is identified by the values of the following header keys and contains all the keys of a given message (structure):

"edition", "masterTableNumber", "numberOfSubsets","unexpandedDescriptors", "delayedDescriptorReplicationFactor"

So if there is already a cache entry for the given message, the list of keys is taken from the cache instead of being read from the message with the key iterator.
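To make the mechanism concrete, here is a minimal sketch of such a cache. It is not the actual pdbufr code: `structure_id`, `KeyCache` and `read_keys` are hypothetical names, messages are modelled as plain dicts, and `read_keys` stands in for the ecCodes key iterator.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Header keys used to identify a message structure (taken from the list above).
HEADER_KEYS = (
    "edition",
    "masterTableNumber",
    "numberOfSubsets",
    "unexpandedDescriptors",
    "delayedDescriptorReplicationFactor",
)


def structure_id(message: Dict[str, Any]) -> Tuple[Any, ...]:
    """Flatten the values of the header keys into one hashable tuple."""
    flat: List[Any] = []
    for key in HEADER_KEYS:
        value = message.get(key, ())
        if isinstance(value, (list, tuple)):
            flat.extend(value)  # array-valued key
        else:
            flat.append(value)  # scalar key
    return tuple(flat)


class KeyCache:
    """Hypothetical cache: one list of keys per structure identifier."""

    def __init__(self, read_keys: Callable[[Dict[str, Any]], Iterable[str]]):
        self._read_keys = read_keys  # stand-in for the key iterator
        self._cache: Dict[Tuple[Any, ...], List[str]] = {}
        self.misses = 0

    def keys_for(self, message: Dict[str, Any]) -> List[str]:
        ident = structure_id(message)
        if ident not in self._cache:
            # Only on a cache miss are the keys read from the message itself.
            self.misses += 1
            self._cache[ident] = list(self._read_keys(message))
        return self._cache[ident]
```

With this design, any two messages whose header values match share one cached key list, even if their real key lists differ. That is exactly the failure mode described in this issue.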

The following BUFR file contains 2 messages:

https://get.ecmwf.int/repository/test-data/pdbufr/test-data/message_structure_diff_2.bufr

and according to pdbufr their structure is identical because the values of the keys listed above are the same:

(4, 0, 1, 307096, 22061, 20058, 4024, 13012, 4024, 1, 0)

However, the first message contains more keys, as bufr_dump -p confirms:

This is the end of the first message:

#24#timePeriod=-1
depthOfFreshSnow=MISSING
#25#timePeriod=0

This is the end of the second message:

#20#timePeriod=-1
depthOfFreshSnow=MISSING
#21#timePeriod=0
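The difference in the number of key occurrences can also be checked with a short script. The sketch below assumes the dump consists of lines of the form `name=value` or `#rank#name=value`, as in the excerpts above; any other line is ignored. `key_counts` and `count_differences` are hypothetical helper names, not part of pdbufr or ecCodes.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Matches "name=value" and "#rank#name=value"; captures only the key name.
_LINE = re.compile(r"^(?:#\d+#)?(?P<name>[^=#\s]+)=")


def key_counts(dump_text: str) -> Counter:
    """Count how many times each key name occurs in a plain-text dump."""
    counts: Counter = Counter()
    for line in dump_text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            counts[match.group("name")] += 1
    return counts


def count_differences(dump_a: str, dump_b: str) -> Dict[str, Tuple[int, int]]:
    """Return the keys whose number of occurrences differs between two dumps."""
    a, b = key_counts(dump_a), key_counts(dump_b)
    return {
        name: (a[name], b[name])
        for name in sorted(set(a) | set(b))
        if a[name] != b[name]
    }
```

Calling `count_differences` on the dumps of the two messages should list `timePeriod` among the differing keys, since its highest rank is 25 in the first message and 21 in the second.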

The bottom line is that the message structure identification mechanism does not work correctly in pdbufr and has to be improved.

shahramn commented 1 year ago

Let's extract the two messages and compare the output of bufr_dump -p on each:

bufr_copy message_structure_diff_2.bufr 'delme__[count].bufr'
bufr_dump -p delme__1.bufr > dump.1
bufr_dump -p delme__2.bufr > dump.2
meld dump.*

We see that the following key is different: shortDelayedDescriptorReplicationFactor

shahramn commented 1 year ago

There are 3 such keys: shortDelayedDescriptorReplicationFactor, delayedDescriptorReplicationFactor and extendedDelayedDescriptorReplicationFactor.

sandorkertesz commented 1 year ago

Does this mean that if we added shortDelayedDescriptorReplicationFactor and extendedDelayedDescriptorReplicationFactor to the key list, we could uniquely identify the message structure for all message types?
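As an illustration of the proposal, the sketch below builds the identifier from a configurable key list, so the current list can be compared with the extended one. `make_id` is a hypothetical helper. The shortDelayedDescriptorReplicationFactor values in the usage example are made up, because the thread does not quote the real ones. Whether the extended list is sufficient for all message types is the open question and is not settled by this sketch.

```python
from typing import Any, Dict, Sequence, Tuple

CURRENT_KEYS = (
    "edition",
    "masterTableNumber",
    "numberOfSubsets",
    "unexpandedDescriptors",
    "delayedDescriptorReplicationFactor",
)

# Proposed extension: include all three replication factor keys.
EXTENDED_KEYS = CURRENT_KEYS + (
    "shortDelayedDescriptorReplicationFactor",
    "extendedDelayedDescriptorReplicationFactor",
)


def make_id(header: Dict[str, Any], keys: Sequence[str]) -> Tuple[Any, ...]:
    """Build a hashable identifier from the given header keys.

    Values are kept grouped per key (not flattened), so that arrays of
    different keys cannot run into each other. A missing key yields ().
    """
    parts = []
    for key in keys:
        value = header.get(key, ())
        if not isinstance(value, (list, tuple)):
            value = (value,)
        parts.append((key, tuple(value)))
    return tuple(parts)


# Usage with illustrative headers; the short replication factors are invented.
common = {
    "edition": 4,
    "masterTableNumber": 0,
    "numberOfSubsets": 1,
    "unexpandedDescriptors": [307096, 22061, 20058, 4024, 13012, 4024],
    "delayedDescriptorReplicationFactor": [1, 0],
}
msg1 = dict(common, shortDelayedDescriptorReplicationFactor=[1, 1, 1])
msg2 = dict(common, shortDelayedDescriptorReplicationFactor=[1, 0, 1])

same_with_current = make_id(msg1, CURRENT_KEYS) == make_id(msg2, CURRENT_KEYS)
same_with_extended = make_id(msg1, EXTENDED_KEYS) == make_id(msg2, EXTENDED_KEYS)
```

With these made-up values the two headers collide under the current key list and are told apart under the extended one.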