spanezz closed this issue 5 months ago
One simple solution to this would be to also group by table versions. That has the downside that two BUFR files using different tables that happen to be identical for the codes actually used will still end up in different NetCDF files. Would that be an acceptable compromise?
Alternatively, I can think of indexing BUFR files by the recursive expansion of all the B and D codes in their data descriptor section, which may be more computationally expensive.
Alternatively, I could look into how complex it would be to do the grouping by table version first, and then merge the resulting arrays when they have the same shape.
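To make the first option concrete, here is a minimal sketch of grouping by (table version, data descriptor section). The field names (`table_version`, `descriptors`) are hypothetical, not the project's actual API; the point is that messages with identical descriptor sections but different table versions land in different output buckets:

```python
from collections import defaultdict

def group_messages(messages):
    """Group BUFR messages by table version and data descriptor section.

    Each message is assumed to be a dict with hypothetical keys
    'table_version' (e.g. "26:1") and 'descriptors' (the list of
    B/D codes from its data descriptor section).
    """
    groups = defaultdict(list)
    for msg in messages:
        # Including the table version in the key is the proposed fix:
        # identical descriptor sections under different tables no
        # longer share one output file.
        key = (msg["table_version"], tuple(msg["descriptors"]))
        groups[key].append(msg)
    return groups
```

With this key, the two messages from #11 (same descriptors, tables 26:1 and 9:1) would produce two groups instead of one.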
Thank you for the detailed analysis. I think the first solution is acceptable, considering how the COSMO code works.
Would this also implicitly solve #11, without the need for a conversion?
Indeed, I think it would also solve #11.
I pushed 7098e62e01807e09f418bf00c35d090bdfe51896: can you give it a try?
I confirm that the modified version also works without errors on the complete original BUFR file.
Can we publish a new release or do you need to make further updates?
I have no other updates planned and you can publish a new release; I have just updated NEWS.md.
v1.7-1 released (and already in copr repo)
Currently, input messages are grouped into output files by the contents of their data descriptor section, with the intention of guaranteeing that input data are homogeneous and all fit in the same output arrays.
In the example given in #11, however, this breaks: both messages have identical data descriptor sections, containing:
However, the first message has Table version: 26:1 and the second has Table version: 9:1.
Table 26:1 expands 321022 as:
While table 9:1 expands it as:
This causes the 007007/010007 discrepancy that was observed in #11.
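The mechanism can be illustrated with a toy recursive D-code expansion. The table contents below are assumptions for illustration only (real WMO table expansions of 321022 are longer); they just reproduce the shape of the problem, where two table versions expand the same D code to different B-code leaves:

```python
# Toy D-code tables, NOT real WMO BUFR tables: each maps a D code
# (3xxxxx) to the sequence of codes it expands to. The single
# differing leaf (007007 vs 010007) mimics the discrepancy above.
D_TABLES = {
    "26:1": {"321022": ["002121", "007007"]},  # assumed expansion
    "9:1":  {"321022": ["002121", "010007"]},  # assumed expansion
}

def expand(code, table):
    """Recursively expand a descriptor into its B-code leaves.

    D codes (starting with '3') are looked up in the table and
    expanded; anything else is treated as a B-code leaf.
    """
    if code.startswith("3"):
        return [leaf for c in table[code] for leaf in expand(c, table)]
    return [code]
```

Here the descriptor section (`321022`) is byte-identical in both messages, yet `expand("321022", ...)` yields different variable lists per table version, so the two messages cannot share one set of output arrays.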