ecmwf / pdbufr

High-level BUFR interface for ecCodes
Apache License 2.0

Merging separate requests differs from one joint request #31

Open pollyaschm opened 3 years ago

pollyaschm commented 3 years ago

Hello,

I started using pdbufr and came across a behaviour I don't understand. Maybe you can help me understand it.

The results differ depending on whether I request several variables (e.g. temperature and wind) in a single call (A), or request each variable separately and merge the two results (B). Why do I not get the same result from both approaches?

I am using pdbufr version 0.9.0. Thanks!

(A)

import pdbufr

# `file` is the path to the BUFR file being read
result_A = pdbufr.read_bufr(file, columns=('ident', 'heightOfStationGroundAboveMeanSeaLevel',
                                           'typicalDate', 'typicalTime',
                                           'airTemperature',
                                           'windSpeed'),
                            filters={'masterTablesVersionNumber': 31})

(B)

import pandas as pd

# Read each variable separately, keeping the same station/time columns
result_B_temp = pdbufr.read_bufr(file, columns=('ident', 'heightOfStationGroundAboveMeanSeaLevel',
                                                'typicalDate', 'typicalTime',
                                                'airTemperature'),
                                 filters={'masterTablesVersionNumber': 31})
result_B_wind = pdbufr.read_bufr(file, columns=('ident', 'heightOfStationGroundAboveMeanSeaLevel',
                                                'typicalDate', 'typicalTime',
                                                'windSpeed'),
                                 filters={'masterTablesVersionNumber': 31})
# Outer-merge the two results on the shared station/time columns
result_B = pd.merge(result_B_temp, result_B_wind,
                    on=['ident', 'heightOfStationGroundAboveMeanSeaLevel', 'typicalDate',
                        'typicalTime'], how='outer')

I get

result_A.tail()
    typicalDate typicalTime  ... airTemperature  windSpeed
738    20210614      083000  ...            NaN        5.7
739    20210614      083000  ...         301.55        1.8
740    20210614      083000  ...            NaN        NaN
741    20210614      083000  ...            NaN        0.8
742    20210614      083000  ...            NaN        2.8
[5 rows x 6 columns]

and

result_B.tail()
    typicalDate typicalTime  ... airTemperature  windSpeed
963    20210614      083000  ...            NaN        NaN
964    20210614      083000  ...         291.75        0.8
965    20210614      083000  ...            NaN        0.8
966    20210614      083000  ...         297.55        2.8
967    20210614      083000  ...            NaN        2.8
[5 rows x 6 columns]
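
One thing that may be worth checking, independently of pdbufr: result_B has more rows than result_A (the index reaches 967 versus 742), and pd.merge produces every pairing of rows whenever the key columns are not unique on both sides. The sketch below uses made-up toy data (the values and the reduced key list are assumptions, not taken from the BUFR file) to show that effect. It does not establish that this is what happens with the file above, only that it is a possibility to rule out.

```python
import pandas as pd

# Toy data (hypothetical, not from the BUFR file): the same station/time key
# appears twice in each single-variable result.
keys = {"ident": ["A", "A"], "typicalTime": ["083000", "083000"]}
temp = pd.DataFrame({**keys, "airTemperature": [291.75, None]})
wind = pd.DataFrame({**keys, "windSpeed": [0.8, 2.8]})

on = ["ident", "typicalTime"]

# With repeated keys on both sides, the merge pairs every matching row
# on the left with every matching row on the right.
merged = pd.merge(temp, wind, on=on, how="outer")
print(len(temp), len(wind), len(merged))  # 2 2 4

# Count how many rows share their key with at least one other row.
n_dup_temp = int(temp.duplicated(subset=on, keep=False).sum())
n_dup_wind = int(wind.duplicated(subset=on, keep=False).sum())
print(n_dup_temp, n_dup_wind)  # 2 2
```

Running the same duplicated(...) check on result_B_temp and result_B_wind with the four merge columns would show whether the keys repeat in the real data.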

iainrussell commented 3 years ago

Hi @pollyaschm, would you be able to attach the BUFR file to this issue, please, so we can have a look? Although I have some idea, I'd prefer to see the file first; then I'd hope to give a nice explanation! (Or, in the worst case, you've found a bug.)

Thanks, Iain