ecmwf / pdbufr

High-level BUFR interface for ecCodes
Apache License 2.0

Question: How to extract two different values for pressure #51

Open blaylockbk opened 1 year ago

blaylockbk commented 1 year ago

This is a pdbufr (v0.10.2) usage question. Please excuse my novice knowledge of BUFR data; I'm just learning about this data type.

I'm reading a BUFR file that has two values for "pressure" for each observation; the two values are the top and bottom pressure levels used to describe one observation.

The command bufr_dump -d <bufr_file.bufr> produces this output showing there are two different pressure levels:

...(more above)

031001  delayedDescriptorReplicationFactor      DELAYED DESCRIPTOR REPLICATION FACTOR [Numeric]
007004  pressure        PRESSURE [Pa]
007004  pressure        PRESSURE [Pa]
103000  103000  103000 [103000]
031001  delayedDescriptorReplicationFactor      DELAYED DESCRIPTOR REPLICATION FACTOR [Numeric]
008023  firstOrderStatistics    FIRST-ORDER STATISTICS [CODE TABLE]
011003  u       U-COMPONENT [m/s]
011004  v       V-COMPONENT [m/s]

...(more below)
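Since the question is from someone new to BUFR, the 103000 line in the dump is worth decoding. Each six-digit code follows the FXXYYY descriptor notation of WMO FM 94 BUFR: F = 0 is an element (Table B), F = 1 a replication, F = 2 an operator (Table C) and F = 3 a sequence (Table D). The helper name `describe_descriptor` is hypothetical and not part of pdbufr or ecCodes; this is a minimal sketch of the notation only.

```python
def describe_descriptor(code: str) -> str:
    """Explain a six-digit BUFR descriptor written as FXXYYY."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected six digits, got {code!r}")
    f, x, y = int(code[0]), int(code[1:3]), int(code[3:])
    if f == 0:
        return f"element descriptor (Table B): class {x}, entry {y}"
    if f == 1:
        if y == 0:
            # Y = 0 means the repeat count is stored in the data itself,
            # in the delayed replication factor (031xxx) that follows.
            return (f"delayed replication of the next {x} descriptor(s); "
                    "the count comes from the following 031xxx descriptor")
        return f"replicate the next {x} descriptor(s) {y} times"
    if f == 2:
        return f"operator descriptor (Table C) {x:02d}, operand {y}"
    if f == 3:
        return f"sequence descriptor (Table D) {code}"
    raise ValueError(f"F must be 0-3, got {f}")
```

Read this way, 103000 followed by 031001 says that the next three descriptors (008023, 011003, 011004) are repeated a number of times stored in the data, so the u and v values sit in a replicated group that comes after the two pressure values.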

But when I read the file with pdbufr, only one value for pressure is returned.

import pdbufr

FILE = "observations.bufr"  # hypothetical path; substitute your own BUFR file
df = pdbufr.read_bufr(
    FILE,
    columns=["latitude", "longitude", "pressure", "u", "v"],
)

[Screenshot of the output]

Is there a way to target a specific pressure value (first or second) or return both?

sandorkertesz commented 1 year ago

Thank you for the question. Could you please provide us with your BUFR data file (at least one BUFR message)?

blaylockbk commented 1 year ago

Sure. Is it possible to share a file via email so it's not public?

sandorkertesz commented 1 year ago

Thanks! Please send it to my email address, which you can find here: https://github.com/ecmwf/pdbufr/commit/c56f5f2418a0fd9af6bc9dbf1f83c4b08baae35e.patch

sandorkertesz commented 1 year ago

Hi Brian,

Thanks for sharing the data with me. Your data has this structure:

[Image: view of the message structure]

So the u and v values are "embedded" between two consecutive pressure values. Unfortunately, pdbufr cannot handle this situation: its collector cannot deal with repeated "coordinate" keys such as pressure in your data. I can see the benefit of implementing this kind of filtering, but right now I cannot estimate when it will be available. I will create a separate issue for this development.

Best regards, Sandor

sandorkertesz commented 1 year ago

Hi Brian,

One possible way of achieving your goal is to use the flat mode. Since your messages contain compressed subsets, you can run

df = pdbufr.read_bufr(f, flat=True)

for each message separately (so f here contains only one message), and you would get a meaningful result. Then you could collect the data you need by iterating through the DataFrame columns. It is important to do this per message because the messages are not guaranteed to have the same structure. Of course, if all the messages have exactly the same structure, you can run the code above for the whole file.
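The "iterate through the DataFrame columns" step could look roughly like the sketch below. It is a minimal sketch under several assumptions, none confirmed in this thread: flat-mode columns use ranked ecCodes keys such as "#2#pressure", they appear in the message's descriptor order, each flat row is one subset, and "#1#latitude"/"#1#longitude" are the observation position. The names `find_layers` and `layers_from_flat` are hypothetical helpers, not pdbufr API.

```python
import re

import pandas as pd

# Assumption: ranked keys look like "#2#pressure"; unranked header keys
# (e.g. "edition") do not match and are ignored.
RANKED_KEY = re.compile(r"^#(\d+)#(.+)$")


def find_layers(columns):
    """Group ranked column names into layers of (pressure, pressure, u, v).

    Assumes columns follow descriptor order, so each pair of consecutive
    pressure columns is followed by the u and v that belong to it.
    """
    layers = []
    pending = []  # pressure columns not yet assigned to a layer
    for col in columns:
        match = RANKED_KEY.match(col)
        if match is None:
            continue
        key = match.group(2)
        if key == "pressure":
            pending.append(col)
            if len(pending) == 2:
                layers.append({"p1": pending[0], "p2": pending[1]})
                pending = []
        elif key in ("u", "v") and layers and key not in layers[-1]:
            # Keep only the first u and v seen after each pressure pair.
            layers[-1][key] = col
    # Drop incomplete layers (a pressure pair with no u/v after it).
    return [layer for layer in layers if "u" in layer and "v" in layer]


def layers_from_flat(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (subset, layer) carrying both pressure values."""
    layers = find_layers(df.columns)
    records = []
    for _, row in df.iterrows():  # assumption: one flat row per subset
        for layer in layers:
            records.append(
                {
                    "latitude": row["#1#latitude"],
                    "longitude": row["#1#longitude"],
                    "pressure_1": row[layer["p1"]],
                    "pressure_2": row[layer["p2"]],
                    "u": row[layer["u"]],
                    "v": row[layer["v"]],
                }
            )
    return pd.DataFrame(records)
```

With pdbufr installed, the input would be the result of `pdbufr.read_bufr(f, flat=True)` on a single-message file. The two pressures are named `pressure_1` and `pressure_2` in descriptor order because which one is the top and which the bottom has to be checked against the data.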

Best regards, Sandor

blaylockbk commented 1 year ago

Thanks @sandorkertesz for looking into this. I'll try your suggestion with flat=True for the individual messages.

How did you produce this view? Is this with Metview?