ecmwf / pdbufr

High-level BUFR interface for ecCodes
Apache License 2.0

Make it possible to filter out all NaN values #65

Open sferics opened 1 year ago

sferics commented 1 year ago

Is your feature request related to a problem? Please describe.

I tried to use the "filters" argument of the read_bufr function to filter out NaN values. My filter was a very simple lambda function: filter = lambda x: pandas.notna(x)

When I used it to remove the missing data of a single parameter, it worked fine. But when I applied it to many parameters, the returned pandas DataFrame shrank and no longer contained the desired data, or was even empty.

I suspect that this is due to how the filter conditions are combined. In the documentation, you mention that they are combined with logical AND: https://pdbufr.readthedocs.io/en/latest/read_bufr.html#combining-conditions
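The shrinking result is what AND semantics would predict. Below is a minimal sketch in plain pandas, assuming synthetic data: the parameter names and values are hypothetical, and the code only mimics the documented AND combination of per-parameter conditions, not pdbufr's internals.

```python
import numpy as np
import pandas as pd

# Hypothetical observations: each parameter is missing in different rows.
df = pd.DataFrame(
    {
        "airTemperature": [280.1, np.nan, 275.3, np.nan],
        "dewpointTemperature": [np.nan, 270.2, 271.0, np.nan],
        "windSpeed": [np.nan, np.nan, 3.5, 4.2],
    }
)

# One notna() condition per parameter, combined with logical AND:
# a row survives only if every parameter is present in it.
and_mask = df.notna().all(axis=1)
result = df[and_mask]  # only row 2 has all three parameters
```

Adding one more parameter that happens to be missing in row 2 would leave the result empty, which matches the behaviour described above.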

The problem for me is that without filtering I get a rather big DataFrame with many missing values, which I have to get rid of afterwards. I've noticed that many columns actually contain only NaN values.

Describe the solution you'd like

It would be nice to have the option to combine conditions with logical OR instead. That alone might solve my problem.
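For comparison, here is what OR combination of the same per-parameter notna() conditions would keep. This is a plain pandas sketch on hypothetical data, not an existing pdbufr option: it only illustrates the requested semantics.

```python
import numpy as np
import pandas as pd

# Hypothetical observations; row 4 has no values at all.
df = pd.DataFrame(
    {
        "airTemperature": [280.1, np.nan, 275.3, np.nan, np.nan],
        "dewpointTemperature": [np.nan, 270.2, 271.0, np.nan, np.nan],
        "windSpeed": [np.nan, np.nan, 3.5, 4.2, np.nan],
    }
)

# Combining the per-parameter notna() conditions with logical OR keeps a row
# if at least one of the requested parameters is present.
or_mask = df.notna().any(axis=1)
result = df[or_mask]
```

Note that OR only removes rows in which every requested parameter is missing; the rows that remain can still contain NaN in individual columns.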

Describe alternatives you've considered

Another solution I can imagine is an option to apply the equivalent of "df.loc[:, parameter].notna().any()" to each column (parameter) before returning the DataFrame. If this condition returns False for a column, i.e., the column consists only of missing values, the column is dropped.
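The column check can be sketched in pandas as a boolean mask over columns. The column names and values below are hypothetical; the point is that notna().any() is True for columns that have at least one value, so it selects the columns to keep.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "airTemperature": [280.1, np.nan, 275.3],
        "cloudCoverTotal": [np.nan, np.nan, np.nan],  # hypothetical all-missing column
        "windSpeed": [np.nan, 3.1, 4.2],
    }
)

# True for columns with at least one non-missing value (same as ~df.isna().all()).
keep = df.notna().any()
result = df.loc[:, keep]
```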

Ideally, this would be done before the DataFrame is created internally.
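One way to do this before a DataFrame exists is to scan the decoded records once and keep only the keys that have a value somewhere. This is a hypothetical sketch: the record layout (a list of dicts with None for missing values) and the helper present_keys are assumptions for illustration, not how pdbufr builds its output.

```python
import pandas as pd

# Hypothetical decoded records; None marks a missing value.
records = [
    {"stationNumber": 1, "airTemperature": 280.1, "cloudCoverTotal": None},
    {"stationNumber": 2, "airTemperature": None, "cloudCoverTotal": None},
    {"stationNumber": 3, "airTemperature": 275.3, "cloudCoverTotal": None},
]


def present_keys(rows):
    """Hypothetical helper: keys with at least one non-missing value, in first-seen order."""
    keys = []
    seen = set()
    for row in rows:
        for key, value in row.items():
            if value is not None and key not in seen:
                seen.add(key)
                keys.append(key)
    return keys


columns = present_keys(records)
# All-missing keys are dropped before the DataFrame is built, so their
# NaN-filled columns are never materialised.
df = pd.DataFrame([{k: row.get(k) for k in columns} for row in records], columns=columns)
```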

Additional context

My solution for now is to call df.dropna(how="all") on both axes after I've created the DataFrame. But this is not a very efficient way to do it, especially for large amounts of data.
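The current workaround, written out in pandas on hypothetical data: DataFrame.dropna works on one axis per call, so the two passes are chained.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "airTemperature": [280.1, np.nan, 275.3],
        "cloudCoverTotal": [np.nan, np.nan, np.nan],
        "windSpeed": [np.nan, np.nan, 4.2],
    }
)

# First drop rows that are entirely NaN, then columns that are entirely NaN.
cleaned = df.dropna(axis=0, how="all").dropna(axis=1, how="all")
```

Because an all-NaN row contributes no values to any column (and an all-NaN column none to any row), the order of the two calls does not change the result. Both passes still operate on the full DataFrame after it has been built, which is the inefficiency described above.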

Organisation

Meteo Service weather research

sandorkertesz commented 1 year ago

Please see #58

sferics commented 1 year ago

Oh, thanks! I overlooked that... Yes, that is exactly what I meant. I would be really happy to see such a feature in this great piece of software in the future. Keep up the good work! Best regards