Add missing value handling options

ecmwf / pdbufr

High-level BUFR interface for ecCodes

Apache License 2.0


Add missing value handling options #58

Open sandorkertesz opened 1 year ago

sandorkertesz commented 1 year ago

Currently, read_bufr offers no control over missing values during extraction, so we have to filter the resulting pandas DataFrame afterwards to remove them.
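For context, the post-extraction filtering mentioned above could look roughly like the sketch below. The DataFrame is built by hand as a stand-in for what read_bufr would return, and the column names and values are made up for illustration; only the pandas calls are real.

```python
import numpy as np
import pandas as pd

# Stand-in for the DataFrame that pdbufr.read_bufr(...) would return.
# Column names and values are illustrative assumptions, not real data.
df = pd.DataFrame(
    {
        "latitude": [48.1, 52.5, 60.2],
        "longitude": [11.6, 13.4, 24.9],
        "airTemperature": [281.3, np.nan, 275.0],
    }
)

# Current workaround: drop the rows with a missing value after extraction.
filtered = df.dropna(subset=["airTemperature"]).reset_index(drop=True)
print(filtered)
```

The options proposed here would move this step into read_bufr itself, so the rows with missing values would never reach the DataFrame.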

Option 1

Add option missing_value_policy with the following values: "include", "ignore" (default="include")

df = pdbufr.read_bufr(...., missing_value_policy="ignore")

Option 2

Add option skip_missing as a bool (default=False)

df = pdbufr.read_bufr(...., skip_missing=True)

Option 3

Add option skip_na_values as a bool (default=False)

df = pdbufr.read_bufr(...., skip_na_values=True)

tlmquintino commented 1 year ago

how about: df = pdbufr.read_bufr(...., missing_values="ignore")

I think we should use policies (identified by strings), but the key does not need to be so verbose as to include _policy

sandorkertesz commented 1 year ago

Yes, the shorter the better!

pmaciel commented 1 year ago

Maybe missing_values=None?

iainrussell commented 1 year ago

In the case where the user wants to extract five variables from the data and just one of them is missing for a given row, would missing_values="ignore" remove the row, or would it only remove the row if all five variables are missing? And so, do we need an option to disambiguate this?
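To make the two possible readings concrete, here is a small pure-Python sketch. The helper drop_missing and its how argument are hypothetical and not part of pdbufr; they only mirror the "any"/"all" distinction that pandas exposes through DataFrame.dropna(how=...). Rows are plain dicts with None standing in for a missing value.

```python
from typing import Any, Optional

Row = dict[str, Optional[Any]]


def drop_missing(rows: list[Row], keys: list[str], how: str = "any") -> list[Row]:
    """Hypothetical helper: drop rows according to their missing values.

    how="any": drop a row if at least one of the keys is missing.
    how="all": drop a row only if every one of the keys is missing.
    """
    if how not in ("any", "all"):
        raise ValueError(f"how must be 'any' or 'all', got {how!r}")
    check = any if how == "any" else all
    return [row for row in rows if not check(row.get(k) is None for k in keys)]


rows = [
    {"t": 281.3, "td": 279.0, "p": 101200.0},  # complete row
    {"t": None, "td": 278.1, "p": 100900.0},   # one value missing
    {"t": None, "td": None, "p": None},        # every value missing
]
keys = ["t", "td", "p"]

strict = drop_missing(rows, keys, how="any")   # keeps only the complete row
lenient = drop_missing(rows, keys, how="all")  # also keeps the partial row
print(len(strict), len(lenient))
```

If both behaviours are useful, a single boolean or a single "ignore" policy would not be enough to select between them.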

sandorkertesz commented 1 year ago

"Maybe missing_values=None?"

What other values than None could we specify for missing_values? What would they mean?

shahramn commented 1 year ago

I prefer option 2