Parse metar from pandas dataframe into another dataframe

jgoriasilva commented 7 months ago

What should we add?

I have a DataFrame with a column of METAR report strings. parse_metar_to_dataframe only accepts a single string, so applying it to the column (with pandas.Series.apply, for example) produces one single-row DataFrame per report, i.e. a Series of DataFrames. It would be much easier to use the parser if it could accept a pandas Series and return a single DataFrame with the same columns it returns now, but with one row per parsed METAR. I might be missing something with the usage, but as I understand there is no way to do it without creating unnecessary overhead with the

Reference

It would be fairly simple to implement this. I can do it and open a pull request adding a new function that uses the existing parse_metar (from metpy.io), accepts a pandas Series of str (or a list of str), and returns a single pandas DataFrame.
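A minimal sketch of what such a function could look like, not an existing MetPy API. The name parse_series_to_dataframe and the stand-in toy_parser are hypothetical; the only assumption about the real parser is that it returns a namedtuple per report, so the sketch takes the parser as an argument and runs without MetPy installed.

```python
from collections import namedtuple

import pandas as pd


def parse_series_to_dataframe(reports, parser):
    """Parse each report with `parser` and return one DataFrame, one row per report.

    `parser` is any callable that returns a namedtuple for a single report string.
    """
    # Accept a plain list of strings as well as a Series (a Series keeps its index)
    reports = pd.Series(reports, dtype=object)
    parsed = [parser(text) for text in reports]
    # A list of namedtuples becomes one row per tuple, with field names as columns
    return pd.DataFrame(parsed, index=reports.index)


# Hypothetical stand-in parser so the sketch is self-contained
Obs = namedtuple('Obs', ['station_id', 'raw_length'])


def toy_parser(text):
    return Obs(text.split()[0], len(text))


frame = parse_series_to_dataframe(
    pd.Series(['KADS 122347Z 17013G20KT', 'KOUN 122345Z 19014KT'], index=[10, 20]),
    toy_parser,
)
```

With MetPy, the parser argument would presumably be something like functools.partial(parse_metar, year=2024, month=4). Because the result reuses the input index, it can be concatenated back onto the original DataFrame without losing row alignment.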

kgoebber commented 7 months ago

Hi @jgoriasilva,

Here is what I use to do something similar...

from io import StringIO
from metpy.io import metar

df = metar.parse_metar_file(StringIO('\n'.join(data.metar)),
                            year=date.year, month=date.month)

Here I am using the datetime module to set a date and StringIO to wrap the joined string in a file-like object that the METAR parser from MetPy can read. The above also assumes the pandas DataFrame is called data and has a column named metar.

jgoriasilva commented 7 months ago

Thanks for your answer @kgoebber.

That looks good, but doing it that way I would lose the link to the original data DataFrame, particularly the alignment between the parsed METARs and the rows of the original DataFrame (parse_metar_file and parse_metar_to_dataframe generate an arbitrary index).

What I would like to do is parse the METAR column of an existing DataFrame and add the columns that parse_metar_to_dataframe generates to that same DataFrame.

Maybe I'm overlooking something here, but one way that I'm currently doing it is like this:

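import pandas as pd
from metpy.io import parse_metar

# Parse each report; the result is a Series of namedtuples
res = df['metar'].apply(parse_metar, args=(2024, 4))
columns = res.iloc[0]._fields
# Expand the namedtuples into columns, keeping the original index
res = pd.DataFrame(index=res.index, data=[x._asdict() for x in res.values], columns=columns)
res.drop(columns='date_time', inplace=True)

df = pd.concat([df, res], axis=1)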

The problem is that, doing it that way, I sometimes get a ParseError for a few rows that contain a malformed METAR, which is an additional problem I just found:

ParseError: Line 1: expected one of:

    - [\d] from METAR::datetime
    - "Z" from METAR::datetime

     1 | METAR SBFL 221300 17006KT 9999 BKN020 24/16 Q1017=

I'm still looking for a solution for this as well.
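One possible workaround, sketched under assumptions: wrap the parser so that a failing row is recorded instead of aborting the whole apply. The error above suggests the time group 221300 is missing its trailing Z. The helper name parse_skipping_failures and the stand-in toy_parser are hypothetical, and the broad except Exception is a placeholder that should be narrowed to the parser's own error type.

```python
from collections import namedtuple

import pandas as pd


def parse_skipping_failures(reports, parser):
    """Return (DataFrame of parsed rows, Series of reports that failed to parse)."""
    rows, good_index, failed = [], [], {}
    for label, text in reports.items():
        try:
            rows.append(parser(text))
            good_index.append(label)
        except Exception:  # assumption: narrow this to the parser's error type
            failed[label] = text
    # Only successfully parsed rows appear, labelled with their original index
    parsed = pd.DataFrame(rows, index=good_index)
    return parsed, pd.Series(failed, dtype=object)


# Hypothetical stand-in parser that rejects a time group without a trailing "Z"
Obs = namedtuple('Obs', ['station_id', 'time_group'])


def toy_parser(text):
    station, time_group = text.split()[:2]
    if not time_group.endswith('Z'):
        raise ValueError(f'bad time group: {time_group}')
    return Obs(station, time_group)


df = pd.DataFrame({'metar': ['SBFL 221300Z 17006KT', 'SBFL 221300 17006KT']})
parsed, failed = parse_skipping_failures(df['metar'], toy_parser)

# Joining on the index keeps every original row; failed rows get missing values
combined = df.join(parsed)
```

The failed Series keeps the original index labels, so the problematic reports can be inspected or repaired and parsed again later.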

dopplershift commented 7 months ago

It's exceedingly frustrating that there's not a way to get Pandas to just expand the tuple into multiple columns, because otherwise parse_metar "just works" with .apply():

from functools import partial
from metpy.io.metar import parse_metar
import pandas as pd

obs = ['KADS 122347Z 17013G20KT 13SM SCT039 23/14 A2986',
       'KBCT 122353Z 12008KT 10SM FEW032 22/16 A3009',
       'KCWA 122347Z 28010KT 10SM CLR 16/M02 A2969',
       'KOUN 122345Z 19014KT 10SM CLR 24/09 A2975']

s = pd.Series(obs)
parser = partial(parse_metar, year=2024, month=4)

s.apply(parser)

gives:

0    (KADS, 32.97, -96.82, 196, 2024-04-12 23:47:00...
1    (KBCT, 26.38, -80.09, 4, 2024-04-12 23:53:00, ...
2    (KCWA, 44.78, -89.67, 389, 2024-04-12 23:47:00...
3    (KOUN, 35.25, -97.47, 357, 2024-04-12 23:45:00...
dtype: object

What data are you working with that's giving you a column with reports in it?

Unidata / MetPy

Parse metar from pandas dataframe into another dataframe #3476
