jgoriasilva opened this issue 7 months ago
Hi @jgoriasilva,
Here is what I use to do something similar...
```python
from io import StringIO
from metpy.io import metar
df = metar.parse_metar_file(StringIO('\n'.join(data.metar)),  # one report per line
                            year=date.year, month=date.month)
```
Here I am using the `datetime` module to set a date, and `StringIO` to turn the joined string into a file-like object that MetPy's METAR parser can read. This also assumes the pandas DataFrame is called `data` and has a column named `metar`.
Thanks for your answer @kgoebber.
That looks good, but doing it that way I would lose the information in the original `data` DataFrame, in particular the alignment between each parsed METAR and its row in the original DataFrame (`parse_metar_file` and `parse_metar_to_dataframe` generate their own index).
What I would like is to parse the METAR data from a column of an existing DataFrame and add the columns that `parse_metar_to_dataframe` generates to that same DataFrame.
Maybe I'm overlooking something here, but this is how I'm currently doing it:
```python
import pandas as pd
from metpy.io import parse_metar

res = df['metar'].apply(parse_metar, args=(2024, 4))  # Series of Metar namedtuples
columns = res.iloc[0]._fields
res = pd.DataFrame([x._asdict() for x in res], index=res.index, columns=columns)
res = res.drop(columns='date_time')
df = pd.concat([df, res], axis=1)
```
The problem is that, doing it this way, I sometimes get a `ParseError` for a few rows with malformed METAR reports, which is an additional problem I just found:
```
ParseError: Line 1: expected one of:
- [\d] from METAR::datetime
- "Z" from METAR::datetime
1 | METAR SBFL 221300 17006KT 9999 BKN020 24/16 Q1017=
```
I'm still looking for a solution for this as well.
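A minimal sketch of one way to keep going when some rows fail to parse, assuming the parser returns a namedtuple (as `parse_metar` does) and raises on malformed reports such as the one above, which lacks the `Z` after its date/time group. `parse_rows`, `Obs`, and `toy_parse` are hypothetical names; `toy_parse` is only a stand-in so the example runs without MetPy. The sketch catches `Exception` broadly instead of importing MetPy's `ParseError` class, so it will also hide unrelated bugs; narrowing the `except` clause would be safer.

```python
from collections import namedtuple
from typing import Any, Callable

import pandas as pd


def parse_rows(reports: pd.Series, parse: Callable[[str], Any]) -> pd.DataFrame:
    """Parse each report, keeping the original index; failed rows become all-NaN."""
    rows = []
    for text in reports:
        try:
            # parse() is assumed to return a namedtuple, like metpy.io.parse_metar.
            rows.append(parse(text)._asdict())
        except Exception:  # assumption: one malformed report should not abort the column
            rows.append({})  # an empty dict becomes a row of NaN
    return pd.DataFrame(rows, index=reports.index)


# Hypothetical stand-in for parse_metar so the sketch runs without MetPy.
Obs = namedtuple('Obs', ['station_id', 'day_time'])


def toy_parse(text: str) -> Obs:
    station, stamp = text.split()[1:3]
    if not stamp.endswith('Z'):
        raise ValueError(f'bad date/time group: {stamp}')
    return Obs(station, stamp)


reports = pd.Series(['METAR SBFL 221300Z 17006KT',
                     'METAR SBFL 221300 17006KT'], index=[100, 101])
result = parse_rows(reports, toy_parse)
print(result)
```

With MetPy, the same helper could presumably be called as `parse_rows(df['metar'], partial(parse_metar, year=2024, month=4))`; because the result shares `df`'s index, it can be attached with `pd.concat([df, parsed], axis=1)` after dropping overlapping columns such as `date_time`.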
It's exceedingly frustrating that there's not a way to get pandas to just expand the tuple into multiple columns, because otherwise `parse_metar` "just works" with `.apply()`:
```python
from functools import partial
from metpy.io.metar import parse_metar
import pandas as pd
obs = ['KADS 122347Z 17013G20KT 13SM SCT039 23/14 A2986',
       'KBCT 122353Z 12008KT 10SM FEW032 22/16 A3009',
       'KCWA 122347Z 28010KT 10SM CLR 16/M02 A2969',
       'KOUN 122345Z 19014KT 10SM CLR 24/09 A2975']
s = pd.Series(obs)
parser = partial(parse_metar, year=2024, month=4)
s.apply(parser)
```
gives:
```
0    (KADS, 32.97, -96.82, 196, 2024-04-12 23:47:00...
1    (KBCT, 26.38, -80.09, 4, 2024-04-12 23:53:00, ...
2    (KCWA, 44.78, -89.67, 389, 2024-04-12 23:47:00...
3    (KOUN, 35.25, -97.47, 357, 2024-04-12 23:45:00...
dtype: object
```
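One workaround, sketched here under the assumption that every element of the Series is a namedtuple: when the `DataFrame` constructor receives a list of namedtuples, pandas takes the column names from the tuples' `_fields`, so the result of `.apply()` can be expanded after the fact. `Metar` below is a hypothetical three-field stand-in for MetPy's result tuple, and `expand_tuples` is a hypothetical helper name.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the namedtuple returned by parse_metar (the real one has more fields).
Metar = namedtuple('Metar', ['station_id', 'latitude', 'longitude'])


def expand_tuples(results: pd.Series) -> pd.DataFrame:
    """Turn a Series of namedtuples into one column per field, keeping the original index."""
    # For a list of namedtuples, pandas uses the first tuple's _fields as column names.
    return pd.DataFrame(results.tolist(), index=results.index)


s = pd.Series([Metar('KADS', 32.97, -96.82), Metar('KOUN', 35.25, -97.47)],
              index=[10, 20])
expanded = expand_tuples(s)
print(expanded)
```

Applied to the MetPy example above, this would be `expand_tuples(s.apply(parser))`; passing `index=results.index` is what keeps each parsed report aligned with its original row.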
What data are you working with that's giving you a column with reports in it?
What should we add?
I have a DataFrame in which one column holds METAR report strings. Currently, `parse_metar_to_dataframe` only accepts a single string as input, so applying it to that column (with `pandas.Series.apply`, for example) generates one DataFrame per string, resulting in a Series of DataFrames. It would be much easier to use the parser if it could accept a pandas Series and return a single DataFrame with the same columns it produces now, where each row is a parsed METAR, instead of one single-row DataFrame per parsed string. I might be missing something about the usage, but as I understand it there is no way to do this without creating unnecessary overhead with the
Reference
It would be fairly simple to implement this. I can do it on my side and open a pull request adding a new function that uses the existing `parse_metar` (from `metpy.io`) but accepts a pandas Series of str (or a list of str) and returns a single pandas DataFrame.