Priesemann-Group / covid19_inference_forecast

GNU General Public License v3.0

More condensed data preprocessing #2

Closed michaelosthege closed 4 years ago

michaelosthege commented 4 years ago

Hi there,

Really cool model! While working to understand and run it, I found it easier to preprocess the JHU data with a bit of pandas.

First, reshape the awkwardly formatted original JHU data into a DataFrame with a multi-index:

import datetime

import pandas


def _jhu_to_iso(fp_csv: str) -> pandas.DataFrame:
    """Convert Johns Hopkins University dataset to nicely formatted DataFrame.

    Drops Lat/Long columns and reformats to a multi-index of (country, state).
    """
    df = pandas.read_csv(fp_csv, sep=',')
    # change columns & index
    df = df.drop(columns=['Lat', 'Long']).rename(columns={
        'Province/State': 'state',
        'Country/Region': 'country'
    })
    df = df.set_index(['country', 'state'])
    # datetime columns
    df.columns = [datetime.datetime.strptime(d, '%m/%d/%y') for d in df.columns]
    return df
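
To try the conversion without downloading anything, here is a minimal, self-contained sketch that runs the same steps on a tiny inline CSV. The numbers are made up, `jhu_to_frame` and `SAMPLE_CSV` are hypothetical names used only for this illustration, and the sketch assumes the input has the same column layout as the JHU files. It also swaps the `strptime` list comprehension for `pandas.to_datetime`, which should give the same column labels as pandas timestamps.

```python
import io

import pandas

# Made-up numbers in the wide JHU layout: one row per region, one column per day.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,3/1/20,3/2/20,3/3/20
,Germany,51.0,9.0,130,159,196
Hubei,China,30.97,112.27,66907,67103,67217
"""


def jhu_to_frame(source) -> pandas.DataFrame:
    """Same steps as _jhu_to_iso; accepts a path, URL, or file-like object."""
    df = pandas.read_csv(source)
    df = df.drop(columns=["Lat", "Long"]).rename(
        columns={"Province/State": "state", "Country/Region": "country"}
    )
    df = df.set_index(["country", "state"])
    # Vectorised alternative to the strptime list comprehension.
    df.columns = pandas.to_datetime(df.columns, format="%m/%d/%y")
    return df


sample = jhu_to_frame(io.StringIO(SAMPLE_CSV))
# A full (country, state) key returns one row as a Series indexed by date.
hubei = sample.loc[("China", "Hubei")]
```

Rows without a province, such as Germany here, end up with a missing value in the `state` level, which is why the filtering step uses `state = None`.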

Then filtering by country/state is much easier:

country = 'Germany'
state = None

# load & transform
df_confirmed = _jhu_to_iso(fp_confirmed) # <-- filepath or URL to original CSV
df_deaths = _jhu_to_iso(fp_deaths)
df_recovered = _jhu_to_iso(fp_recovered)

# filter
df = pandas.DataFrame(columns=['date', 'confirmed', 'deaths', 'recovered']).set_index('date')
df['confirmed'] = df_confirmed.loc[(country, state)]
df['deaths'] = df_deaths.loc[(country, state)]
df['recovered'] = df_recovered.loc[(country, state)]
df.index.name = 'date'

With datetime objects in the DataFrame index, one can slice directly with the date:

date_data_begin = datetime.datetime(2020, 3, 1)
date_data_end = df.index[-1]
df.loc[date_data_begin:date_data_end, 'confirmed'].values
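
For anyone who wants to see the slicing in isolation, here is a small sketch on a hand-built frame. The case counts are hypothetical and only serve to show the behaviour; the variable names mirror the snippet above. It assumes a sorted `DatetimeIndex`, in which case label-based slices include both endpoints and date strings work as slice bounds too.

```python
import datetime

import pandas

# Hypothetical cumulative case counts, only to show the slicing behaviour.
dates = pandas.date_range("2020-02-27", periods=6, freq="D")
df = pandas.DataFrame({"confirmed": [46, 48, 79, 130, 159, 196]}, index=dates)
df.index.name = "date"

date_data_begin = datetime.datetime(2020, 3, 1)
date_data_end = df.index[-1]

# Label-based slices on a DatetimeIndex include both the start and the end label.
cases = df.loc[date_data_begin:date_data_end, "confirmed"].values

# Date strings are accepted as slice bounds as well.
cases_from_string = df.loc["2020-03-01":, "confirmed"].values
```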

cheers

jdehning commented 4 years ago

@joaopn Would it be a good idea to reformat the RKI data in the same way, that is, with a three-level multi-index (country, state, landkreis)? Then I would say it sounds like something we could include in our module... And would you like to do it? ;)
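
A rough sketch of what such a three-level index could look like. This is not the actual RKI schema: the input layout, the column names, and the numbers below are all assumptions made up for illustration. It pivots a long-format table (one row per Landkreis and day) into the same wide shape as the JHU frame.

```python
import io

import pandas

# Hypothetical long-format input; column names and numbers are invented for this sketch.
SAMPLE_CSV = """state,landkreis,date,confirmed
Bayern,LK Rosenheim,2020-03-01,2
Bayern,LK Rosenheim,2020-03-02,5
Niedersachsen,SK Braunschweig,2020-03-01,1
Niedersachsen,SK Braunschweig,2020-03-02,3
"""

long_df = pandas.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["date"])
long_df["country"] = "Germany"

# One row per (country, state, landkreis) and one column per date, as in the JHU frame.
df_rki = long_df.pivot_table(
    index=["country", "state", "landkreis"],
    columns="date",
    values="confirmed",
    aggfunc="sum",
).sort_index()

# A full key returns a single Landkreis as a Series indexed by date.
braunschweig = df_rki.loc[("Germany", "Niedersachsen", "SK Braunschweig")]
# A partial key selects every Landkreis within one state.
bayern = df_rki.loc[("Germany", "Bayern"), :]
```

With this layout, the filtering code from the first comment would carry over by adding a third element to the lookup key.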