SophMC / notechain

A daily diary of learning data science skills.
13 stars 4 forks source link

Creating datetime index from year and month columns combined #3

Closed SophMC closed 8 years ago

SophMC commented 8 years ago

I have taken the mean and std of ws grouped by year and month. This is the first 5 lines of the new dataframe which I stored the values in.

year month mean std 0 1984 3 5.060840 2.280239 1 1984 4 5.219944 1.490117 2 1984 5 5.595610 1.396281 3 1984 6 6.520606 1.927716 4 1984 7 6.095756 1.958878 year int64 month int64 dtype: object

I want to create a datetime object from the year and month columns and make it the index of the dataframe, preferably using: pd.to_datetime and preferably using only one line of code!

These attempts don't work: df.index=pd.to_datetime(year + month) df.index=pd.to_datetime(df.year + df.month) df['date'] = df.apply(lambda row: datetime(row['Year'], row['Month'], axis=1)) #with the intention to change it to the index after. df['date']= df.apply(lambda x:datetime.strptime("{0} {1}".format(x['year'],x['month']), "%Y%m"),axis=1)

Though i've already done something similar using pd.to_datetime which does work and I'm not sure why it does, as nothing above seems to. I suspect it is something to do with the new dataframe I created: wind['date_time'] = pd.to_datetime(wind.date_time) + wind.hour.astype('timedelta64[h]')

wind.date_time is already a datetime object created by read_csv when I read the file in and tell it to parse the first 3 columns together.

I can do it if I use a loop:

for i in range(0,len(df)):
year_n, month_n = df.index[i]
year_new.append(datetime(year_n,month_n,1))

But I guess that's slow and not the best use of pandas...

SophMC commented 8 years ago

Solution Turn the groupby MultiIndex (consisting of [year[month]]) into two columns [year][month]: wind_group = wind_group.reset_index()

Join year and month into one column with the format yyyy/mm and apply to_datetime for each row: wind_group['date'] = wind_group[['year', 'month']].apply(lambda x: pd.to_datetime('/'.join(x)), axis=1)

Make the new datetimeIndex column ['date'] the dataFrame index: wind_group.index=wind_group['date']

aaren commented 8 years ago

Actually lambda here is ok - the function is pretty simple. Main thing is readability - if the lambda isn't short and simple then use a separate def.