banditelol / public-notes

public notes as issue thread, inspired by simonwilson/public-notes

Pandas Intermediate Cheatsheet #2

Open banditelol opened 1 year ago

banditelol commented 1 year ago

Pandas Intermediate Cheatsheet

This issue contains my pandas snippets that most beginner tutorials don't cover but that come up often in my day-to-day tasks.

banditelol commented 1 year ago

Reverse One Hot Encoding from get_dummies

Main resource from SO

df_dummies.idxmax(1)

or, more generally, we can create a factory like this

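def reverse_dummies_factory(pat: str):
    # regex=True so the prefix is stripped with the same regex that filter() matched on
    return lambda df_: df_.filter(regex=pat).idxmax(axis=1).str.replace(pat, "", regex=True)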

df_dummies.assign(doc = reverse_dummies_factory("DOC__"))
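
To see the round trip end to end, here is a minimal sketch. The sample data and the `DOC` column name are made up for illustration, and it assumes the dummies were created with `prefix_sep="__"` so the column names match the `DOC__` pattern. `dtype=int` is passed only to keep the dummy columns numeric across pandas versions.

```python
import pandas as pd


def reverse_dummies_factory(pat: str):
    # Pick the dummy column holding the max (the 1) per row, then strip the prefix.
    return lambda df_: (
        df_.filter(regex=pat).idxmax(axis=1).str.replace(pat, "", regex=True)
    )


# Hypothetical data with a single categorical column
df = pd.DataFrame({"id": [1, 2, 3], "DOC": ["pdf", "csv", "pdf"]})

# Produces the columns DOC__csv and DOC__pdf
df_dummies = pd.get_dummies(df, columns=["DOC"], prefix_sep="__", dtype=int)

# assign() calls the lambda with the DataFrame and stores the result as "doc"
restored = df_dummies.assign(doc=reverse_dummies_factory("DOC__"))
print(restored["doc"].tolist())
```

Note that `idxmax` returns the first matching column on ties, so a row with all zeros (for example when `drop_first=True` was used) would come back as the first dummy column rather than the dropped category.
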
banditelol commented 1 year ago

Drop certain column based on regex

Main resource from SO (I forgot which question)

df_.drop(df_.filter(regex='<pattern>').columns, axis=1)

or using pipe and a factory function

def drop_columns_factory(colpat:str):
    return lambda df_: df_.drop(df_.filter(regex=colpat).columns, axis=1)

df.pipe(drop_columns_factory(r"<pattern>"))
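
A small runnable sketch of the same idea. The column names and the `^tmp_` pattern are hypothetical, chosen only to show which columns get dropped.

```python
import pandas as pd


def drop_columns_factory(colpat: str):
    # filter(regex=...) selects the matching columns; drop() removes exactly those
    return lambda df_: df_.drop(columns=df_.filter(regex=colpat).columns)


# Hypothetical frame with two temporary columns to get rid of
df = pd.DataFrame(
    {"id": [1, 2], "tmp_a": [0, 0], "tmp_b": [1, 1], "value": [3.5, 4.0]}
)

cleaned = df.pipe(drop_columns_factory(r"^tmp_"))
print(cleaned.columns.tolist())  # ['id', 'value']
```

`filter(regex=...)` matches with `re.search`, so anchor the pattern with `^` or `$` if a substring match anywhere in the name would be too broad.
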
banditelol commented 1 year ago

Pandas Groupby Mechanism

There seems to be a pattern for applying a custom aggregation function to a DataFrameGroupBy object in pandas: split-apply-combine. It works like this
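
As a rough illustration (not the actual pandas internals), the three steps can be spelled out by hand and compared with what `groupby(...).agg(...)` returns. The data and the `value_range` function are made up for this sketch.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame(
    {"team": ["a", "a", "b", "b", "b"], "score": [1, 3, 2, 4, 9]}
)


def value_range(s: pd.Series):
    # Custom aggregation: reduce one group's values to a single number
    return s.max() - s.min()


# Split: iterating a groupby yields (key, sub-DataFrame) pairs
# Apply: run the function on each piece
pieces = {key: value_range(group["score"]) for key, group in df.groupby("team")}

# Combine: collect the per-group results into one Series indexed by group key
manual = pd.Series(pieces, name="score")

# The same thing in one call
builtin = df.groupby("team")["score"].agg(value_range)

print(manual.to_dict())
print(builtin.to_dict())
```

Both routes give one value per group; the built-in call just hides the loop and the final assembly.
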

TODO: Fill in this implementation details and example