PolicyEngine / microdf

Analysis tools for working with survey microdata as DataFrames.
http://pslmodels.github.io/microdf
MIT License

MicroDataFrame.groupby(col)[[cols]].aggfunc doesn't consider weights #193

Open MaxGhenis opened 3 years ago

MaxGhenis commented 3 years ago

Example, given d = mdf.MicroDataFrame(dict(g=["a", "a", "b"], y=[1, 2, 3]), weights=[4, 5, 6]):

d.groupby("g").sum() works:

g | y
-- | --
a | 14.0
b | 18.0

d.groupby("g").y.sum() also works:

g
a    14.0
b    18.0
dtype: float64

d.groupby("g").sum()["y"] works too:

g
a    14.0
b    18.0
Name: y, dtype: float64

d.groupby("g")[["y"]].sum() does not; it returns the unweighted sums:

g | y
-- | --
a | 3
b | 3

d.groupby("g")["y"].sum() also doesn't work:

g
a    3
b    3
Name: y, dtype: int64

d.groupby("g").sum()[["y"]] produces a KeyError (see #192).
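
For reference, the expected and the incorrect numbers above can be reproduced with plain pandas, without microdf. This is only a sketch of the arithmetic: the weights are held in an ordinary column named w, which is an assumption made for illustration and not how MicroDataFrame stores them.

```python
import pandas as pd

# Same data as the example above; "w" is an illustrative column holding the weights.
df = pd.DataFrame({"g": ["a", "a", "b"], "y": [1, 2, 3], "w": [4, 5, 6]})

# Weighted sum per group: sum of y * w within each group.
weighted = (df["y"] * df["w"]).groupby(df["g"]).sum()

# Unweighted sum per group: what the failing calls return.
unweighted = df.groupby("g")["y"].sum()

print(weighted.to_dict())    # {'a': 14, 'b': 18}
print(unweighted.to_dict())  # {'a': 3, 'b': 3}
```

Group a is 1*4 + 2*5 = 14 and group b is 3*6 = 18, matching the calls that work; the failing calls return 1 + 2 = 3 and 3.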

MaxGhenis commented 3 years ago

We need to add a __getitem__ method to DataFrameGroupBy so that selecting columns from a grouped object keeps the weights:

https://github.com/PSLmodels/microdf/blob/0760dfecae5a5b8974d87622644f0df6248e26ce/microdf/generic.py#L416
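
A rough sketch of the idea, not microdf's actual code: WeightedGroupBy below is a hypothetical stand-in class, and its constructor arguments and as_series flag are invented for illustration. The point is that __getitem__ returns another weight-aware object instead of falling through to a plain pandas groupby, so the weights survive column selection.

```python
import pandas as pd


class WeightedGroupBy:
    """Hypothetical weight-aware groupby (illustration only, not microdf's class)."""

    def __init__(self, df, weights, by, columns=None):
        self.df = df
        self.weights = pd.Series(weights, index=df.index)
        self.by = by
        # Value columns to aggregate; default to everything except the group key.
        if columns is None:
            columns = [c for c in df.columns if c != by]
        self.columns = columns
        self.as_series = False

    def __getitem__(self, key):
        # Return another weight-aware object rather than a plain pandas
        # groupby, so the weights are not lost when columns are selected.
        cols = [key] if isinstance(key, str) else list(key)
        selected = WeightedGroupBy(self.df, self.weights, self.by, cols)
        selected.as_series = isinstance(key, str)
        return selected

    def sum(self):
        # Multiply each value column by the weights, then group and sum.
        weighted = self.df[self.columns].mul(self.weights, axis=0)
        result = weighted.groupby(self.df[self.by]).sum()
        return result[self.columns[0]] if self.as_series else result


df = pd.DataFrame({"g": ["a", "a", "b"], "y": [1, 2, 3]})
grouped = WeightedGroupBy(df, weights=[4, 5, 6], by="g")

print(grouped[["y"]].sum())  # DataFrame with weighted sums per group
print(grouped["y"].sum())    # Series with the same weighted sums
```

With this shape, a list key yields a DataFrame and a string key yields a Series, mirroring how pandas distinguishes [["y"]] from ["y"].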