PolicyEngine / microdf

Analysis tools for working with survey microdata as DataFrames.
http://pslmodels.github.io/microdf
MIT License

MicroSeries.gini fails with KeyError if indexes are duplicated #179

Open MaxGhenis opened 3 years ago

MaxGhenis commented 3 years ago

I filed this prematurely and still need to put together a minimal working example (MWE), but basically: I have a MicroDataFrame with duplicated index values, and calling df.x.gini() raises a KeyError. df.groupby(g).x.gini() works (as long as index values aren't duplicated within each group g), and mdf.gini(df, "x", "w") also works.

MaxGhenis commented 3 years ago

Both of these produce the error shown after them, which suggests we need to pass the index to the weights throughout generic.py.


```python
import microdf as mdf
import pandas as pd

d = mdf.MicroDataFrame({"x": [1, 2, 3]}, index=[1, 1, 2], weights=[4, 5, 6])  # weights as a plain list
d = mdf.MicroDataFrame({"x": [1, 2, 3]}, index=[1, 1, 2], weights=pd.Series([4, 5, 6], index=[1, 1, 2]))  # weights as a Series with the same index
```

> KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([0], dtype='int64'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"
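
For illustration, here is a pandas-only sketch of how this kind of KeyError can arise. It does not reproduce microdf's internals, and the variable names are hypothetical. The missing label 0 in the message is consistent with weights that carry a default 0, 1, 2 index being looked up by label against data indexed [1, 1, 2]. Re-indexing the weights to the data's index positionally, as assumed here, avoids the label lookup.

```python
import pandas as pd

# Data with duplicated index labels, as in the report.
x = pd.Series([1, 2, 3], index=[1, 1, 2])
# Weights built without an explicit index get the default RangeIndex 0, 1, 2.
w = pd.Series([4, 5, 6])

# Looking up the weights' labels in x fails because label 0 is not in x.index.
try:
    x.loc[w.index]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Assumed fix (not microdf's actual code): attach the data's index to the
# weights by position, so both Series share an identical index.
w_aligned = pd.Series(w.to_numpy(), index=x.index)

# With identical indexes, element-wise arithmetic pairs rows by position.
weighted_sum = (x * w_aligned).sum()
```

If something like this is what happens inside generic.py, carrying the data's index on the weights everywhere would be one possible way to implement the fix suggested above.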