InContextSolutions / PandaSurvey

Survey Weighting Methods for the Pandas Dataframe
http://incontextsolutions.github.io/PandaSurvey/
MIT License

Regression Tests #8

Closed tristanwietsma closed 10 years ago

tristanwietsma commented 10 years ago

Two tests we can implement:

tristanwietsma commented 10 years ago

I'll grab some Decipher data for the first bullet.

brakaus1 commented 10 years ago

FYI, I got some data from my favorite study, 1359.

tristanwietsma commented 10 years ago

For the study Jaime provided, do you want to add a cleaned-up version of the Decipher results and weights to the data submodule?

brakaus1 commented 10 years ago

Yeah I got it

tristanwietsma commented 10 years ago

You'll want to compare a cross tabulation of resampled data with target weights.
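One way to read this suggestion, as a minimal standard-library sketch: draw a weighted bootstrap resample, tabulate each demographic, and compare the observed proportions with the targets. The respondent rows, weights, targets, and helper names (`resample`, `marginals`) are invented for illustration and are not PandaSurvey's API; real data would come from the study file.

```python
import random
from collections import Counter

# Hypothetical respondents: (Gender, HispanicRecode, weight). Illustrative only.
respondents = [
    (1, 1, 0.6),
    (1, 2, 1.4),
    (2, 1, 0.3),
    (2, 2, 1.7),
]

# Hypothetical target proportions that the toy weights above are built to hit.
targets = {
    "Gender": {1: 0.5, 2: 0.5},
    "HispanicRecode": {1: 0.225, 2: 0.775},
}


def resample(rows, k, seed=0):
    """Draw k rows with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    weights = [row[2] for row in rows]
    return rng.choices(rows, weights=weights, k=k)


def marginals(sample):
    """Tabulate the share of each level for both demographics."""
    n = len(sample)
    gender = Counter(row[0] for row in sample)
    hispanic = Counter(row[1] for row in sample)
    return {
        "Gender": {level: count / n for level, count in gender.items()},
        "HispanicRecode": {level: count / n for level, count in hispanic.items()},
    }


sample = resample(respondents, k=20000)
observed = marginals(sample)

# Print target next to the resampled proportion for each demographic level.
for demo, levels in targets.items():
    for level, target in levels.items():
        print(demo, level, target, round(observed[demo].get(level, 0.0), 4))
```

The resampled proportions should land close to the targets if the weights are doing their job; how close is "close enough" is a tolerance the test would have to choose.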

tristanwietsma commented 10 years ago

@brakaus1 let's focus on checking the marginal distributions between our approach and the RE approach.

The weights aren't unique, so maybe the rank order idea is overly optimistic.

brakaus1 commented 10 years ago

Here are the basic weight proportions without taking cells into account. I'm running it with cells included now, and will also include the bootstrapped proportions.

| Demographic | Target Demo | Target Value | PandaSurvey Weight Prop. | Report Engine Weight Prop. |
|---|---|---|---|---|
| HispanicRecode | 1 | 0.09 | 0.09000255707400169 | 0.08980779506111078 |
| HispanicRecode | 2 | 0.91 | 0.9099974429259986 | 0.9101922049388886 |
| RaceRecoder1 | 0 | 0.15 | 0.15000143619437722 | 0.15021567846151374 |
| RaceRecoder1 | 1 | 0.85 | 0.8499985638056243 | 0.8497843215384855 |
| Gender | 1 | 0.5 | 0.5000025351649099 | 0.5001792283114748 |
| Gender | 2 | 0.5 | 0.4999974648350946 | 0.4998207716885224 |
| StudyCellId | 2816 | 0.14285714285714285 | 0.14233781242986587 | 0.1430625509157271 |
| StudyCellId | 2810 | 0.14285714285714285 | 0.14340722205905415 | 0.1430625509157273 |
| StudyCellId | 2811 | 0.14285714285714285 | 0.1420660946631863 | 0.14269640550572815 |
| StudyCellId | 2812 | 0.14285714285714285 | 0.14285714722813284 | 0.14306255091572728 |
| StudyCellId | 2813 | 0.14285714285714285 | 0.1436817543231078 | 0.14306255091572725 |
| StudyCellId | 2814 | 0.14285714285714285 | 0.1435430519456496 | 0.1430625509157273 |
| StudyCellId | 2815 | 0.14285714285714285 | 0.14210691735100628 | 0.14199083991563438 |
| AgeRecode | 1 | 0.07 | 0.06998078700968717 | 0.06841296200817171 |
| AgeRecode | 2 | 0.22 | 0.2199407915235169 | 0.2197490412572322 |
| AgeRecode | 3 | 0.2 | 0.19994444686937982 | 0.19955928191545996 |
| AgeRecode | 4 | 0.2 | 0.1999427778299346 | 0.19973849930040152 |
| AgeRecode | 5 | 0.21 | 0.20994115133912788 | 0.20990984616602848 |
| IncomeRecode | 1 | 0.17 | 0.16989408700334507 | 0.17057128907848304 |
| IncomeRecode | 2 | 0.21 | 0.2098691662982502 | 0.21045134018936468 |
| IncomeRecode | 3 | 0.25 | 0.24984424559315488 | 0.25146614393174194 |
| IncomeRecode | 4 | 0.16 | 0.15990031717961942 | 0.1610905400879474 |
| IncomeRecode | 5 | 0.11 | 0.10993146806098857 | 0.11070244487787528 |

tristanwietsma commented 10 years ago

This is great. I think maybe the regression test could just be the sum of squared errors.

like...

assert SSE(pandas) <= SSE(RE)
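
A minimal sketch of what that check might look like, using the proportions posted earlier in this thread. The `ROWS` layout and the `sse` helper are hypothetical names chosen for illustration, not part of PandaSurvey, and a real regression test would compute the proportions from the data submodule rather than hard-code them.

```python
# Rows copied from the table in this thread:
# (demographic, level, target, PandaSurvey proportion, Report Engine proportion)
ROWS = [
    ("HispanicRecode", 1, 0.09, 0.09000255707400169, 0.08980779506111078),
    ("HispanicRecode", 2, 0.91, 0.9099974429259986, 0.9101922049388886),
    ("RaceRecoder1", 0, 0.15, 0.15000143619437722, 0.15021567846151374),
    ("RaceRecoder1", 1, 0.85, 0.8499985638056243, 0.8497843215384855),
    ("Gender", 1, 0.5, 0.5000025351649099, 0.5001792283114748),
    ("Gender", 2, 0.5, 0.4999974648350946, 0.4998207716885224),
    ("StudyCellId", 2816, 0.14285714285714285, 0.14233781242986587, 0.1430625509157271),
    ("StudyCellId", 2810, 0.14285714285714285, 0.14340722205905415, 0.1430625509157273),
    ("StudyCellId", 2811, 0.14285714285714285, 0.1420660946631863, 0.14269640550572815),
    ("StudyCellId", 2812, 0.14285714285714285, 0.14285714722813284, 0.14306255091572728),
    ("StudyCellId", 2813, 0.14285714285714285, 0.1436817543231078, 0.14306255091572725),
    ("StudyCellId", 2814, 0.14285714285714285, 0.1435430519456496, 0.1430625509157273),
    ("StudyCellId", 2815, 0.14285714285714285, 0.14210691735100628, 0.14199083991563438),
    ("AgeRecode", 1, 0.07, 0.06998078700968717, 0.06841296200817171),
    ("AgeRecode", 2, 0.22, 0.2199407915235169, 0.2197490412572322),
    ("AgeRecode", 3, 0.2, 0.19994444686937982, 0.19955928191545996),
    ("AgeRecode", 4, 0.2, 0.1999427778299346, 0.19973849930040152),
    ("AgeRecode", 5, 0.21, 0.20994115133912788, 0.20990984616602848),
    ("IncomeRecode", 1, 0.17, 0.16989408700334507, 0.17057128907848304),
    ("IncomeRecode", 2, 0.21, 0.2098691662982502, 0.21045134018936468),
    ("IncomeRecode", 3, 0.25, 0.24984424559315488, 0.25146614393174194),
    ("IncomeRecode", 4, 0.16, 0.15990031717961942, 0.1610905400879474),
    ("IncomeRecode", 5, 0.11, 0.10993146806098857, 0.11070244487787528),
]

TARGET, PANDASURVEY, REPORT_ENGINE = 2, 3, 4  # column indexes into each row


def sse(rows, column):
    """Sum of squared differences between the target and the chosen column."""
    return sum((row[column] - row[TARGET]) ** 2 for row in rows)


sse_pandasurvey = sse(ROWS, PANDASURVEY)
sse_report_engine = sse(ROWS, REPORT_ENGINE)

# The proposed regression check: we should be at least as close to target as RE.
assert sse_pandasurvey <= sse_report_engine
```

Pooling every row into one SSE is the simplest form of the check. Computing it per demographic instead would be stricter, since a large miss on one demographic could otherwise be masked by small errors elsewhere.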