Regression Tests

tristanwietsma commented 10 years ago

Two tests we can implement:

- Compare our results to those from ReportEngine.
- Bootstrap-resample the data using our weights and see how well the resampled results line up with the targets.
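A minimal sketch of the bootstrap check in the second bullet. It assumes the respondent data sit in a pandas DataFrame with one column per demographic and a `weight` column. Those names, the toy data, and the `bootstrap_proportions` helper are hypothetical. Rows are drawn with replacement with probability proportional to their weight. If the weights are right, the resampled shares should land close to the targets:

```python
import numpy as np
import pandas as pd


def bootstrap_proportions(df, demo_col, weight_col="weight", n_draws=None, seed=0):
    """Draw rows with replacement, each with probability proportional to its
    weight, and return the share of each level of demo_col in the draw."""
    rng = np.random.default_rng(seed)
    n_draws = len(df) if n_draws is None else n_draws
    probs = df[weight_col].to_numpy(dtype=float)
    probs = probs / probs.sum()  # weights -> selection probabilities
    idx = rng.choice(len(df), size=n_draws, replace=True, p=probs)
    resampled = df.iloc[idx]
    return resampled[demo_col].value_counts(normalize=True).sort_index()


# Hypothetical toy data: 30% of respondents are Gender 1, but the weights
# are chosen so that each gender carries half of the total weight.
toy = pd.DataFrame({
    "Gender": [1] * 30 + [2] * 70,
    "weight": [0.5 / 0.3] * 30 + [0.5 / 0.7] * 70,
})
targets = pd.Series({1: 0.5, 2: 0.5})

props = bootstrap_proportions(toy, "Gender", n_draws=100_000)
max_dev = (props - targets).abs().max()
print(props.round(3).to_dict(), f"max deviation from target: {max_dev:.4f}")
```

With 100,000 draws the sampling noise on a 50/50 split is roughly 0.0016. A deviation of a few thousandths therefore still counts as a match, so a real test would need its tolerance chosen with that in mind.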

tristanwietsma commented 10 years ago

I'll grab some Decipher data for the first bullet.

brakaus1 commented 10 years ago

Got some data from my favorite study, 1359, FYI.

tristanwietsma commented 10 years ago

For the study Jaime provided, do you want to add a cleaned-up version of the Decipher results and weights to the data submodule?

brakaus1 commented 10 years ago

Yeah, I've got it.

tristanwietsma commented 10 years ago

You'll want to compare a cross tabulation of resampled data with target weights.
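Here is one possible reading of the cross-tabulation suggestion, sketched under assumptions. It cross-tabulates two demographics in the resampled data as shares of all rows, then compares each margin with its target proportion. The resampled data, the targets, and the `crosstab_vs_targets` helper are all hypothetical:

```python
import pandas as pd


def crosstab_vs_targets(sample, row_col, col_col, targets):
    """Cross-tabulate two demographics in a resampled DataFrame (as shares of
    all rows) and return the table plus each margin's gap from its targets."""
    ct = pd.crosstab(sample[row_col], sample[col_col], normalize="all")
    # Row sums and column sums of the normalized table are the two margins.
    margins = {row_col: ct.sum(axis=1), col_col: ct.sum(axis=0)}
    gaps = {
        demo: (shares - pd.Series(targets[demo])).abs()
        for demo, shares in margins.items()
    }
    return ct, gaps


# Hypothetical resampled data and targets, for illustration only.
sample = pd.DataFrame({
    "Gender":    [1, 1, 2, 2, 1, 2, 1, 2],
    "AgeRecode": [1, 2, 1, 2, 2, 1, 1, 2],
})
targets = {"Gender": {1: 0.5, 2: 0.5}, "AgeRecode": {1: 0.4, 2: 0.6}}

ct, gaps = crosstab_vs_targets(sample, "Gender", "AgeRecode", targets)
print(ct)
for demo, gap in gaps.items():
    print(demo, gap.round(3).to_dict())
```

In this toy sample the Gender margin matches its targets exactly, while AgeRecode is off by 0.1 in each level.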

tristanwietsma commented 10 years ago

@brakaus1 let's focus on checking the marginal distributions produced by our approach and by the RE approach.

The weights aren't unique, so maybe the rank-order idea is overly optimistic.

brakaus1 commented 10 years ago

Here are the basic weight proportions without taking cells into account. I'm running it with cells included now, and will also include the bootstrapped proportions.

Demographic	Level	Target prop.	PandaSurvey weighted prop.	Report Engine weighted prop.
HispanicRecode	1	0.09	0.09000255707400169	0.08980779506111078
HispanicRecode	2	0.91	0.9099974429259986	0.9101922049388886
RaceRecoder1	0	0.15	0.15000143619437722	0.15021567846151374
RaceRecoder1	1	0.85	0.8499985638056243	0.8497843215384855
Gender	1	0.5	0.5000025351649099	0.5001792283114748
Gender	2	0.5	0.4999974648350946	0.4998207716885224
StudyCellId	2816	0.14285714285714285	0.14233781242986587	0.1430625509157271
StudyCellId	2810	0.14285714285714285	0.14340722205905415	0.1430625509157273
StudyCellId	2811	0.14285714285714285	0.1420660946631863	0.14269640550572815
StudyCellId	2812	0.14285714285714285	0.14285714722813284	0.14306255091572728
StudyCellId	2813	0.14285714285714285	0.1436817543231078	0.14306255091572725
StudyCellId	2814	0.14285714285714285	0.1435430519456496	0.1430625509157273
StudyCellId	2815	0.14285714285714285	0.14210691735100628	0.14199083991563438
AgeRecode	1	0.07	0.06998078700968717	0.06841296200817171
AgeRecode	2	0.22	0.2199407915235169	0.2197490412572322
AgeRecode	3	0.2	0.19994444686937982	0.19955928191545996
AgeRecode	4	0.2	0.1999427778299346	0.19973849930040152
AgeRecode	5	0.21	0.20994115133912788	0.20990984616602848
IncomeRecode	1	0.17	0.16989408700334507	0.17057128907848304
IncomeRecode	2	0.21	0.2098691662982502	0.21045134018936468
IncomeRecode	3	0.25	0.24984424559315488	0.25146614393174194
IncomeRecode	4	0.16	0.15990031717961942	0.1610905400879474
IncomeRecode	5	0.11	0.10993146806098857	0.11070244487787528
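For reference, weighted marginal proportions like the ones in the table above can be computed with a pandas `groupby`. This is a sketch, not necessarily how either PandaSurvey or Report Engine produces its numbers. The column names, the toy data, and the `weighted_proportions` helper are hypothetical:

```python
import pandas as pd


def weighted_proportions(df, demo_cols, weight_col="weight"):
    """Share of the total weight falling in each level of each demographic,
    returned as one row per (Demographic, Level)."""
    total = df[weight_col].sum()
    rows = []
    for demo in demo_cols:
        shares = df.groupby(demo)[weight_col].sum() / total
        for level, share in shares.items():
            rows.append({"Demographic": demo, "Level": level, "Weighted prop": share})
    return pd.DataFrame(rows)


# Hypothetical respondent-level data with one weight per respondent.
respondents = pd.DataFrame({
    "Gender":         [1, 1, 2, 2],
    "HispanicRecode": [1, 2, 2, 2],
    "weight":         [1.0, 1.0, 1.0, 3.0],
})
result = weighted_proportions(respondents, ["HispanicRecode", "Gender"])
print(result)
```

Running the same function once with each method's weights should give columns comparable to the last two in the table.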

tristanwietsma commented 10 years ago

This is great. I think the regression test could simply be the sum of squared errors (SSE) between the weighted proportions and the targets.

like...

assert SSE(pandas) <= SSE(RE)
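A runnable version of that check uses the proportions posted in the table above and sums the squared errors over every demographic level. The `sse` helper and the pytest-style test name are hypothetical:

```python
# (demographic, level, target, PandaSurvey prop., Report Engine prop.)
# Values copied from the table posted above.
RESULTS = [
    ("HispanicRecode", 1, 0.09, 0.09000255707400169, 0.08980779506111078),
    ("HispanicRecode", 2, 0.91, 0.9099974429259986, 0.9101922049388886),
    ("RaceRecoder1", 0, 0.15, 0.15000143619437722, 0.15021567846151374),
    ("RaceRecoder1", 1, 0.85, 0.8499985638056243, 0.8497843215384855),
    ("Gender", 1, 0.5, 0.5000025351649099, 0.5001792283114748),
    ("Gender", 2, 0.5, 0.4999974648350946, 0.4998207716885224),
    ("StudyCellId", 2816, 0.14285714285714285, 0.14233781242986587, 0.1430625509157271),
    ("StudyCellId", 2810, 0.14285714285714285, 0.14340722205905415, 0.1430625509157273),
    ("StudyCellId", 2811, 0.14285714285714285, 0.1420660946631863, 0.14269640550572815),
    ("StudyCellId", 2812, 0.14285714285714285, 0.14285714722813284, 0.14306255091572728),
    ("StudyCellId", 2813, 0.14285714285714285, 0.1436817543231078, 0.14306255091572725),
    ("StudyCellId", 2814, 0.14285714285714285, 0.1435430519456496, 0.1430625509157273),
    ("StudyCellId", 2815, 0.14285714285714285, 0.14210691735100628, 0.14199083991563438),
    ("AgeRecode", 1, 0.07, 0.06998078700968717, 0.06841296200817171),
    ("AgeRecode", 2, 0.22, 0.2199407915235169, 0.2197490412572322),
    ("AgeRecode", 3, 0.2, 0.19994444686937982, 0.19955928191545996),
    ("AgeRecode", 4, 0.2, 0.1999427778299346, 0.19973849930040152),
    ("AgeRecode", 5, 0.21, 0.20994115133912788, 0.20990984616602848),
    ("IncomeRecode", 1, 0.17, 0.16989408700334507, 0.17057128907848304),
    ("IncomeRecode", 2, 0.21, 0.2098691662982502, 0.21045134018936468),
    ("IncomeRecode", 3, 0.25, 0.24984424559315488, 0.25146614393174194),
    ("IncomeRecode", 4, 0.16, 0.15990031717961942, 0.1610905400879474),
    ("IncomeRecode", 5, 0.11, 0.10993146806098857, 0.11070244487787528),
]


def sse(observed, targets):
    """Sum of squared differences between observed and target proportions."""
    return sum((o - t) ** 2 for o, t in zip(observed, targets))


targets = [row[2] for row in RESULTS]
sse_pandas = sse([row[3] for row in RESULTS], targets)
sse_re = sse([row[4] for row in RESULTS], targets)


def test_sse_not_worse_than_report_engine():
    # The regression criterion proposed above: SSE(pandas) <= SSE(RE).
    assert sse_pandas <= sse_re


print(f"SSE PandaSurvey: {sse_pandas:.3e}  SSE Report Engine: {sse_re:.3e}")
```

In an actual regression test, the two proportion columns would be recomputed from the Decipher data rather than hard-coded.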

InContextSolutions/PandaSurvey, issue #8: Regression Tests