tristanwietsma closed this issue 10 years ago
I'll grab some decipher data for the first bullet.
FYI, I got some data from my favorite study, 1359.
For the study Jaime provided, want to add a cleaned up version of the Decipher results & weights into the data submodule?
Yeah I got it
You'll want to compare a cross tabulation of resampled data with target weights.
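A rough sketch of that comparison, in plain Python so it doesn't depend on anything in the repo. The function name `resample_proportions`, the toy rows, and the weights are all made up for illustration; they are not from study 1359. Rows are drawn with probability proportional to their weight, then tabulated by (Gender, HispanicRecode) so the cell proportions can be checked against targets.

```python
import random
from collections import Counter


def resample_proportions(rows, weights, n_draws, seed=0):
    """Draw rows with probability proportional to weight and tabulate them.

    `rows` can be any hashable values; passing tuples such as
    (gender, hispanic) gives a cross tabulation of the resampled data.
    """
    rng = random.Random(seed)  # seeded so the check is repeatable
    draws = rng.choices(rows, weights=weights, k=n_draws)
    counts = Counter(draws)
    return {cell: count / n_draws for cell, count in counts.items()}


# Hypothetical toy data: four (Gender, HispanicRecode) respondents and weights.
rows = [(1, 1), (1, 2), (2, 1), (2, 2)]
weights = [0.5, 1.5, 1.0, 1.0]

crosstab = resample_proportions(rows, weights, n_draws=10_000)

# Collapse the cross tabulation to the Gender marginal for a quick look.
gender_1 = sum(p for (gender, _), p in crosstab.items() if gender == 1)
```

With these toy weights, half of the total weight sits on Gender 1, so `gender_1` should land near 0.5, up to sampling noise.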
@brakaus1 let's focus on checking the marginal distributions between our approach and the RE approach.
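For reference, here is one way to compute a weighted marginal for a single demographic. This is only a sketch: `weighted_marginal` is a hypothetical helper, and the values and weights are toy numbers, not study data. The same function could be run once on our weights and once on the RE weights to get the two columns to compare.

```python
from collections import defaultdict


def weighted_marginal(values, weights):
    """Return the weighted proportion of each category in `values`."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight  # accumulate weight per category
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


# Hypothetical toy data: a Gender column and one weight per respondent.
gender = [1, 1, 2, 2]
weights = [0.5, 1.5, 1.0, 1.0]

marginals = weighted_marginal(gender, weights)
```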
The weights aren't unique, so maybe the rank-order idea is overly optimistic.
Here are the basic weight proportions without taking cells into account; I'm running it with cells included now. I'll also include the bootstrapped proportions.
Demographic | Target Demo | Target Value | pandsurvey weight prop. | Report Engine weight prop. |
---|---|---|---|---|
HispanicRecode | 1 | 0.09 | 0.09000255707400169 | 0.08980779506111078 |
HispanicRecode | 2 | 0.91 | 0.9099974429259986 | 0.9101922049388886 |
RaceRecoder1 | 0 | 0.15 | 0.15000143619437722 | 0.15021567846151374 |
RaceRecoder1 | 1 | 0.85 | 0.8499985638056243 | 0.8497843215384855 |
Gender | 1 | 0.5 | 0.5000025351649099 | 0.5001792283114748 |
Gender | 2 | 0.5 | 0.4999974648350946 | 0.4998207716885224 |
StudyCellId | 2816 | 0.14285714285714285 | 0.14233781242986587 | 0.1430625509157271 |
StudyCellId | 2810 | 0.14285714285714285 | 0.14340722205905415 | 0.1430625509157273 |
StudyCellId | 2811 | 0.14285714285714285 | 0.1420660946631863 | 0.14269640550572815 |
StudyCellId | 2812 | 0.14285714285714285 | 0.14285714722813284 | 0.14306255091572728 |
StudyCellId | 2813 | 0.14285714285714285 | 0.1436817543231078 | 0.14306255091572725 |
StudyCellId | 2814 | 0.14285714285714285 | 0.1435430519456496 | 0.1430625509157273 |
StudyCellId | 2815 | 0.14285714285714285 | 0.14210691735100628 | 0.14199083991563438 |
AgeRecode | 1 | 0.07 | 0.06998078700968717 | 0.06841296200817171 |
AgeRecode | 2 | 0.22 | 0.2199407915235169 | 0.2197490412572322 |
AgeRecode | 3 | 0.2 | 0.19994444686937982 | 0.19955928191545996 |
AgeRecode | 4 | 0.2 | 0.1999427778299346 | 0.19973849930040152 |
AgeRecode | 5 | 0.21 | 0.20994115133912788 | 0.20990984616602848 |
IncomeRecode | 1 | 0.17 | 0.16989408700334507 | 0.17057128907848304 |
IncomeRecode | 2 | 0.21 | 0.2098691662982502 | 0.21045134018936468 |
IncomeRecode | 3 | 0.25 | 0.24984424559315488 | 0.25146614393174194 |
IncomeRecode | 4 | 0.16 | 0.15990031717961942 | 0.1610905400879474 |
IncomeRecode | 5 | 0.11 | 0.10993146806098857 | 0.11070244487787528 |
This is great. I think maybe the regression test could just be the sum of squared errors.
like...
assert SSE(pandas) <= SSE(RE)
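Spelled out, that check might look something like the sketch below. The `sse` helper is hypothetical, not something already in the repo. The numbers are the HispanicRecode and Gender rows copied from the table above; the StudyCellId rows are left out because the pandsurvey column there was produced without taking cells into account.

```python
def sse(observed, target):
    """Sum of squared errors between observed and target proportions."""
    if len(observed) != len(target):
        raise ValueError("observed and target must have the same length")
    return sum((o - t) ** 2 for o, t in zip(observed, target))


# HispanicRecode (1, 2) and Gender (1, 2) rows from the table above.
target = [0.09, 0.91, 0.5, 0.5]
pandsurvey = [
    0.09000255707400169,
    0.9099974429259986,
    0.5000025351649099,
    0.4999974648350946,
]
report_engine = [
    0.08980779506111078,
    0.9101922049388886,
    0.5001792283114748,
    0.4998207716885224,
]

# Regression check: our weights should match targets at least as well as RE's.
assert sse(pandsurvey, target) <= sse(report_engine, target)
```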
Two tests we can implement: