RobinHankin / hyper2

https://robinhankin.github.io/hyper2/

skating_maxp not optimal #183

Open RobinHankin opened 2 years ago

RobinHankin commented 2 years ago

The support for skating_maxp is about -230:

> loglik(skating_maxp,skating)
[1] -229.9844

But I can do better:

    Constrained support maximization

data:  skating
null hypothesis: hughes = kwan = slutskaya
null estimate:
   babiakova   butyrskaya        cohen      fontana      giunchi    gusmeroli 
2.449002e-06 5.055350e-03 8.992062e-02 3.042449e-04 6.760362e-06 5.420574e-05 
       hegel       hubert       hughes     kettunen        kopac         kwan 
5.016270e-06 1.898383e-04 2.913803e-01 1.176779e-03 1.448790e-06 2.913803e-01 
   liashenko         luca  maniachenko        meier         onda     robinson 
4.116941e-04 1.173316e-07 7.175017e-04 2.283563e-04 4.185916e-04 5.381573e-03 
   sebestyen    slutskaya    soldatova       suguri    volchkova 
2.486056e-03 2.913803e-01 7.584265e-06 1.854707e-02 9.437574e-04 
(argmax, constrained optimization)
Support for null:  -226.7399 + K

alternative hypothesis:  sum p_i=1 
alternative estimate:
   babiakova   butyrskaya        cohen      fontana      giunchi    gusmeroli 
4.538041e-06 4.718991e-03 8.965655e-02 3.826161e-04 1.024007e-05 9.192624e-05 
       hegel       hubert       hughes     kettunen        kopac         kwan 
8.849345e-06 2.830198e-04 2.915691e-01 1.435717e-03 2.575329e-06 2.581211e-01 
   liashenko         luca  maniachenko        meier         onda     robinson 
5.459869e-04 1.000000e-06 8.463861e-04 3.061701e-04 5.120301e-04 5.232309e-03 
   sebestyen    slutskaya    soldatova       suguri    volchkova 
2.626953e-03 3.244467e-01 1.165802e-05 1.801935e-02 1.166234e-03 
(argmax, free optimization)
Support for alternative:  -229.3815 + K

degrees of freedom: 2
support difference = -2.641573
p-value: 1 

> 

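Above, we see that the support for the null is about -226.74, considerably better than the support for skating_maxp at about -230. The null is a restriction of the alternative, so its maximized support should not exceed that of the free optimization; the negative support difference suggests that skating_maxp is not the true maximum.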
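The call that produced the output above is not shown in this thread. As a sketch only, and assuming it came from samep.test() applied to the three skaters named in the null hypothesis and was stored in an object called jj (the name used in a later comment), the comparison could be reproduced along these lines:

```r
library("hyper2")

# ASSUMPTION: the test output above came from samep.test(); the original
# call is not shown in the thread.  'jj' is the object name used later on.
jj <- samep.test(skating, c("hughes", "kwan", "slutskaya"))
print(jj)

# Compare the support at the stored maximum with the support at the
# constrained (null) estimate returned by the test.
loglik(skating_maxp, skating)
loglik(jj$null_estimate, skating)
```

If the second value is larger than the first, the stored skating_maxp cannot be the global maximum, because the null estimate is itself a valid point of the simplex.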

RobinHankin commented 2 years ago

Actually I am going to retract this, at least partially. I think the null has higher support because of the very small strength of luca: about 1e-7 in the null estimate, against 1e-6 in the free estimate. The very small value is a numerical "buffer", and it looks like the overall support depends on the exact value of luca's strength. And of course the support function has a very large gradient near the edges of the simplex. Will leave this issue open until I can figure out a way to document it. Also see issue #182.
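To get a feel for how steep a support function becomes near the edge of the simplex, here is a toy calculation in base R. It is not the skating likelihood: toy_support() is a made-up two-player example in which player 1 wins once and loses 20 times, so the support is log(p) + 20*log(1-p).

```r
# Toy two-player support function (hypothetical, NOT the skating data):
# player 1 has strength p, wins once and loses 20 times.
toy_support  <- function(p) log(p) + 20 * log(1 - p)
toy_gradient <- function(p) 1 / p - 20 / (1 - p)

# Evaluate at strengths approaching the edge p = 0.
p <- 10^-(2:7)
toy <- data.frame(p = p, support = toy_support(p), gradient = toy_gradient(p))
print(toy)

# Change in support when the small strength moves from 1e-6 to 1e-7.
drop_per_decade <- toy_support(1e-6) - toy_support(1e-7)
print(drop_per_decade)
```

In this toy case the gradient grows like 1/p, and each factor of ten in p shifts the support by roughly log(10), about 2.3 units. So a lower bound of 1e-6 as opposed to 1e-7 on one tiny strength can by itself move the reported support by an amount of that order.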

RobinHankin commented 2 years ago

For completeness I observe [from skating_table] that luca does not have zero strength: J3 says she beats kopac, J4 says kopac beats soldatova, J6 says soldatova beats gusmeroli, J2 says gusmeroli beats meier, J6 says meier beats kettunen, J4 says kettunen beats butyrskaya, J2 says butyrskaya beats suguri, J1 says suguri beats cohen, J2 says cohen beats kwan, J1 says kwan beats hughes, and J1 says hughes beats luca.
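The chain above can be checked mechanically. The sketch below uses base R only and simply transcribes the eleven comparisons from the comment (it does not read skating_table); the data frame and variable names are illustrative.

```r
# Comparisons transcribed from the comment above: judge, winner, loser.
chain <- data.frame(
  judge  = c("J3", "J4", "J6", "J2", "J6", "J4", "J2", "J1", "J2", "J1", "J1"),
  winner = c("luca", "kopac", "soldatova", "gusmeroli", "meier", "kettunen",
             "butyrskaya", "suguri", "cohen", "kwan", "hughes"),
  loser  = c("kopac", "soldatova", "gusmeroli", "meier", "kettunen",
             "butyrskaya", "suguri", "cohen", "kwan", "hughes", "luca"),
  stringsAsFactors = FALSE
)

n <- nrow(chain)

# Each loser must be the winner of the next comparison ...
links_ok <- all(chain$loser[-n] == chain$winner[-1])

# ... and the final loser must be the skater the chain started from.
closes_cycle <- chain$loser[n] == chain$winner[1]

print(c(links_ok = links_ok, closes_cycle = closes_cycle))
```

Both checks come out TRUE: the comparisons form a closed cycle through luca, so she is linked by a chain of wins to the strongest skaters. This is the usual kind of connectivity argument for a nonzero estimate.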

RobinHankin commented 2 years ago

Just trying to use the value mentioned above as a start point for maxp():

> (newbest <- maxp(skating,startp=indep(jj$null_estimate) + 1e-5))
   babiakova   butyrskaya        cohen      fontana      giunchi    gusmeroli 
4.330695e-06 4.961783e-03 8.770841e-02 3.351539e-04 9.231904e-06 7.210923e-05 
       hegel       hubert       hughes     kettunen        kopac         kwan 
8.324981e-06 2.447246e-04 3.001592e-01 1.272614e-03 2.609124e-06 2.632570e-01 
   liashenko         luca  maniachenko        meier         onda     robinson 
4.744556e-04 1.000000e-06 7.422141e-04 2.653598e-04 4.463784e-04 4.979261e-03 
   sebestyen    slutskaya    soldatova       suguri    volchkova 
2.422695e-03 3.140329e-01 1.075189e-05 1.754016e-02 1.049213e-03 
> loglik(newbest,skating)
[1] -229.2147
> 

It does barely better than skating_maxp (-229.21 against -229.98), and nowhere near the -226.74 found for the null! I think this is because of the small strength of luca, which sits at 1e-6. But we should test this with specificp.gt.test(skating,"luca",0.01), then specificp.gt.test(skating,"luca",0.001), etc.
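The sequence of tests suggested above could be run in a loop. This is only a sketch: the first two thresholds are the ones named in the comment, the third (0.0001) is an assumed continuation of the "etc.", and the calls may be slow on the full skating dataset.

```r
library("hyper2")

# Thresholds for luca's strength: 0.01 and 0.001 come from the comment;
# 0.0001 is an ASSUMED next step in the sequence.
thresholds <- c(0.01, 0.001, 0.0001)

# Same call form as in the comment, with the threshold passed positionally.
results <- lapply(thresholds, function(s) specificp.gt.test(skating, "luca", s))

# Print each test in turn so the null supports can be compared by eye.
for (r in results) print(r)
```

Watching how the reported support for the null changes as the threshold shrinks should show whether the support really is sensitive to the exact value of luca's strength.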