RobinHankin / hyper2

https://robinhankin.github.io/hyper2/
5 stars 3 forks source link

infelicity in samep.test() #182

Open RobinHankin opened 2 years ago

RobinHankin commented 2 years ago

Look:

> samep.test(skating,c("hughes","kwan","slutskaya"))

    Constrained support maximization

data:  skating
null hypothesis: hughes = kwan = slutskaya
null estimate:
  babiakova  butyrskaya       cohen     fontana     giunchi   gusmeroli       hegel      hubert      hughes 
0.001170404 0.080744661 0.262561587 0.013678697 0.001746329 0.005277600 0.001708256 0.010569093 0.066352749 
   kettunen       kopac        kwan   liashenko        luca maniachenko       meier        onda    robinson 
0.033427452 0.001000040 0.066352749 0.016608713 0.001000638 0.022614170 0.011244769 0.015953095 0.079423388 
  sebestyen   slutskaya   soldatova      suguri   volchkova 
0.048987694 0.066352749 0.002027617 0.160943647 0.030253902 
(argmax, constrained optimization)
Support for null:  -309.5204 + K

alternative hypothesis:  sum p_i=1 
alternative estimate:
   babiakova   butyrskaya        cohen      fontana      giunchi    gusmeroli        hegel       hubert 
4.819266e-06 5.798746e-03 8.452657e-02 4.529396e-04 9.293266e-06 1.052171e-04 1.200891e-05 3.562842e-04 
      hughes     kettunen        kopac         kwan    liashenko         luca  maniachenko        meier 
2.969323e-01 1.647078e-03 3.327016e-06 2.635371e-01 6.439533e-04 1.000000e-06 1.008173e-03 3.669195e-04 
        onda     robinson    sebestyen    slutskaya    soldatova       suguri    volchkova 
5.949065e-04 5.707179e-03 3.329658e-03 3.156011e-01 1.332146e-05 1.793490e-02 1.413205e-03 
(argmax, free optimization)
Support for alternative:  -229.8352 + K

degrees of freedom: 2
support difference = 79.68519
p-value: 2.472647e-35 

Above we see an absurdly low p-value, given that the three medallists (hughes, kwan, slutskaya) have very similar MLBT strengths at about 30%, 26% and 32% respectively. The problem is in the optimization of the likelihood for the null, in which the optimization has to respect the constraint hughes = kwan = slutskaya. One idea might be to furnish samep.test() with a starting point for the optimization which in this case would presumably be close to the result of the free optimization.

RobinHankin commented 1 year ago

Tried it today:

suppressMessages(library("hyper2"))
samep.test(skating,c("hughes","kwan","slutskaya"))
#> Error in constrOptim(theta = startq, f = objective, grad = NULL, ui = UI, : initial value is not in the interior of the feasible region

Created on 2023-06-26 with reprex v2.0.2

RobinHankin commented 1 year ago

Tried debugging samep.test() with the following result, cut-and-pasted just before the error message given by

constrained_opt <- constrOptim(theta = startq, f = objective, grad = NULL, ui = UI, ci = CI, ...)

Browse[2]> dput(startq)
c(0.282091812545044, 8.29788065801177e-06, 0.0106041284117271, 
0.0924820877512254, 0.000766645872519253, 1.52619565708673e-05, 
0.000153632462222593, 1.47510996256045e-05, 0.000540730804395944, 
0.00278626450583833, 5.61216837437437e-06, 0.000913980140539977, 
9.99999332653507e-07, 0.0013611385930829, 0.000661050382393342, 
0.000820213577549596, 0.00955008375713159, 0.0051423013299458, 
2.05140413391927e-05, 0.02559701568729)
Browse[2]> UI %*% startq - CI
               [,1]
 [1,]  2.820908e-01
 [2,]  7.297881e-06
 [3,]  1.060313e-02
 [4,]  9.248109e-02
 [5,]  7.656459e-04
 [6,]  1.426196e-05
 [7,]  1.526325e-04
 [8,]  1.375110e-05
 [9,]  5.397308e-04
[10,]  2.785265e-03
[11,]  4.612168e-06
[12,]  9.129801e-04
[13,] -6.673465e-13
[14,]  1.360139e-03
[15,]  6.600504e-04
[16,]  8.192136e-04
[17,]  9.549084e-03
[18,]  5.141301e-03
[19,]  1.951404e-05
[20,]  2.559602e-02
[21,]  2.278852e-03
Browse[2]> 

We see that element 13 is the problem, falling just outside the constraint, clearly caused by numerical accuracy problems: startq[13] is a tiny bit less than 1e-6