DIDSR / iMRMC

iMRMC: Software to do multi-reader multi-case analysis of reader studies
http://didsr.github.io/iMRMC/
Other
22 stars 17 forks source link

Power Analysis in R (for low df) #146

Closed stevenhurwitt closed 5 years ago

stevenhurwitt commented 5 years ago

Hi Dr. Gallas & Qi Gong,

I'm trying to replicate the power analysis portion of the Java applet in R. I've been looking through the source code and am a little confused for the noncentral F test portion.

Do you all have any references to the math that is going on with the iterative Beta cdf and sfactor portion?

This is found in lines 862-878 in the StatTest.java file.

I'm also working with binary data scenarios as mentioned by @DMilikien

Thanks, Steven Hurwitt

stevenhurwitt commented 5 years ago

Trying to calculate it in R as follows:

for (i in 1:171){ j = i - 1 sfactor = ((lambda/2)^j)/factorial(j) F.vals[j] = sfactor*pbeta((cutoff*d1/(cutoff*d1+d2)), d1/2 + j, d2/2)}

power = 1 - exp(-lambda/2)*sum(F.vals)

but the results are a little off from the Java applet.

qigongFDA commented 5 years ago

Hello Steven,

Thank you for using our iMRMC software, and sorry for the late reply due to the shut down.

  1. 1E-300 is the smallest gap between floating number in java. https://lingpipe-blog.com/2013/12/19/limits-of-floating-point-and-the-asymmetry-of-0-and-1/.

  2. I believe your R code is correct, it has the same idea as our JAVA code.

  3. What is exact number for the "a little off"? I think the difference might come from following reasons: a. Iteration times (I think this may be the main reason): Our _Iteration will stop when tempF is in [-(1E-300),(1E-300)]. However, your iteration will run 170 times.To check this, you could check how many numbers in your F array are located inside [-(1E-300),(1E-300)]. If there are many numbers or 0 number found, the Iteration times might be the reason. b. Java and R function difference:Java BetaDist.cdf function and R pbeta function might not generate exact same result. Based on my test, the difference is very small about 1E-7. c. Input parameter difference: Please double check your parameter d1, d2, lambda, cutoff are same as our cdfNonCentralF inputs.

If you want to have some deeper analyses about the issue, please provide more information (eg. input file or data) so that we can better understand and replicate the issue.

Qi

stevenhurwitt commented 5 years ago

Thanks for the response Qi, I've adjusted my code to try and overcome some problems with numerical precision in R. However I'm thinking it's probably me doing something dumb with the inputs or cutoff values.

"A little off" here translates to more than just a little off. The power calculation from the Java applet yields .91 whereas the R code yields .96

Example data and code are below. The file "fake pilot.csv" is used in R, while "fake pilot 2.csv" is used in the Java applet.

require(iMRMC)
require(Rmpfr)
setwd('/Users/stevenhurwitt/Documents/R programming/MRMC')

data = read.csv("fake pilot.csv", header = T, sep = ",")
result = doIMRMC(data)

var.est = as.matrix(result$varDecomp$BDG$Ustat$comp$modalityA.modalityB) %*% 
t(as.matrix(result$varDecomp$BDG$Ustat$coeff$modalityA.modalityB))
SE.tot = sqrt(var.est2[1,1] + var.est2[2,1] - 2*var.est2[3,1]) #var A - B

#SE.tot = sqrt(as.numeric(result$Ustat$varAUCAminusAUCB[3]))
eff.size = .28
t.stat = eff.size/SE.tot
lambda = t.stat^2
cutoff = qf(.95, 1, dfBDG)

d1 = 1
d2 = dfBDG - 1

ints = seq(0:214) - 1

s1 = (lambda/2)^(ints)
s2 = factorialZ(ints)
s3 = exp(-lambda/2)

q = cutoff*d1/(cutoff*d1+d2)
betas = pbeta(q, d1/2 + ints, d2/2)

scale = s1 %/% s2
F.val = s3*as.numeric(scale)*betas
power = 1 - sum(F.val)

fake pilots.zip

qigongFDA commented 5 years ago

Hi Steven,

Thank you for providing the code and input files.

I find the reason. The difference is from parameter cutoff.

  1. In R, you use qf to calculate cutoff. The dfBDG is a floating number. Based on your input file: dfBDG = 2.4. cutoff = 13.5682
  2. In java, we use the following functions FisherFDist fdist = new FisherFDist(1, (int) df); cutoff = fdist.inverseF(1 - sigLevel); where we transfer df in to integer. Thus the input of FisherFDist is (1, 2). cutoff = 18.5128

If you edit the R code to use integer dfBDG to calculate the cutoff, you can get very close powers.

Regards, Qi

stevenhurwitt commented 5 years ago

Oh yeah, I knew it was some small dumb error on my part - thanks so much for this!

brandon-gallas commented 5 years ago

Not dumb. It is perfectly reasonable to interpolate between the integers. We were lazy and rounded down. It must be our conservative bias as regulators. Thanks for using our software.

qigongFDA commented 5 years ago

In java, we use FisherFDist to generate F distribution. The function asks for integer as input. Thus, we have to round the degree of freedom. https://www.iro.umontreal.ca/~simardr/ssj/doc/html/umontreal/iro/lecuyer/probdist/FisherFDist.html

Maybe we can use FDistribution to replace FisherFDist. FDistribution allow the inputs as double format. http://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/distribution/FDistribution.html

We may change it in the next version.