BDG Standard Error for U test (binary design)

stevenhurwitt commented 5 years ago

Hi,

I'd like to perform a U Test on binary data and compare the given AUC value to a hypothesized proportion. I'm struggling to connect the pieces between the AUC and BDG standard error, and how they relate to the general U statistic (for a small sample size or degrees of freedom).

Essentially I'm trying to determine what distribution the AUC and BDG standard error follow when the sample size is small or BDG df is low. I've looked through some of the papers and am still unclear.

Wikipedia gives formulas for mu and sigma when the sample sizes are large; however, even with sample sizes large enough for the normal approximation, the BDG df can be quite low.
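For reference, the large-sample formulas in question are mu = n0 * n1 / 2 and sigma = sqrt(n0 * n1 * (n0 + n1 + 1) / 12) for the U statistic under the null hypothesis with no ties. A minimal sketch in R follows; the helper name `u_null_moments` and the group sizes are made up for illustration. Binary scores are heavily tied, and these single-sample formulas do not include reader variability, so this is only a baseline for comparison, not what iMRMC computes.

```r
# Large-sample null mean and SD of the Mann-Whitney U statistic (no ties).
# n0, n1: sizes of the two groups (illustrative values, not from this thread).
u_null_moments <- function(n0, n1) {
  mu <- n0 * n1 / 2
  sigma <- sqrt(n0 * n1 * (n0 + n1 + 1) / 12)
  c(mu = mu,
    sigma = sigma,
    auc_mean = mu / (n0 * n1),   # AUC = U / (n0 * n1)
    auc_sd = sigma / (n0 * n1))
}

m <- u_null_moments(n0 = 100, n1 = 152)
print(m)
```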

Will the normal approximation still work in this case, or is something like the t-distribution appropriate to use with the BDG df? Or perhaps something along the lines of a kappa statistic?

Thanks, Steven Hurwitt

P.S. If the power calculation function is something you want to add to the R package, please let me know and I'll be happy to share the code. Likewise if I figure out anything useful regarding a hypothesis test or kappa statistic.

qigongFDA commented 5 years ago

Hi Steven,

The iMRMC software can analyze binary data with some modifications to the input file: https://github.com/DIDSR/iMRMC/wiki/MRMC-analysis-of-binary-data

We have not added the "power calculation feature" in the R package. We may plan to add the feature. I will discuss it with Brandon.

Regards, Qi

stevenhurwitt commented 5 years ago

Hi Qi Gong,

Thanks for your response. I have analyzed binary data and, from the Java source code, came to the following conclusion about my initial question:

the auc and auc variance follow the t distribution with dfBDG degrees of freedom when the dfBDG is <50, correct?

Does that mean the following R code correctly uses the results to test the hypothesis that the proportion is equal to .8? The dataset zip file is attached: [fake88_nurse6_n252_delta15.csv.zip](https://github.com/DIDSR/iMRMC/files/3049868/fake88_nurse6_n252_delta15.csv.zip)

library(iMRMC)
# read the attached dataset and run the MRMC analysis
fake = read.csv("fake88_nurse6_n252_delta15.csv", header = TRUE, sep = ",")
result = doIMRMC(fake)

# AUC estimate, its variance, and the BDG degrees of freedom
auc = result$Ustat$AUCA
auc.var = sum(result$Ustat$varAUCA)
df.BDG.U = result$Ustat$dfBDG

# |t| statistic against the hypothesized value .8
test = sqrt((auc - .8)^2/auc.var)
# upper-tail probability of |t| (one-sided; double it for a two-sided p-value)
1 - pt(test, floor(df.BDG.U))
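Since the snippet above evaluates `1 - pt(|t|, df)`, it reports an upper-tail probability. A hedged sketch of a two-sided test with a matching confidence interval is shown below. The function name `auc_t_test` and all numeric inputs are placeholders, not values from the attached dataset; in practice they would come from the `doIMRMC()` output.

```r
# Two-sided t-test of an AUC estimate against a hypothesized value auc0.
# auc, auc_var, and df are placeholder inputs for illustration only.
auc_t_test <- function(auc, auc_var, df, auc0 = 0.8, conf_level = 0.95) {
  se <- sqrt(auc_var)
  t_stat <- (auc - auc0) / se
  p_two_sided <- 2 * pt(-abs(t_stat), df)            # both tails
  half_width <- qt(1 - (1 - conf_level) / 2, df) * se
  list(t = t_stat,
       df = df,
       p.value = p_two_sided,
       conf.int = c(auc - half_width, auc + half_width))
}

res <- auc_t_test(auc = 0.86, auc_var = 0.0009, df = 12.4)
str(res)
```

`pt()` and `qt()` accept non-integer degrees of freedom, so the sketch passes df through without `floor()`.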

Additionally, I previously asked a question about the power calculation, which you all answered, and I wanted to share the code in case you decide to implement it in R:

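Power.Calc = function(df, SE, eff, alpha){
  # provides as.bigz() and factorialZ() via gmp
  require(Rmpfr)
  ints = 0:70                           # series indices j = 0, ..., 70

  t.stat = eff/SE
  lambda = t.stat^2                     # noncentrality parameter
  cutoff = qf((1-alpha), 1, floor(df))  # critical value of the central F
  d1 = 1
  d2 = df - 1

  # series weights exp(-lambda/2) * (lambda/2)^j / j!
  # caution: as.bigz() and %/% are integer operations, so fractional parts are dropped
  s1 = as.bigz((lambda/2)^(ints))
  s2 = factorialZ(ints)
  s3 = exp(-lambda/2)

  q = cutoff*d1/(cutoff*d1+d2)
  betas = pbeta(q, d1/2 + ints, d2/2)

  scale = s1 %/% s2
  F.val = s3*as.numeric(scale)*betas
  power = 1 - sum(F.val)
  print(paste("Power calculated to be: ", round(power, 4)))
  return(power)
}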

brandon-gallas commented 5 years ago

"the auc and auc variance follow the t distribution with dfBDG degrees of freedom when the dfBDG is <50, correct?" BDG: Correct

I don't think we will add any new calculations to the Java code, but we could add it to the R package at some point. Does it not exist somewhere already?

Thanks for your feedback.

Brandon
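To get a feel for how much the t reference distribution differs from the normal one at low degrees of freedom, the sketch below compares two-sided 5% critical values. The df values are chosen only for illustration and are not taken from the iMRMC source.

```r
# Two-sided 5% critical values: t with df degrees of freedom vs. normal.
dfs <- c(3, 5, 10, 20, 50, 200)
crit <- data.frame(df = dfs,
                   t_crit = qt(0.975, dfs),
                   z_crit = qnorm(0.975))
# ratio > 1 means the t-based interval is wider than the normal-based one
crit$ratio <- crit$t_crit / crit$z_crit
print(crit, digits = 4)
```

The ratio shrinks toward 1 as df grows, so the two approaches give nearly the same intervals once df is large.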

qigongFDA commented 5 years ago

Hi Steven,

Thank you for providing the code.

The function is used in the sizing section, but not in the analysis section.

Regards, Qi

stevenhurwitt commented 5 years ago

The R package gives confidence intervals and p-values comparing AUC to .5; I just figured it would be more flexible to compare to an arbitrary value.

The main functionality it's missing compared to the Java code is the power calculation. I attached the code for that as an outline in case you all ever want to incorporate sizing a trial/power into the R package.

brandon-gallas commented 5 years ago

Can we do the power/sizing without the other library call? Trying to limit the dependencies in the package.

B

stevenhurwitt commented 5 years ago

Yeah, I forget what the max factorial is in R, but we could just use base factorial() and adjust the size of the F vector, as well as get rid of the bigz's.
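One possible dependency-free version is sketched below. It is not the package's implementation. It replaces the explicit factorials with `dpois()`, which evaluates exp(-lambda/2) * (lambda/2)^j / j! directly and so avoids both big integers and the overflow of base `factorial()` above 170. The name `power_calc_base` and the truncation parameter `n_terms` are hypothetical. The sketch also assumes the same denominator df for the cutoff and the series, whereas the function posted earlier uses `df - 1` in the series, so the two need not agree.

```r
# Power of a two-sided test via the noncentral F series, base R only.
# df, se, eff, alpha mirror the arguments of the posted Power.Calc;
# n_terms is a hypothetical truncation parameter.
power_calc_base <- function(df, se, eff, alpha = 0.05, n_terms = 200) {
  lambda <- (eff / se)^2                 # noncentrality parameter
  d1 <- 1
  d2 <- df                               # assumption: same df as the cutoff
  cutoff <- qf(1 - alpha, d1, d2)
  j <- 0:n_terms
  w <- dpois(j, lambda / 2)              # exp(-lambda/2) * (lambda/2)^j / j!
  q <- cutoff * d1 / (cutoff * d1 + d2)
  1 - sum(w * pbeta(q, d1 / 2 + j, d2 / 2))
}

# Placeholder inputs, compared with the built-in noncentral F distribution.
p_series <- power_calc_base(df = 15, se = 0.03, eff = 0.08)
p_builtin <- 1 - pf(qf(0.95, 1, 15), 1, 15, ncp = (0.08 / 0.03)^2)
print(c(series = p_series, builtin = p_builtin))
```

Because base R's `pf()` accepts an `ncp` argument, the whole series could also be replaced by the single `pf()` call used for the comparison.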

BDG Standard Error for U test (binary design) #149