DIDSR / iMRMC

iMRMC: Software to do multi-reader multi-case analysis of reader studies
http://didsr.github.io/iMRMC/

inference with iMRMC - R package - numerical score outcome #174

Open MichaelBlin opened 1 year ago

MichaelBlin commented 1 year ago

Hi,

We have a question regarding the usage of the R package. Our experiment has a random study design with 100 scans and 9 readers, where each scan is assessed by a block of 3 readers randomly selected from the 9. The outcome variable is a scan-quality score from 1 to 100. The design is paired because the same 3 readers assess each scan before and after the image is processed by an AI system.

We found that this package may be of use for our random design with a paired A vs. B test and multiple readers.
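For concreteness, here is a minimal sketch of our design in the long data format that the package functions appear to expect (entirely simulated, not our actual data; the column names match the keyColumns argument used in the sample code below):

# Hypothetical illustration of our design in long format:
# 100 cases, 9 readers, each case read by a random block of 3 readers,
# each reading done before (testA) and after (testB) AI processing.
set.seed(1)
readers <- paste0("reader", 1:9)
df.design <- do.call(rbind, lapply(1:100, function(i) {
  block <- sample(readers, 3)  # random block of 3 of the 9 readers
  expand.grid(readerID = block,
              caseID = paste0("case", i),
              modalityID = c("testA", "testB"),
              stringsAsFactors = FALSE)
}))
df.design$score <- round(runif(nrow(df.design), 1, 100))  # quality score 1-100
str(df.design)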

We used the sample code from the package, shown below, to test the functionality. The output of uStat11.jointD and uStat11.conditionalD gives us the means, variances, moments, and coefficients.

Our question is: how do we get inference for A - B between the scores in this example? We want to compute confidence intervals. We haven't found information about this in the paper and are unsure how to get from here to the confidence intervals described by Obuchowski & Rockette and in the Hillis papers. Any help, or links to papers that lead from these function estimates to proper confidence intervals for the difference of mean scores between the A and B modalities, would be appreciated.

Thanks Michael Blin


$moments
               c0r0     c1r0     c0r1     c1r1
AB        0.9898495 1.550436 1.067417 2.084773
CD        1.0220998 1.801375 1.041513 2.286694
ABminusCD 1.0262877 1.405781 1.013911 1.662502

$coeff
           c0r0 c1r0  c0r1  c1r1
AB        -0.22 0.02 0.195 0.005
CD        -0.22 0.02 0.195 0.005
ABminusCD -0.22 0.02 0.195 0.005


library(iMRMC)

simRoeMetz.config <- sim.gRoeMetz.config()

df.MRMC <- sim.gRoeMetz(simRoeMetz.config)

df <- undoIMRMCdf(df.MRMC)

df <- droplevels(df[grepl("pos", df$caseID), ])

result.jointD.identity <- uStat11.jointD(
  df,
  kernelFlag = 1,
  keyColumns = c("readerID", "caseID", "modalityID", "score"),
  modalitiesToCompare = c("testA", "testB")
)

cat("\n")
cat("uStat11.jointD.identity \n")
print(result.jointD.identity[1:10])


brandon-gallas commented 1 year ago

I think you are trying to analyze the mean score given modality A (Abar) minus the mean score given modality B (Bbar).

I ran the code that you posted. The first two elements of the result were the mean and the variance:

print(result.jointD.identity[1:10])
$mean
        A         B   AminusB 
0.8439735 0.6864602 0.1575133 
$var
         A          B    AminusB 
0.01676065 0.02770076 0.04124707 

The standard error of the difference is sqrt(0.04), approximately 0.2.

Assuming your data are approximately normal, the mean and standard error can be used to estimate the lower and upper limits of the confidence interval:

> result.jointD.identity[["mean"]][3] - 1.96 * sqrt(result.jointD.identity[["var"]][3])
   AminusB 
-0.2405505 
> result.jointD.identity[["mean"]][3] + 1.96 * sqrt(result.jointD.identity[["var"]][3])
  AminusB 
0.5555771 
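
For convenience, the same normal-approximation interval can be wrapped in a small helper (a sketch, not part of the iMRMC package):

# Convenience wrapper (not part of iMRMC) for a normal-approximation CI
ci.normal <- function(est, var, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  est + c(lower = -z, upper = z) * sqrt(var)
}
ci.normal(result.jointD.identity[["mean"]][3],
          result.jointD.identity[["var"]][3])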

I hope this is what you want.

MichaelBlin commented 1 year ago

Thanks for the quick reply Brandon,

I set set.seed(123) and ran the analysis with uStat11.jointD and the normal-approximation CIs with the calculated variance, and got

# normal approximation
result.jointD.identity[["mean"]][3] - 1.96 * sqrt(result.jointD.identity[["var"]][3])
result.jointD.identity[["mean"]][3] + 1.96 * sqrt(result.jointD.identity[["var"]][3])
# [ -0.3137417 ,  0.2992706 ]

Running the analysis with the lmer package as a mixed model gives CIs (profile likelihood) that are about a third narrower:

fit1 <- lmer(score ~ modalityID + readerID + (1 | modalityID:readerID), data = df)
summary(fit1)
confint(fit1)
# modalityIDtestB [-0.1949411 0.20941217]

Using a standard t.test gives

test1<- t.test(score~modalityID, data = df)
test1$conf.int
[1] -0.2115628  0.1970918

Taking the t-distribution and dividing by sqrt(n) gives smaller CIs, as expected. The question now is which of these would be reasonable to use (here: 5 readers and 40 observations).

result.jointD.identity[["mean"]][3] - qt(0.975, 5-1) * sqrt(result.jointD.identity[["var"]][3])/sqrt(40)
-0.07588577 
result.jointD.identity[["mean"]][3] + qt(0.975, 5-1) * sqrt(result.jointD.identity[["var"]][3])/sqrt(40)
0.06141476
MichaelBlin commented 1 year ago

Ok, I ran several simulations now with more readers (10) and more cases (n = 120); then the results are more consistent across the t-test, lmer, and the normal approximation.

Would it be legitimate to calculate the confidence intervals with the same formula for a study that is not fully crossed, for example a split-plot design or a randomized design?

To be more conservative, one could apply t-distributed CIs instead of normal ones, with degrees of freedom equal to the number of readers minus 1, right?

result.jointD.identity[["mean"]][3] - qt(0.975, n_readers - 1) * sqrt(result.jointD.identity[["var"]][3])
result.jointD.identity[["mean"]][3] + qt(0.975, n_readers - 1) * sqrt(result.jointD.identity[["var"]][3])

Many thanks, KR Michael

brandon-gallas commented 1 year ago

Hi Michael,

I would have to say we are moving into statistical consulting rather than software debugging. If you don't mind, please email me. I'd like to know who I am working with: brandon.gallas@fda.hhs.gov

I think you are on the wrong path. I have concerns about the methods you are comparing and the conclusions you are drawing. I shared your comments with my colleague, Si Wen (@SiWen314), and she concurs. Below I share her responses to my questions about your comments.

BDG

Is the lmer function treating readers and the interaction term as random effects? It seems they are being treated the same as modalityID, which is a fixed effect. This would explain the smaller confidence intervals.

SW

The call

fit1 <- lmer(score ~ modalityID + readerID + (1 | modalityID:readerID), data = df)

treats the interaction term as a random effect but treats the readers as fixed effects.

However, even if the model is written to treat both readerID and the interaction term as random effects, like the following:

fit2 <- lmer(score ~ modalityID + (1 | readerID) + (1 | modalityID:readerID), data = df)

the confidence interval for the modality effect is still too narrow:

modalityIDtestB -0.1961730 0.2106440

This is because the confidence interval for the modality effect returned by lmer is conditional on the readers and the interaction term. In other words, its mean and variance are estimates of

E(Abar - Bbar | readerID, modalityID:readerID) and Var(Abar - Bbar | readerID, modalityID:readerID),

while the confidence interval computed from the U-stat method accounts for reader and case variability and is based on estimates of

E(Abar - Bbar) and Var(Abar - Bbar).

Another issue with this lmer model is that it does not account for case variability. The ANOVA model in the OR method only shows the modality, reader, and reader x modality effects. However, the dependent variable in the OR method is the AUC for each reader-modality combination, not the raw score, and the error term in the OR method includes the case variability and the correlations, which are managed explicitly. For this reason, the OR method cannot be executed with standard ANOVA software, and the lmer function does not account for case variability.

BDG

Do you understand the closing question? The equation at the end is wrong because the author is dividing by sqrt(40). The variance output, result.jointD.identity[["var"]][3], is already scaled for the number of cases.

SW

I agree with you that it does not make sense to divide the standard error by sqrt(40). Also, it seems the author tries to use the t-distribution quantile, qt(0.975, 5-1), to replace 1.96. It is not reasonable to use the degrees of freedom for the readers, 5-1, as the degrees of freedom for the difference score, because the degrees of freedom for the difference score is a complex function of the degrees of freedom of the readers, the cases, and the interaction terms.

I think the author tries to manipulate the U-stat result to be similar to what is produced by lmer and t.test. However, as I mentioned previously, the result from lmer does not make sense, and the t.test is even worse: it treats the test scores as independent samples, so it ignores all correlations from readers and cases. Neither result is correct.

BDG

Could your ANOVA-based software do this analysis better than the U-stat estimates?

SW

Basically, what the author did is calculate the confidence interval of the mean difference, which is also available via the laBRBM function in the iMRMC package. The laBRBM function uses uStat11.conditionalD, but the results are the same.

[screenshot: laBRBM output]

This is the same as what the U-statistics approach produces:

[screenshot: uStat11 output]

My ANOVA-based method gives exactly the same result:

[screenshot: ANOVA-based output]
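
For readers of this thread, a laBRBM call looks roughly like this (a sketch using the simulated df from the code earlier in this thread; the arguments shown mirror uStat11.jointD and are an assumption, so check ?laBRBM for the exact signature):

# Sketch: mean-difference analysis via laBRBM (arguments assumed, see ?laBRBM)
result.laBRBM <- laBRBM(
  df,
  modalitiesToCompare = c("testA", "testB"),
  keyColumns = c("readerID", "caseID", "modalityID", "score")
)
print(result.laBRBM)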

BDG

I need a summary of where my U-stat estimate compares well with yours and where it fails.

SW

When the study is fully crossed, the results from the U-stat method and ANOVA method are the same.

When the study is not fully crossed, but the number of cases read by each reader is the same and the readings are paired across the two modalities, the relative difference in the variance estimates is about 2%:

[screenshot: variance comparison, paired non-fully-crossed design]

When the study does not keep the number of cases the same across readers, the results from the two methods are very different:

[screenshot: variance comparison, unequal case counts across readers]

MichaelBlin commented 1 year ago

Thanks for the long response, Brandon, especially the part about the linear mixed model. That clarifies the main points on our side.

Also, I didn't know about the other ANOVA functions from Si Wen that are provided here. As our study is not fully crossed, but the number of cases read by each reader is the same and the readings are paired across the two modalities, I expect to be on the safe side using the normal approximation for the CIs. We will run some simulations with both the U-stat method and the ANOVA functions provided in the iMRMC package and compare the results to see the differences.
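
For what it's worth, a minimal sketch of such a simulation loop, reusing the example code from the top of this thread (only the U-stat normal-approximation CI is collected here; the laBRBM results would be gathered analogously):

# Sketch: repeat the simulation and collect the U-stat normal-approximation
# CI for the A-B mean difference, then summarize the CI widths
library(iMRMC)
set.seed(123)
config <- sim.gRoeMetz.config()
ci.mat <- t(replicate(100, {
  df.MRMC <- sim.gRoeMetz(config)
  df <- undoIMRMCdf(df.MRMC)
  df <- droplevels(df[grepl("pos", df$caseID), ])
  res <- uStat11.jointD(
    df, kernelFlag = 1,
    keyColumns = c("readerID", "caseID", "modalityID", "score"),
    modalitiesToCompare = c("testA", "testB"))
  res[["mean"]][3] + c(-1, 1) * qnorm(0.975) * sqrt(res[["var"]][3])
}))
summary(ci.mat[, 2] - ci.mat[, 1])  # distribution of CI widths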

Also, thanks for providing the contact email for future consultations.

KR Michael

brandon-gallas commented 1 year ago

We would be happy to hear what your simulations show. Please report back.

We would also like to better know who you are. That is why I shared my email, so you can share your full name and affiliation.

Si has submitted her work that treats arbitrary study designs and is awaiting the journal's response. She might be willing to share that and the corresponding software, though it might depend on the review status.

Brandon

MichaelBlin commented 1 year ago

Thanks Brandon, I will share more details in a direct email to you; I work for a larger pharma company. Treating arbitrary designs (split-plot or randomized, partially crossed) would be really great. What we need is a proper test for the difference between the two paired modalities in an MRMC study. The design described in your paper about binary outcomes and the U-statistic method, plus Si Wen's work on the three-way ANOVA, is exactly what we need, but for a design that is not fully crossed and with an arbitrary numerical outcome (scores).