Sage-Bionetworks / challengescoring

This R package provides scoring mechanisms for computational challenges and implements the bayesBootLadderBoot approach for avoiding test data leakage.
Apache License 2.0

Gustavo's Integrated ROC/PR #19

Open sieberts opened 4 years ago

sieberts commented 4 years ago

Not sure what algorithm for ROC is used, but Gustavo likes the integrated AUROC/AUPR. I implemented a version in R that someone could clean up and add if necessary.

thomasyu888 commented 4 years ago

Thanks @sieberts. Could you provide a link to the code here?

sieberts commented 4 years ago

There's probably a more efficient way to code this, but I did this quickly many years ago. auc_functions.R.zip

thomasyu888 commented 4 years ago

Thanks. I will have @mjrmason chime in here on what ROC we use.

allaway commented 4 years ago

Most recently, in the CTD chemosensitivity challenge, we used limma's auROC function.

sieberts commented 4 years ago

Gustavo's algorithm is pretty slow, but it more accurately estimates the AUC when participants can submit binary OR probability predictions. If participants only generate probability predictions, it is not worth running.

mjrmason commented 4 years ago

Are we talking about integrated or regular AUC? I can't tell from your thread. For integrated, all I remember is that you have to be careful, since there is a package that yields poor/inconsistent results and that one could easily use by accident. The Multiple Myeloma Challenge ran into this. Here is my code for the iAUC. I believe the timeROC() function from a different package is what is problematic. This one is good.

sieberts commented 4 years ago

The code I posted is Gustavo's algorithm for integrated AUC. I'm not sure what's in the repository currently from the pROC package.

allaway commented 4 years ago

> Gustavo's algorithm is pretty slow, but it more accurately estimates the AUC when participants can submit binary OR probability predictions. If participants only generate probability predictions, it is not worth running.

@sieberts This is really helpful info to have. Perhaps we can wrap both the function you provided and the limma function in this package and document the scenarios in which a user might want to pick one or the other.

Just to clarify, did you and @mjrmason provide (conceptually) the same iAUC function? Which should we include?

mjrmason commented 4 years ago

I think the code @sieberts provided is for an AUC and prAUC and handles ties better, but it is not an integrated AUC. @sieberts please correct me if I am wrong.

sieberts commented 4 years ago

@mjrmason -

Do you have a reference for what you're calling integrated AUC?

mjrmason commented 4 years ago

Hey @sieberts ,

Sorry for all the back and forth on this. Here is the reference. I also attached the pdf.... I think. Let me know if you can't access it and I'll email it.

The term "integrated" is super confusing, since naturally one would use integration to find an area under a curve. The "integration" in "integrated AUC" or "iAUC" refers specifically to survival models or something similar, where different time points can be used to call a patient high risk, essentially turning a survival analysis problem into a classification problem. In these situations you may not be sure that the time point you used is the best one, and you might want to use a point 2 months later or earlier, for example. The iAUC enables integration across a range of cutoff times. It can be thought of as averaging across AUCs computed with different time points used to classify your samples/patients. The R package I use for this is risksetROC, and its function IntegrateAUC has a nice example.

As a side note, if you were considering using a time range from 0 to the last observed point, then the iAUC would just yield the concordance index. So the iAUC can be thought of as a special case of the concordance index narrowed to a specific range of time points.

Note: to use the iAUC you have to have AUCs computed for multiple cutoff times. I use timeROC() from the timeROC package for this, though you could do it "manually." There is an alternative package called survivalROC, referencing the same Heagerty & Zheng paper, that could be used for computing AUCs for multiple cutoff times, but I found its survivalROC() function to produce very strange results sometimes, so I tell people to avoid the package.
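
To make the "averaging across AUCs at different cutoff times" idea concrete, here is a minimal Python sketch. It is not the risksetROC or timeROC code: the function name `time_averaged_auc` and the example numbers are made up for illustration, and it assumes the per-cutoff AUCs have already been computed. It also weights every time point uniformly, whereas a package implementation may weight time points differently (for example, by the survival distribution).

```python
def time_averaged_auc(times, aucs, t_min, t_max):
    """Uniformly weighted average of AUC(t) over [t_min, t_max].

    Hypothetical helper for illustration only. `times` are cutoff times and
    `aucs` are the AUCs already computed at those cutoffs.
    """
    # Keep only the cutoff times inside the requested window, in time order.
    pts = sorted((t, a) for t, a in zip(times, aucs) if t_min <= t <= t_max)
    if len(pts) < 2 or pts[0][0] == pts[-1][0]:
        raise ValueError("need AUCs at two or more distinct times in range")

    # Trapezoid rule: area under the AUC(t) curve between adjacent cutoffs.
    area = 0.0
    for (t0, a0), (t1, a1) in zip(pts, pts[1:]):
        area += 0.5 * (a0 + a1) * (t1 - t0)

    # Dividing by the window length turns the area into an average AUC.
    return area / (pts[-1][0] - pts[0][0])


# Made-up AUCs at 6, 12, 18 and 24 months.
example = time_averaged_auc([6, 12, 18, 24], [0.8, 0.7, 0.7, 0.6], 6, 24)
print(round(example, 6))  # 0.7
```

Narrowing `t_min` and `t_max` restricts the average to the cutoff range of interest, which is the sense in which the iAUC is "integrated" over time.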

Let me know if you want to discuss. Apologies for this insanely long response.

Survival Model Predictive Accu.pdf

sieberts commented 4 years ago

Yes, that's definitely different. Gustavo's is an algorithm to interpolate AUC calculations when there are ties in the submission.
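
For readers wondering why ties matter here, the following is a minimal Python sketch of a tie-aware AUROC. It is not Gustavo's algorithm or the attached R code (neither is shown in this thread); it is the standard trapezoidal treatment, in which a block of tied predictions is crossed with a straight line on the ROC curve, which is equivalent to counting each tied positive/negative pair as half a win. The function name `auroc_with_ties` is hypothetical, and labels are assumed to be 0/1.

```python
from collections import defaultdict


def auroc_with_ties(labels, scores):
    """AUROC with linear interpolation across blocks of tied scores.

    Hypothetical helper for illustration only. `labels` are 0/1 truths and
    `scores` are predictions (probabilities or binary calls).
    """
    # Count positives and negatives at each distinct score value.
    counts = defaultdict(lambda: [0, 0])  # score -> [n_pos, n_neg]
    for y, s in zip(labels, scores):
        counts[s][0 if y == 1 else 1] += 1

    n_pos = sum(c[0] for c in counts.values())
    n_neg = sum(c[1] for c in counts.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    # Sweep thresholds from the highest score down. Each tied block adds a
    # trapezoid: width = negatives in the block, mean height = true positives
    # seen so far plus half of the positives in the block.
    tp = 0
    area = 0.0
    for s in sorted(counts, reverse=True):
        pos, neg = counts[s]
        area += neg * (tp + pos / 2.0)
        tp += pos

    return area / (n_pos * n_neg)


# Binary predictions create large tied blocks.
print(auroc_with_ties([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.75
```

With purely continuous predictions and no ties, every block holds a single sample and the result matches the usual step-function AUROC, which is consistent with the comment above that a tie-aware method adds little when participants submit only probabilities.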