egr95 / R-codacore

An R package for learning log-ratio biomarkers from high-throughput sequencing data.

yHat in incremental fit #14

Closed: nick-youngblut closed this issue 2 years ago

nick-youngblut commented 2 years ago

In the "Incremental fit" section of the codacore guide, the code is:

# Held-out test split
dfTest <- HIV[-trainIndex,]
xTest <- x[-trainIndex,]
yTest <- z[-trainIndex]
# Sum the partial model's logits and the codacore log-ratio's logits
yHatLogit <- predict(partial, newdata = dfTest) + predict(model, xTest, logits = TRUE)
yHat <- yHatLogit > 0
testAUC <- pROC::auc(pROC::roc(yTest, yHatLogit, quiet = TRUE))
cat("Test AUC:", round(100 * testAUC), "%")
#> Test AUC: 100 %

What is the point of yHat <- yHatLogit > 0, since yHat is not used after this line in the code?

egr95 commented 2 years ago

We added that line simply to illustrate how one might obtain a binary prediction, for instance if one wanted to compute accuracy instead of AUC. I have added a comment to clarify this. The key point in this section is that you cannot just call predict with logits=FALSE for this incremental fit, since that would not take into account the offset from the partial model. Instead, we have to sum the contributions of both the partial model and the codacore log-ratio in logit space before binarizing.
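To make the answer above concrete, here is a small base-R sketch using made-up logit values rather than real codacore output. The names `partialLogit` and `codacoreLogit` are hypothetical stand-ins for `predict(partial, newdata = dfTest)` (assuming `partial` is a binomial `glm`, whose `predict` returns logits by default) and `predict(model, xTest, logits = TRUE)`. It shows that thresholding the summed logit at 0 is the same as thresholding the probability at 0.5, how accuracy follows from `yHat`, and how using the codacore probabilities alone (ignoring the partial model's offset) can change the predictions.

```r
# Toy logits (invented for illustration, not codacore output).
# partialLogit  ~ predict(partial, newdata = dfTest)       (glm, link scale)
# codacoreLogit ~ predict(model, xTest, logits = TRUE)
partialLogit  <- c(-2.0, 1.5, 0.3, -0.4)
codacoreLogit <- c( 0.5, 1.0, -1.0, 1.2)
yTest         <- c(0, 1, 0, 1)

# Correct: add both contributions on the logit scale, then binarize at 0.
yHatLogit <- partialLogit + codacoreLogit
yHat <- yHatLogit > 0

# A logit threshold of 0 corresponds to a probability threshold of 0.5.
stopifnot(identical(yHat, plogis(yHatLogit) > 0.5))

# Accuracy: fraction of test samples whose binary prediction matches the label.
accuracy <- mean(yHat == yTest)
cat("Accuracy:", accuracy, "\n")
#> Accuracy: 1

# Incorrect: converting only the codacore logits to probabilities
# (the analogue of logits = FALSE) drops the partial model's offset.
yHatIgnoringOffset <- plogis(codacoreLogit) > 0.5
accuracyIgnoringOffset <- mean(yHatIgnoringOffset == yTest)
cat("Accuracy ignoring offset:", accuracyIgnoringOffset, "\n")
#> Accuracy ignoring offset: 0.75
```

With the real objects from the guide, the same `mean(yHat == yTest)` line would give the test accuracy for the incremental fit.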

nick-youngblut commented 2 years ago

Thanks for the clarification!