egr95 / R-codacore

An R package for learning log-ratio biomarkers from high-throughput sequencing data.

ROC error #21

Closed Glfrey closed 1 year ago

Glfrey commented 1 year ago

Hello again @egr95,

I'm still playing around with CoDaCoRe and I've come across a new error when fitting a model with objective="binary classification" on a smaller subset of my data (12 samples × 71 features, with 8 positive and 4 negative instances):

Error in roc.default(cdbl$y[foldIdx == j], yHat, quiet = TRUE) : 
  'response' must have two levels

Model fitting runs without warnings or errors when objective="regression".

egr95 commented 1 year ago

This again looks like a small sample-size problem. In the "discretization" step of the algorithm (see Section 3.3 of the paper), we compute the ROC curve over different folds of the training data. With so few data points, some folds will not contain both positive and negative instances, and both classes need to be represented in order to compute an ROC curve. You could try reducing the number of folds (something like codacore(x, y, cvParams=list(numFolds=3))), but such a model is probably not to be trusted anyway.
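To make the fold argument concrete, here is a minimal base-R sketch. It assumes plain random fold assignment and a comparison of five folds against three; both are assumptions for illustration and may not match how codacore builds its folds internally. The helper name fracDegenerate is hypothetical. With only 4 negatives, any split into more than 4 folds must leave at least one fold with no negatives, so a per-fold ROC curve cannot be computed for that fold.

```r
# Hypothetical illustration in base R; this is not codacore's internal code.
set.seed(1)
y <- factor(c(rep("pos", 8), rep("neg", 4)))  # 8 positives, 4 negatives

# Fraction of random fold assignments in which at least one fold
# is missing one of the two classes.
fracDegenerate <- function(y, numFolds, numReps = 2000) {
  bad <- replicate(numReps, {
    # Randomly assign each sample to one of numFolds roughly equal folds
    foldIdx <- sample(rep(seq_len(numFolds), length.out = length(y)))
    # Count how many distinct classes appear in each fold
    classesPerFold <- tapply(y, foldIdx, function(v) length(unique(v)))
    any(classesPerFold < 2)
  })
  mean(bad)
}

fracDegenerate(y, numFolds = 5)  # always 1: 4 negatives cannot cover 5 folds
fracDegenerate(y, numFolds = 3)  # below 1, but still a sizeable fraction
```

Even with three folds, a noticeable share of random splits still produces a single-class fold, which is consistent with the caution above about trusting a model fitted on this little data.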

I will push some kind of error or warning to clarify the source of this error. Thank you for flagging.
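In the meantime, a user-side check along these lines could flag the problem before fitting. This is only a sketch: checkClassCounts is a hypothetical helper, not part of codacore and not the fix referred to above. It assumes that a fold count larger than the size of the smaller class is the condition worth warning about.

```r
# Hypothetical helper; not part of codacore.
checkClassCounts <- function(y, numFolds) {
  counts <- table(as.character(y))  # ignore unused factor levels
  if (length(counts) != 2) {
    stop("y must contain exactly two classes")
  }
  if (min(counts) < numFolds) {
    warning(sprintf(
      "Smallest class has %d observations but numFolds = %d; at least one fold will lack that class, so ROC-based steps may fail.",
      min(counts), numFolds
    ))
    return(invisible(FALSE))
  }
  invisible(TRUE)
}

# Usage with the class balance from this issue (8 positives, 4 negatives)
y <- c(rep(1, 8), rep(0, 4))
ok5 <- suppressWarnings(checkClassCounts(y, numFolds = 5))  # FALSE, would warn
ok3 <- checkClassCounts(y, numFolds = 3)                    # TRUE
```

Passing this check does not guarantee that every random fold contains both classes; it only rules out the case where a single-class fold is unavoidable.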

Glfrey commented 1 year ago

Hi @egr95 ,

Thank you as always, that makes sense.