egr95 / R-codacore

An R package for learning log-ratio biomarkers from high-throughput sequencing data.

Plot Error #19

Open Glfrey opened 1 year ago

Glfrey commented 1 year ago

Hello,

I'm getting the following error when trying to plot my own data (doesn't occur with tutorial data):

> plot(model)
Error in stats::model.frame.default(formula = logRatio ~ y) : 
  variable lengths differ (found for 'y')

Model fitting seemed to go well with 5 log ratios found. Any idea what's up?

egr95 commented 1 year ago

This may be a symptom of #20, where your data are perfectly separable, so the "optimal" model can push the log-ratio scores of the two classes to +infinity and -infinity, respectively. Is this the example you mentioned with 20 data points? If so, what is its dimensionality (the number of input variables that form your log-ratios)? And what does the model output look like when you call print?

To illustrate what could be happening, suppose there is a simple log-ratio log(A/B) such that a data point is positive whenever log(A/B) > 0 and negative whenever log(A/B) < 0. Then the "optimal" predictive model would be log-odds(y) = slope * log(A/B), with slope = infinity. Such a model can still be used to make predictions, but plotting slope * log(A/B) wouldn't make sense.
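
To spell out why the slope runs off to infinity in that example, here is a rough sketch of the argument. It assumes a plain logistic model with no intercept, and the symbols z_i, s_i, beta and ell are notation introduced just for this sketch (they are not taken from the codacore code): z_i is the log-ratio log(A/B) for data point i, s_i is its label recoded as +1 or -1, and beta is the slope.

```latex
% Assumed setup (illustrative notation, not codacore's):
%   z_i = \log(A_i / B_i), \quad s_i \in \{-1, +1\}, \quad
%   \text{log-odds}(y_i = 1) = \beta z_i .
% Perfect separation means s_i z_i > 0 for every i.
\begin{align*}
\ell(\beta)
  &= -\sum_{i=1}^{n} \log\!\left(1 + e^{-\beta s_i z_i}\right), \\
\frac{d\ell}{d\beta}
  &= \sum_{i=1}^{n} \frac{s_i z_i}{1 + e^{\beta s_i z_i}}
  \;>\; 0 \quad \text{for every finite } \beta, \\
\lim_{\beta \to \infty} \ell(\beta) &= 0 .
\end{align*}
```

Under these assumptions the log-likelihood is strictly increasing in beta and only approaches its supremum of 0 as beta grows without bound, so no finite slope is optimal. The class labels are still predicted correctly for any positive beta, which is why prediction works while the fitted scores themselves diverge.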

I'd also be happy to help debug your particular example and check if there's something else going on here if you are able/willing to share the data.