UCLouvain-CBIO / scp

Single cell proteomics data processing
https://uclouvain-cbio.github.io/scp/index.html

Check for > 1 non-NA values #54

Closed lgatto closed 3 months ago

lgatto commented 5 months ago

If a feature has 0 or 1 non-NA values, .dropConstantVariables() fails because var() returns NA:

> var(NA) ## only NAs in quant -> empty colData()
[1] NA
> var(1) ## a single value -> single colData() value
[1] NA

We need to add a test for this early on.
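
A minimal sketch of what such an early check could look like, using base R only. The helper name hasEnoughValues() and the toy matrix are hypothetical and are not part of scp; the actual check in the package may be implemented differently.

```r
# Hypothetical helper (not part of scp): TRUE when a vector holds at
# least two non-missing values, i.e. when var() can return a number.
hasEnoughValues <- function(x) sum(!is.na(x)) >= 2

# Toy quantification matrix: one feature per row.
quant <- rbind(
    allNA  = c(NA, NA, NA),  # 0 non-NA values
    single = c(1, NA, NA),   # 1 non-NA value
    ok     = c(1, 2, NA)     # 2 non-NA values
)

# Flag the features that can be modelled, then drop the others.
keep <- apply(quant, 1, hasEnoughValues)
filtered <- quant[keep, , drop = FALSE]
rownames(filtered)
```

Only the feature with at least two observed values survives the filter, so var() is never called on an empty or single-value vector.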

cvanderaa commented 3 months ago

Did you run into this error, or do you expect an issue after looking at the code? I'm surprised this would happen since there is this check early on: https://github.com/UCLouvain-CBIO/scp/blob/d89890d5e84c672182fbfd90f0a477a2285f17d4/R/ScpModel-Workflow.R#L213-L217

If there are fewer than 2 data points, the .adaptModel() function returns an empty model matrix.

lgatto commented 3 months ago

No, it is an observed issue, not only based on mental code execution.

Tagging @samgregoire, as I think he hit this issue. The error was gone after doing some initial NA filtering - he initially kept all the data, including features with only 1 non-NA value.

samgregoire commented 3 months ago

This is the error output when I try to run scpModelWorkflow() on my initial data.

  |============================                                                                            |  25%
Error in if ((is.numeric(x[, i]) && var(x[, i]) == 0) || (is.factor(x[, :
missing value where TRUE/FALSE needed

Removing features with only 1 non-NA value was not enough to make it work; I had to perform more stringent NA filtering.
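
For readers unfamiliar with this message, here is a small base R sketch of how a condition shaped like the one in the error output can end up as NA. The data frame x is made up for illustration, and the isTRUE() guard is only one possible way to avoid the error, not necessarily what scp does.

```r
# Made-up one-column data frame whose only value is missing.
x <- data.frame(a = NA_real_)

# var() of a single (or missing) value is NA, NA == 0 is NA, and
# TRUE && NA is NA, so if() receives NA and throws an error.
res <- tryCatch(
    if (is.numeric(x[, 1]) && var(x[, 1]) == 0) "constant" else "varies",
    error = function(e) conditionMessage(e)
)
res

# One possible guard: isTRUE() turns an NA comparison into FALSE.
guarded <- if (is.numeric(x[, 1]) && isTRUE(var(x[, 1]) == 0)) {
    "constant"
} else {
    "varies"
}
guarded
```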

cvanderaa commented 3 months ago

This works on my side:

data("leduc_minimal")
leduc2 <- leduc_minimal
assay(leduc2)[1, ] <- NA
assay(leduc2)[1, 1] <- 1
scpModelWorkflow(leduc2, formula = ~  1 + Channel + SampleType)

I have noticed a bug when you include a numerical variable. Was one of your variables in the model numerical?

samgregoire commented 3 months ago

This is the code I used:

scpModelWorkflow(
  sce, 
  formula = ~ 1 +
    log_medianRI +
    channel +
    set +
    cellType)

channel and cellType are character variables, set and log_medianRI are numerical. I can give you the datasets I used if you want.

cvanderaa commented 3 months ago

Ok then if set is a numerical variable, the commit https://github.com/UCLouvain-CBIO/scp/commit/7998feafc152d59838a289be6b57964b652d15a3 should solve your problem. Could you confirm?

However, I'm guessing that set represents the MS acquisition run. If that's the case, it must not be numerical, but a character (or a factor). Modelling the MS run as numerical implies that you expect the effect on intensities to grow linearly with the set number: the shift between set 3 and set 1 would be twice the shift between set 2 and set 1, etc.
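
To illustrate the difference, here is a small base R sketch comparing the design matrices obtained with a numerical and a categorical encoding. The vector set below is toy data with three runs and is not taken from the dataset discussed in this thread.

```r
# Toy run identifiers: three MS runs, two cells each.
set <- c(1, 1, 2, 2, 3, 3)

# Numerical encoding: a single slope column, so the modelled effect
# is forced to change by the same amount from one run to the next.
mmNum <- model.matrix(~ 1 + set)

# Categorical encoding: one indicator column per run (except the
# reference run), so each run gets its own independent effect.
mmFac <- model.matrix(~ 1 + factor(set))

colnames(mmNum)
colnames(mmFac)
```

With the categorical encoding, the model estimates a separate offset for runs 2 and 3 relative to run 1, which is usually what is wanted for a batch variable.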

cvanderaa commented 3 months ago

A minimal reproducible example:

library("scp")
data("leduc_minimal")
leduc2 <- leduc_minimal
leduc2$Set <- as.numeric(as.factor(leduc2$Set))
scpModelWorkflow(leduc2, formula = ~  1 + Channel + SampleType + Set)
Error in if ((is.numeric(x[, i]) && var(x[, i]) == 0) || (is.factor(x[,  : 
  missing value where TRUE/FALSE needed

fixed by https://github.com/UCLouvain-CBIO/scp/commit/7998feafc152d59838a289be6b57964b652d15a3

lgatto commented 3 months ago

@cvanderaa - could you please bump the version, update the NEWS file and push to Bioc? This fix needs to be included in the coming release.

cvanderaa commented 3 months ago

Done!