Shenhav-and-Korem-labs / SCRuB


glmnet error #16

Closed hcoombes closed 1 month ago

hcoombes commented 11 months ago

Hi,

I am trying to use SCRuB on a 16S dataset that includes two types of samples (faeces and cecum) and two types of controls (extraction and PCR controls).

When I run SCRuB I get the following error:

Error in elnet(xd, is.sparse, y, weights, offset, type.gaussian, alpha, : y is constant; gaussian glmnet fails at standardization step

If I run SCRuB with my metadata file and a randomly generated dataset, it works fine, so I think the issue must be linked to my ASV table. However, I can't figure out what it is: the data is all numeric and stored as a matrix.

This is the metadata file I am using: plate_8_meta.csv

And a reduced version of my dataset (with only the first 50 ASVs included): plate_8_rep.csv

Thanks for your help.

gaustin15 commented 11 months ago

This error usually comes up when the table contains empty samples (rows with zero total reads). These cause the glmnet failure during initialization, and they would also cause division-by-zero issues in SCRuB’s EM update scheme.
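To make the division-by-zero point concrete, here is a minimal base R sketch on toy data. It is not SCRuB's internal code, and the sample and ASV names are made up for illustration; it only shows what happens when a table containing an all-zero row is converted to relative abundances.

```r
# Toy count table (hypothetical values): one normal sample, one empty sample
counts <- matrix(c(4, 0,
                   6, 0),
                 nrow = 2,
                 dimnames = list(c("sample_a", "empty_sample"),
                                 c("ASV1", "ASV2")))

# Total reads per sample; the empty sample sums to zero
totals <- rowSums(counts)

# Row-wise relative abundances: each row is divided by its own total
rel_abund <- counts / totals

# The empty sample is 0 / 0, so its whole row becomes NaN
print(rel_abund)
empty_rows <- rownames(counts)[totals == 0]
print(empty_rows)
```

Any downstream step that expects finite proportions will fail or misbehave on such rows, which is why removing them first is the usual fix.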

In general, this can be resolved by keeping only the samples and controls with read counts > 0. Can you try running the following code on the whole dataset to see if that fixes it?

library(tidyverse)
library(SCRuB)
md <- read.csv('plate_8_meta.csv', row.names=1)
df <- read.csv('plate_8_rep.csv', row.names=1)

# this one will error
# SCRuB(df, md)

# filtering to non-empty samples
inds <- which( df %>% rowSums() > 0 )
scr_out <- SCRuB(df[inds, ], 
                 md[inds, ])

# sanity check: total reads across all control samples
# this is 0 for the 50 ASVs provided; if it is nonzero on the full dataset, SCRuB can be run
df[ row.names( md %>% filter(is_control) ), ] %>% sum()

This code didn’t run on the subset of features you shared because the 9 control samples don’t include any reads across the 50 ASVs provided (and SCRuB needs control samples to run), but maybe there are other ASVs that made it into your controls.

If all the controls are empty across your entire table, then SCRuB generally wouldn’t be usable here, and a control-free decontamination method is likely your best choice. In that case it would also be good to check the raw read counts of the control samples (i.e. the fastq files) to make sure there aren’t any processing errors; for example, reads failing to merge in DADA2 could lead to empty controls.
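The two checks above (drop empty rows, then confirm that at least one control still has reads) can be wrapped into a small pre-flight step. This is only a sketch in base R: `filter_nonempty` is a hypothetical helper, not part of SCRuB, and it assumes the metadata is a data frame with a logical `is_control` column and row names that match the count table. The toy data is invented for illustration.

```r
# Hypothetical helper (not part of SCRuB): remove empty rows and stop early
# with a readable message if no control sample has any reads.
filter_nonempty <- function(counts, metadata, control_col = "is_control") {
  counts <- as.matrix(counts)

  # Align the two tables by sample name rather than by row position
  shared <- intersect(rownames(counts), rownames(metadata))
  counts <- counts[shared, , drop = FALSE]
  metadata <- metadata[shared, , drop = FALSE]

  keep <- rowSums(counts) > 0
  is_ctrl <- as.logical(metadata[[control_col]])
  if (!any(keep & is_ctrl)) {
    stop("All control samples are empty; at least one control needs reads.")
  }

  list(counts = counts[keep, , drop = FALSE],
       metadata = metadata[keep, , drop = FALSE],
       dropped = rownames(counts)[!keep])
}

# Toy data for illustration only: one empty sample and one empty control
counts <- matrix(c(5, 0, 2, 0,
                   3, 0, 0, 0,
                   1, 0, 4, 0),
                 nrow = 4,
                 dimnames = list(c("s1", "s2", "ctrl1", "ctrl2"),
                                 c("ASV1", "ASV2", "ASV3")))
metadata <- data.frame(is_control = c(FALSE, FALSE, TRUE, TRUE),
                       sample_type = c("faeces", "cecum",
                                       "extraction control", "pcr control"),
                       row.names = c("s1", "s2", "ctrl1", "ctrl2"))

res <- filter_nonempty(counts, metadata)
print(res$dropped)
```

The filtered `res$counts` and `res$metadata` could then be passed to `SCRuB()` in place of the original tables. Matching by row name is a deliberate choice here, so the filter still works if the two files list samples in a different order.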

hcoombes commented 11 months ago

Hi,

Thanks, that worked! That plate had no reads in any of its controls, which is probably why it failed.

I then ran it on another plate where some of the controls did have reads, and it worked. I still had to remove that plate's controls without any reads to make it run, though.

Thanks for your help.

talkorem commented 11 months ago

Thanks for confirming that this is the issue. We will add a more informative error message for empty samples.

gaustin15 commented 1 month ago

A more informative error message was added in https://github.com/Shenhav-and-Korem-labs/SCRuB/pull/20