Vivianstats / scImpute

Accurate and robust imputation of scRNA-seq data
https://www.nature.com/articles/s41467-018-03405-7

Predict via lasso coefs in case of rank deficiency #2

Closed (closed by gokceneraslan 6 years ago)

gokceneraslan commented 6 years ago

lm() produces NA coefficients if the xselect matrix has more cells (= features) than genes (= samples): with more predictors than observations the design matrix cannot have full column rank, so lm() reports the aliased coefficients as NA. Although this is probably unlikely for real datasets, for simulated datasets of size 200x2000 the rank-deficient OLS fit leads to excess NAs in the imputed count matrix. So here I used the LASSO fit to make predictions in such cases, which fixes the NA issue.
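The failure mode and the fallback can be sketched outside R. Below is a minimal pure-Python analogue, assuming a layout where each row of X is one gene (observation) and each column one cell (predictor). The helpers matrix_rank, lasso_fit and predict are hypothetical, written for illustration only (they are not scImpute or glmnet code), and lam=0.1 is an arbitrary penalty. With 3 genes and 5 cells the design [1 | X] has 6 columns but rank 3, so least squares cannot identify every coefficient, while a coordinate-descent LASSO still returns finite coefficients and predictions.

```python
# Hypothetical pure-Python sketch: detect a rank-deficient design and
# predict from a LASSO fit instead. Not scImpute's or glmnet's actual code.

def matrix_rank(rows, tol=1e-10):
    """Rank of a matrix (list of rows) via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # choose the remaining row with the largest entry in this column
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: column depends on earlier pivot columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            f = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= f * m[rank][c]
        rank += 1
    return rank

def soft_threshold(z, g):
    """Shrink z towards zero by g (the LASSO proximal step)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_fit(X, y, lam, n_iter=500):
    """Coordinate-descent LASSO with an unpenalised intercept.
    Minimises (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred predictors
    resid = [yi - y_mean for yi in y]                            # residual for b = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # correlation of column j with the partial residual (b_j added back)
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new_bj
    b0 = y_mean - sum(m * bj for m, bj in zip(x_mean, b))
    return b0, b

def predict(b0, b, X_new):
    """Fitted values b0 + X_new b for each row of X_new."""
    return [b0 + sum(x * bj for x, bj in zip(row, b)) for row in X_new]

# 3 genes (observations) by 5 cells (predictors): more features than samples
X = [[1.0, 2.0, 0.0, 3.0, 1.0],
     [0.0, 1.0, 4.0, 1.0, 2.0],
     [2.0, 0.0, 1.0, 0.0, 5.0]]
y = [3.0, 5.0, 4.0]

design = [[1.0] + row for row in X]  # intercept column plus predictors
print(matrix_rank(design), "of", len(design[0]), "columns")  # 3 of 6 columns

b0, b = lasso_fit(X, y, lam=0.1)
print(predict(b0, b, X))  # finite values, no NA/NaN
```

The L1 penalty does not make the coefficient vector unique when features outnumber samples, but it keeps the coefficients bounded, which is all the prediction step needs.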

In addition, calling predict() is less error-prone than prepending a column of 1s for the intercept, extracting the coefficients and multiplying the two by hand. So I used predict() for the OLS fit as well.
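The bookkeeping that predict() hides can be shown with another small pure-Python sketch; predict_manual and predict_fit are hypothetical names used for illustration, not R or scImpute functions. Both compute intercept + X·slopes, but the manual route only works while the 1s column and the coefficient vector stay aligned. Drop the 1s column and nothing raises an error: zip() silently truncates and the predictions are simply wrong.

```python
# Hypothetical sketch contrasting manual matrix multiplication with a
# predict()-style helper that keeps the intercept on the fitted model.

def predict_manual(coefs, X_new):
    """Manual route: prepend a column of 1s and multiply by [intercept, slopes...]."""
    design = [[1.0] + list(row) for row in X_new]
    return [sum(d * c for d, c in zip(drow, coefs)) for drow in design]

def predict_fit(intercept, slopes, X_new):
    """predict()-style route: intercept and slopes are stored separately."""
    return [intercept + sum(x * s for x, s in zip(row, slopes)) for row in X_new]

X_new = [[1.0, 2.0], [3.0, 0.5]]
intercept, slopes = 0.5, [2.0, -1.0]

manual = predict_manual([intercept] + slopes, X_new)
helper = predict_fit(intercept, slopes, X_new)
print(manual, helper)  # [0.5, 6.0] [0.5, 6.0]

# Forgetting the 1s column: zip() silently drops the last coefficient.
wrong = [sum(x * c for x, c in zip(row, [intercept] + slopes)) for row in X_new]
print(wrong)  # [4.5, 2.5]
```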

gokceneraslan commented 6 years ago

Any questions, comments?

Vivianstats commented 6 years ago

Hello, thanks for your suggestion! We have since updated the package, and it no longer relies on LASSO. The new release does, however, use the predict() function.