BIMSBbioinfo / pigx_rnaseq

Bulk RNA-seq Data Processing, Quality Control, and Downstream Analysis Pipeline
GNU General Public License v3.0

DESeq2 checkFullRank() fails with no directions where to look #146

Open smoe opened 4 days ago

smoe commented 4 days ago

Hello,

I was just rerunning some older analyses with a different sample sheet, but same sequences, so the reports should be newly generated. The following error surfaced:

...
14/46 [prepare_inputs_import_GTF]     
15/46                                 
16/46 [run_deseq2]                    

Quitting from lines 180-225 [run_deseq2] (deseqReport.Rmd)
Error in `checkFullRank()`:
! the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')
Backtrace:
 1. DESeq2::DESeqDataSetFromMatrix(...)
 2. DESeq2::DESeqDataSet(se, design = design, ignoreRank)
 3. DESeq2:::checkFullRank(modelMatrix)
Execution halted

My hunch is that the sample sheet has issues, and any patches for a more informative message should probably go to the DESeq2 package. Still, if you have ideas, feel free to discuss them here.

Many thanks!

Steffen

borauyar commented 4 days ago

If you have provided multiple variables, e.g. sample groups plus some covariates, then at least one of those variables must be a linear combination of the others, i.e. it can be derived from them. DESeq2 throws this error in such cases. To troubleshoot, I would first use no covariates, then add them one by one to see which one clashes with the other variables.
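
The add-one-at-a-time approach can be sketched roughly as follows. This is a minimal illustration in plain Python, not code from the pipeline or from DESeq2: the function names (`model_matrix`, `matrix_rank`, `first_clashing_variable`), the example samples, and the column names are all hypothetical, and every variable is assumed to be categorical with treatment (dummy) coding. It approximates the rank test only; it is not guaranteed to match `checkFullRank()` in every case.

```python
from fractions import Fraction


def model_matrix(samples, variables):
    """Build intercept + dummy-coded columns for the given categorical variables.

    samples: list of dicts (one per sample); variables: keys to include.
    Returns a list of (column_name, values) pairs.
    """
    columns = [("(Intercept)", [Fraction(1)] * len(samples))]
    for var in variables:
        levels = sorted({s[var] for s in samples})
        for level in levels[1:]:  # the first level serves as the reference
            values = [Fraction(s[var] == level) for s in samples]
            columns.append((f"{var}{level}", values))
    return columns


def matrix_rank(columns):
    """Exact rank via Gaussian elimination on Fractions (no rounding issues)."""
    rows = [list(r) for r in zip(*columns)]  # turn columns into rows
    rank = 0
    for col in range(len(columns)):
        pivot = next(
            (r for r in range(rank, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue  # no pivot here: column depends on earlier ones
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            if factor:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def first_clashing_variable(samples, variables):
    """Add variables one by one; return the first that breaks full rank, else None."""
    used = []
    for var in variables:
        used.append(var)
        cols = [values for _, values in model_matrix(samples, used)]
        if matrix_rank(cols) < len(cols):
            return var
    return None


# Hypothetical sample sheet: batch is fully confounded with group.
samples = [
    {"group": "ctrl", "sex": "f", "batch": "A"},
    {"group": "ctrl", "sex": "m", "batch": "A"},
    {"group": "treat", "sex": "f", "batch": "B"},
    {"group": "treat", "sex": "m", "batch": "B"},
]

print(first_clashing_variable(samples, ["group", "sex", "batch"]))  # prints: batch
```

Here `batch` is reported because its dummy column is identical to the one for `group`, which is the kind of dependency that can appear after samples are removed from a sample sheet.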

I don't think it is a software issue; it must be caused by the list of provided covariates.

smoe commented 3 days ago

Most likely I removed samples and, with them, the variation in some covariates. I agree that it is not a software error. I only found it inconvenient that I do not have access to the covariate matrix and that the error message is a bit, well, short.

In principle, this could be checked early, during the sanity checks. Just leave a comment if you would accept a pull request in that direction, and someone will likely come up with something.

borauyar commented 2 days ago

Yes, sure, I would be open to a PR as long as the check is identical to what DESeq2 does. It should probably be possible to use the checkFullRank() function without a count matrix at the beginning of the pipeline run.