Filtering out low count genes

In deseq_init.R, this line removes genes with fewer than 10 counts in all samples. This step was copied from the DESeq2 vignette and aims to reduce memory usage and speed up the computation.

However, I was told that some users create several DESeq2 objects from the same count matrix, each containing a subset of the samples, in order to compare two groups of samples. For example, given this design matrix:

sample	condition	genotype
WT_treat1	treated	WT
WT_treat2	treated	WT
WT_treat3	treated	WT
WT_control1	control	WT
WT_control2	control	WT
WT_control3	control	WT
KO_treat1	treated	KO
KO_treat2	treated	KO
KO_treat3	treated	KO
KO_control1	control	KO
KO_control2	control	KO
KO_control3	control	KO

Some users run two analyses to compare treatment vs control within each genotype (instead of modelling a complex design such as ~genotype + condition). Because the filter is applied separately to each subset of samples, a gene with low counts in only one of the genotypes is removed from that analysis but kept in the other, so the two normalized count tables will not contain exactly the same genes.
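To make the effect concrete, here is a minimal base R sketch. It assumes the pre-filter keeps genes with at least 10 reads in total across the samples of the object, which is my reading of the vignette step and may not match the exact line in deseq_init.R. The count values, the gene names and the `keep_genes` helper are made up for illustration; the sample names follow the table above.

```r
# Toy count matrix: 3 genes x 12 samples (hypothetical values).
samples <- c(paste0("WT_treat", 1:3), paste0("WT_control", 1:3),
             paste0("KO_treat", 1:3), paste0("KO_control", 1:3))
counts <- rbind(
  geneA = rep(50L, 12),                # well expressed in every sample
  geneB = c(rep(0L, 6), rep(20L, 6)),  # expressed only in KO samples
  geneC = c(rep(1L, 6), rep(0L, 6))    # 6 reads in total, all in WT samples
)
colnames(counts) <- samples

# Hypothetical helper mimicking the assumed pre-filter:
# keep genes whose total count over the given samples is >= min_total.
keep_genes <- function(m, min_total = 10) {
  rownames(m)[rowSums(m) >= min_total]
}

# Split the same matrix by genotype, as in the two-analysis workflow.
wt <- counts[, grepl("^WT_", samples)]
ko <- counts[, grepl("^KO_", samples)]

kept_all <- keep_genes(counts)  # filter applied to the full matrix
kept_wt  <- keep_genes(wt)      # filter applied to WT samples only
kept_ko  <- keep_genes(ko)      # filter applied to KO samples only

# Genes kept in the KO analysis but dropped from the WT analysis.
setdiff(kept_ko, kept_wt)
```

In this toy example geneB passes the filter in the KO subset and in the full matrix but is dropped from the WT subset, so the two per-genotype results would not share the same rows.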

First, I would like to know how common this procedure is. Maybe some regular users can give feedback @ELENAPINEIRO @ralvarez-hub @jlanillos @lserranor @Maria-rfranklin.

As the authors state in the vignette, this is not an essential step. If the above procedure is standard, maybe we can just remove these lines. @SGMartin what do you think?
