shangguandong1996 closed this issue 3 years ago
Hi Guandong, I am one of Lisa's authors and I can help you with this question.
We use 3,000 background genes by default mainly for efficiency. Lisa first selects relevant DNase/H3K27ac samples representing the "chromatin landscape" of the input gene set. Then, for each TF ChIP-seq dataset (around 7k in total for human) and for every gene in the input and background gene sets, "in silico deletion" is performed to erase the "chromatin landscape" signal in the peak regions surrounding the gene. This is computationally intensive. Please check the paper for details.
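To give a feel for why the background size matters, here is a rough back-of-the-envelope sketch. It only counts (TF dataset, gene) pairs and is not Lisa's actual code. The ~7k datasets and the 3,000 default come from this reply; the input gene set size, the total gene count, and the function name are illustrative assumptions.

```python
# Rough count of (TF ChIP-seq dataset, gene) pairs that need an
# "in silico deletion" evaluation. This is NOT Lisa's implementation;
# it only illustrates how the work scales with the background size.

def deletion_evaluations(n_datasets: int, n_input_genes: int,
                         n_background_genes: int) -> int:
    """Number of per-gene deletion evaluations across all TF datasets."""
    return n_datasets * (n_input_genes + n_background_genes)

N_DATASETS = 7_000     # approximate number of human TF ChIP-seq datasets
N_INPUT = 200          # hypothetical input gene set size (assumption)
N_ALL_GENES = 20_000   # hypothetical total number of genes (assumption)

# Default: 3,000 background genes.
default_cost = deletion_evaluations(N_DATASETS, N_INPUT, 3_000)

# Alternative: every gene outside the input set is used as background.
whole_genome_cost = deletion_evaluations(
    N_DATASETS, N_INPUT, N_ALL_GENES - N_INPUT
)

ratio = whole_genome_cost / default_cost
print(default_cost, whole_genome_cost, ratio)
```

Under these assumed numbers, using all remaining genes as background means several times more deletion evaluations than the default, before any per-evaluation cost is considered.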
The default background genes are actually not randomly selected. There is a list of genes that is used as background. They were selected so that 1) they are relatively consistently active across cell types and 2) they are not enriched in any gene ontology. Setting all the remaining genes as the background is not necessary and would not make much sense to me... The dynamic of gene expression changes are more towards a matter of degree rather than binary. We cannot assume that all other genes are not regulated at all.
I hope this answers your question.
Jingyu, thanks for your reply :). This is very helpful for me :) I think I may understand your meaning:
And I am confused about this sentence:
The dynamic of gene expression changes are more towards a matter of degree rather than binary
Guandong Shang
For example, when cells are perturbed, differential genes are those that pass a statistical confidence threshold (FDR) and show a large change in expression level (fold change). But this does not mean that all the remaining genes are constant during the perturbation. They might still be regulated, just with too little confidence to confirm it.
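As a minimal sketch of that point, the snippet below calls differential genes with an FDR cutoff and a fold-change cutoff. The gene names, values, cutoffs, and the `call_differential` helper are all made up for illustration and are not part of Lisa.

```python
from typing import List, NamedTuple

class GeneStat(NamedTuple):
    name: str
    log2_fold_change: float
    fdr: float

def call_differential(stats: List[GeneStat],
                      fdr_cutoff: float = 0.05,
                      min_abs_log2fc: float = 1.0) -> List[str]:
    """Return genes passing BOTH the FDR and the fold-change cutoff."""
    return [
        g.name for g in stats
        if g.fdr <= fdr_cutoff and abs(g.log2_fold_change) >= min_abs_log2fc
    ]

# Hypothetical results from a perturbation experiment.
stats = [
    GeneStat("GENE_A", 2.5, 0.001),  # confident and large change
    GeneStat("GENE_B", 0.8, 0.01),   # confident, but the change is modest
    GeneStat("GENE_C", 1.6, 0.20),   # large change, but not confident
    GeneStat("GENE_D", 0.0, 0.95),   # no visible change
]

differential = call_differential(stats)
not_called = [g.name for g in stats if g.name not in differential]
print(differential, not_called)
```

Here GENE_B and GENE_C are not called differential, yet both show some change. Treating every non-called gene as an unregulated background gene would mix genes like these in with truly unchanged ones such as GENE_D.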
Got it :).
Hi, Allen
I noticed that the default value of num_background_genes is 3000. Why should I set num_background_genes instead of using the whole gene set as background? After all, using the whole gene set will not slow things down or use too much memory. However, randomly selecting background genes makes the final result inconsistent from run to run, although the results will not differ much.
Best wishes
Guandong Shang