Hi @Annika18 ,
Sorry for the delay in response. Yes, the weights for the target genes come from the importance scores from the GRN step. I think that parts of this comment will answer your questions, but feel free to ask if something isn't clear.
The scores are probably best explained in 10.1038/s41596-020-0336-2 (Box 2 within):
Given the pre-calculated whole-genome rankings for a comprehensive motif collection, motif discovery for a given set of genes as input (typically referred to as a gene signature) involves scanning the database for rankings in which the top-ranked fraction is enriched for this input set of genes. More specifically, the cumulative recovery of the foreground set in a whole-genome ranking is quantified using an AUC statistic. The AUC values are standardized (i.e., by mean subtraction and scaling by the standard deviation) and expressed as NESs. Motifs associated with an NES >3.0 are considered as enriched for the supplied signature. This corresponds to a FDR of 3–9% (ref. 13).
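To make the quoted description more concrete, here is a minimal sketch of the two steps it names: a recovery AUC per motif ranking, then standardization across motifs into NES values. This is an illustration only, not the pySCENIC/ctxcore implementation. The function names, the input format (a list of gene names ordered best-ranked first), and the way the AUC is normalized are all assumptions made for this example.

```python
import numpy as np


def recovery_auc(ranking, gene_set, rank_cutoff):
    """Area under the cumulative recovery curve for one motif ranking.

    ranking: gene names ordered best-ranked first (assumed input format).
    gene_set: the input signature (foreground genes).
    rank_cutoff: only the top `rank_cutoff` positions are considered.
    The result is scaled to [0, 1] (an assumption made for readability).
    """
    top = np.asarray(ranking)[:rank_cutoff]
    hits = np.isin(top, list(gene_set))      # True where a signature gene is found
    recovery = np.cumsum(hits)               # cumulative number of recovered genes
    max_auc = rank_cutoff * min(len(gene_set), rank_cutoff)
    return recovery.sum() / max_auc


def normalized_enrichment_scores(aucs):
    """Standardize AUCs across motifs: subtract the mean, divide by the std."""
    aucs = np.asarray(aucs, dtype=float)
    return (aucs - aucs.mean()) / aucs.std()


# Toy example: one signature, two hypothetical motif rankings.
signature = {"a", "b"}
auc_good = recovery_auc(["a", "b", "c", "d"], signature, rank_cutoff=4)
auc_poor = recovery_auc(["d", "c", "b", "a"], signature, rank_cutoff=4)

# Simulated AUCs for 99 background motifs plus the well-ranked motif above.
rng = np.random.default_rng(0)
all_aucs = np.append(rng.normal(0.4, 0.05, size=99), auc_good)
nes = normalized_enrichment_scores(all_aucs)
enriched = nes > 3.0                         # threshold quoted from the paper
```

In this sketch, a motif whose ranking places the signature genes near the top gets a high AUC, and its NES expresses how many standard deviations that AUC lies above the mean over all motifs in the collection.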
I noticed that in the list of regulons produced by df2regulons, you can see a list of genes for each regulon (the target genes), accompanied by a "weight." What do these weights represent? Are they a measure of the importance of the target gene within the GRN, or do they show how strongly a target gene is affected by the given transcription factor? Or something else? I couldn't find documentation of the weight.
Additionally, each regulon has a "score" associated with it. What does this mean?