Help with Analysis for Multiple samples, Counts normalization?

Okay, I've found a previous reply to my Q2 by @bramvds here: https://github.com/aertslab/pySCENIC/issues/128, so never mind that question. Quoting below:

Because SCENIC's first step, i.e. network inference using GENIE3/GRNBoost2, relies on tree-based methods, there should be no need to transform the gene expression matrix. GENIE3 is based on a "regression per target gene" strategy, using a Random Forest (RF) algorithm under the hood to capture non-linear relationships between factor and target. Features do not need to be scaled or transformed for an RF technique to work properly. See also: https://stats.stackexchange.com/questions/58697/when-to-log-exp-your-variables-when-using-random-forest-models. In fact, the GENIE3 tutorial (https://bioconductor.org/packages/release/bioc/vignettes/GENIE3/inst/doc/GENIE3.html) also mentions: "Note that the expression data do not need to be normalised in any way".
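To make the "no need to transform" argument concrete, here is a minimal pure-Python sketch (my own illustration, not GENIE3 or GRNBoost2 code). It picks a single variance-reduction split, the building block of a regression tree, on the raw counts of a hypothetical TF and again on log1p-transformed counts. Because log1p is strictly increasing, the cells sort in the same order, so the same cells land on each side of the chosen split. The data and the helper names `best_split` and `sum_sq_err` are made up for illustration. Note the sketch only covers the predictor (TF) side under one global monotonic transform; the target values are left unchanged, and per-cell normalisation such as dividing by library size applies a different factor to each cell and can reorder cells, so it says nothing about that case.

```python
import math


def sum_sq_err(values):
    """Sum of squared deviations from the mean (regression-tree impurity)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return the cells sent to the left child by the best single split on x.

    Illustrative one-level regression tree (a "stump"), not GENIE3/GRNBoost2
    code: try every threshold between consecutive distinct values of x and
    keep the one minimising the summed squared error of y in both children.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])  # stable sort by feature
    best_sse, best_left = float("inf"), None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # cannot split between equal feature values
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        sse = sum_sq_err(left) + sum_sq_err(right)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(order[:k])
    return best_left


# Hypothetical data: raw counts of one TF across 8 cells, and the
# expression of one target gene in the same cells.
raw_counts = [0, 3, 1, 10, 25, 7, 0, 40]
target = [0.1, 0.5, 0.2, 2.0, 3.1, 1.4, 0.0, 4.2]
log_counts = [math.log1p(c) for c in raw_counts]

# log1p is strictly increasing, so the cells sort identically and the
# chosen split puts the same cells on the left.
print(best_split(raw_counts, target) == best_split(log_counts, target))  # True
```

Any other strictly increasing global transform (multiplying every value by the same positive constant, taking a square root) gives the same split for the same reason.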

However, due to the stochastic nature of the GENIE3/GRNBoost algorithms, you will get different results when running pySCENIC several times on the same data set. A strategy to deal with this is to run pySCENIC multiple times and tally the recurrent regulons.
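A minimal sketch of the tallying step, assuming you have already pulled the set of regulon names out of each run (how you do that depends on your pySCENIC output). The function `recurrent_regulons`, the regulon names and the 80% cut-off are hypothetical choices for illustration, not part of pySCENIC.

```python
from collections import Counter


def recurrent_regulons(runs, min_fraction=0.8):
    """Keep regulons found in at least `min_fraction` of the runs.

    Hypothetical helper, not a pySCENIC function. `runs` holds one
    collection of regulon names per pySCENIC run. Returns
    {regulon_name: number_of_runs_it_appeared_in}.
    """
    if not runs:
        raise ValueError("need at least one run")
    counts = Counter()
    for regulons in runs:
        counts.update(set(regulons))  # count each regulon at most once per run
    return {name: n for name, n in counts.items() if n / len(runs) >= min_fraction}


# Hypothetical regulon names from five independent runs on the same data.
runs = [
    {"STAT1(+)", "IRF1(+)", "SPI1(+)"},
    {"STAT1(+)", "IRF1(+)"},
    {"STAT1(+)", "IRF1(+)", "MYC(+)"},
    {"STAT1(+)", "SPI1(+)"},
    {"STAT1(+)", "IRF1(+)", "SPI1(+)"},
]
print(sorted(recurrent_regulons(runs).items()))
# [('IRF1(+)', 4), ('STAT1(+)', 5)]
```

Converting each run to a set means a regulon listed twice in one run still counts once; where to put the cut-off is a judgement call.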

(aertslab/SCENICprotocol, issue #58)