ShaowenJ opened 4 years ago
I have another question that might be interesting. After calculating the RSS, we get a ranked list of regulons for each cluster. Would it be worthwhile to compare these lists to find statistically differential regulons (e.g. by assigning a p-value)? I guess this would be possible with a nonparametric method such as the Mann-Whitney U test. Has anyone done this before?
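One way to make the Mann-Whitney idea concrete is to apply it per regulon to the AUCell scores of the cells in two clusters, rather than to the RSS rank lists themselves. The sketch below is plain Python with a normal approximation and no tie or continuity correction, so it only shows the mechanics; in practice `scipy.stats.mannwhitneyu` is the usual choice. The function name and the example scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test using the normal approximation.

    x, y: AUCell scores of ONE regulon in the cells of cluster A and cluster B.
    Returns (U statistic for x, approximate two-sided p-value).
    No tie or continuity correction is applied, so p-values are approximate.
    """
    n1, n2 = len(x), len(y)
    values = list(x) + list(y)
    # Sort the pooled values, remembering which position each came from.
    order = sorted(range(n1 + n2), key=lambda i: values[i])
    ranks: List[float] = [0.0] * (n1 + n2)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical AUCell scores of one regulon in two clusters.
u, p = mann_whitney_u([0.30, 0.28, 0.35, 0.32, 0.31], [0.05, 0.07, 0.04, 0.06, 0.08])
print(u, round(p, 4))
```

When this is repeated for every regulon, the p-values need a multiple-testing correction (e.g. Benjamini-Hochberg), and with thousands of cells per cluster even tiny differences become significant, so the effect size matters as much as the p-value.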
Hi
Note that, because the GENIE3/GRNBoost algorithms are stochastic, you will get different results when running pySCENIC several times on the same data set. One strategy to deal with this is to run pySCENIC multiple times and tally the recurrent regulons.
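Tallying recurrent regulons across runs can be as simple as counting, for each regulon name, how many runs produced it. The sketch below assumes each run has already been reduced to a set of regulon names (for example, the column names of that run's AUCell matrix); the function name, the example names and the 80% default cutoff are hypothetical, not recommended values.

```python
from collections import Counter
from typing import Iterable, List, Set


def recurrent_regulons(runs: Iterable[Set[str]], min_fraction: float = 0.8) -> List[str]:
    """Regulons found in at least `min_fraction` of independent pySCENIC runs.

    runs: one set of regulon names per run (hypothetical input format).
    min_fraction: hypothetical default; choose it for your own data.
    """
    runs = list(runs)
    # Count each regulon at most once per run.
    counts = Counter(name for run in runs for name in set(run))
    min_runs = min_fraction * len(runs)
    kept = [name for name, c in counts.items() if c >= min_runs]
    # Most recurrent first; ties ordered alphabetically for reproducible output.
    return sorted(kept, key=lambda name: (-counts[name], name))


runs = [{"SOX2(+)", "PAX6(+)", "ATF3(+)"}, {"SOX2(+)", "PAX6(+)"}, {"SOX2(+)", "JUN(+)"}]
print(recurrent_regulons(runs, min_fraction=0.6))  # ['SOX2(+)', 'PAX6(+)']
```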
Some remarks on how to interpret AUCell scores: the values produced by SCENIC's last step, AUCell, are enrichment scores and should be interpreted with some restrictions in mind: (1) These unnormalized scores can only be used to assess the relative importance of one regulon between different cells (or clusters of cells). Do not compare the unnormalized scores of different regulons across cells or clusters of cells. (2) The actual magnitude of the raw scores depends on several factors (including technical ones such as auc_threshold). To derive biological insights, you can look at the distribution of the AUCell values of a regulon across all cells (e.g. a bimodal distribution indicates that the experiment contains two types of cells, one in which the regulon is on and one in which it is off), or compare the average AUCell scores of that regulon between two clusters of cells (a permutation test can be used to get a p-value for this comparison).
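A minimal sketch of such a permutation test follows. It compares the mean AUCell score of a single regulon between two clusters by repeatedly shuffling the cluster labels, in line with remark (1) that only scores of the same regulon should be compared. It is plain Python; the function name, number of permutations, seed and example scores are illustrative assumptions.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_perm: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test for the difference in mean AUCell score of
    ONE regulon between cluster A (scores `a`) and cluster B (scores `b`).

    Returns (observed mean(a) - mean(b), permutation p-value).
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign cells to the two clusters
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


diff, p = permutation_test([0.30, 0.28, 0.35, 0.32, 0.31], [0.05, 0.07, 0.04, 0.06, 0.08], n_perm=2000)
print(round(diff, 3), p)
```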
If I interpret your RSS plots correctly, you should look at the regulons at the right end of each cluster's distribution plot individually and check whether they are specific to that cluster.
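To make "the right end of the distribution" concrete, the sketch below computes an RSS-style score and lists the highest-scoring regulons for one cluster. It assumes RSS is 1 minus the Jensen-Shannon distance (natural log) between a regulon's AUCell scores, normalized to sum to 1, and the cluster's membership vector, also normalized; that is our understanding of what `pyscenic.rss.regulon_specificity_scores` computes, but check its source before relying on it. This is plain Python, not pySCENIC, and all names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the Jensen-Shannon divergence (natural log) of two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float]) -> float:
        # Kullback-Leibler divergence KL(a || m); terms with a_i = 0 contribute 0.
        return sum(ai * math.log(ai / mi) for ai, mi in zip(a, m) if ai > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p) + 0.5 * kl(q)))


def rss(aucs: Sequence[float], in_cluster: Sequence[bool]) -> float:
    """RSS-style specificity of one regulon for one cluster (1 = perfectly specific)."""
    total = sum(aucs)
    p = [a / total for a in aucs]  # regulon activity as a distribution over cells
    n_in = sum(in_cluster)
    q = [1 / n_in if flag else 0.0 for flag in in_cluster]  # uniform over the cluster's cells
    return 1.0 - js_distance(p, q)


def top_regulons(auc: Dict[str, Sequence[float]], labels: Sequence[str],
                 cluster: str, k: int = 5) -> List[str]:
    """The k regulons with the highest RSS for `cluster`."""
    mask = [label == cluster for label in labels]
    scores = {name: rss(values, mask) for name, values in auc.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical AUCell scores for 4 cells (2 in cluster "A", 2 in cluster "B").
auc = {
    "R1(+)": [0.5, 0.5, 0.0, 0.0],      # active only in A
    "R2(+)": [0.25, 0.25, 0.25, 0.25],  # active everywhere
    "R3(+)": [0.0, 0.0, 0.5, 0.5],      # active only in B
}
labels = ["A", "A", "B", "B"]
print(top_regulons(auc, labels, "A", k=2))  # ['R1(+)', 'R2(+)']
```

A high RSS only says the regulon's activity is concentrated in that cluster; the per-regulon tests for differential activity above are a separate question.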
Hope this helps, Bram
Hi pySCENIC developers,
I gave pySCENIC a quick try, and the results were quite good and interesting, and consistent with our Seurat findings. However, I am not sure how to interpret them, so I have several questions that I hope you can answer.
What is the recommended input count matrix for the pipeline: the raw count matrix, or a normalized one (e.g. log2-transformed)? I tried both but got different results. Do you have any suggestions? The SCENIC paper says it prefers gene-summarized counts, but what does that mean? Could you give an example?
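As an illustration only: one common reading of "gene-summarized counts" is a matrix with one row per gene, in which the counts of all transcripts (or isoforms) of that gene have been added up, as UMI count matrices usually already are. The sketch below, with hypothetical transcript and gene names, builds such a matrix and also shows that a per-cell log-CPM transformation does not change the order of genes within a cell. Because AUCell scores regulons from per-cell gene rankings, differences between raw and log-normalized input presumably arise earlier, in the network-inference step.

```python
import math
from typing import Dict, List

Matrix = Dict[str, List[float]]  # gene (or transcript) -> counts per cell


def gene_summarized(transcript_counts: Matrix, tx2gene: Dict[str, str]) -> Matrix:
    """Sum the per-cell counts of all transcripts that belong to the same gene."""
    genes: Matrix = {}
    for tx, counts in transcript_counts.items():
        gene = tx2gene[tx]
        previous = genes.get(gene, [0] * len(counts))
        genes[gene] = [g + c for g, c in zip(previous, counts)]
    return genes


def log1p_cpm(counts: Matrix) -> Matrix:
    """Per-cell library-size normalization (counts per million) followed by log1p."""
    n_cells = len(next(iter(counts.values())))
    lib_sizes = [sum(v[j] for v in counts.values()) for j in range(n_cells)]
    return {g: [math.log1p(1e6 * v[j] / lib_sizes[j]) for j in range(n_cells)]
            for g, v in counts.items()}


def ranking_in_cell(counts: Matrix, j: int) -> List[str]:
    """Genes of cell j from highest to lowest value (ties broken by gene name)."""
    return sorted(counts, key=lambda g: (-counts[g][j], g))


# Hypothetical transcript-level counts for 2 cells and a transcript-to-gene map.
tx = {"Sox2-201": [3, 0], "Sox2-202": [2, 1], "Pax6-201": [1, 4], "Actb-201": [10, 10]}
tx2gene = {"Sox2-201": "Sox2", "Sox2-202": "Sox2", "Pax6-201": "Pax6", "Actb-201": "Actb"}
genes = gene_summarized(tx, tx2gene)  # {'Sox2': [5, 1], 'Pax6': [1, 4], 'Actb': [10, 10]}
normalized = log1p_cpm(genes)
for cell in range(2):
    # A per-cell monotone transform leaves the within-cell gene ranking unchanged.
    assert ranking_in_cell(genes, cell) == ranking_in_cell(normalized, cell)
```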
What AUC threshold should I use to select strong, significant regulons? Here are the distribution plots for my clusters. As you can see, the values seem very low: the highest are around 0.3. As far as I know, that is probably not a very good value.
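To see why the magnitude of raw AUC values depends on auc_threshold, here is a simplified sketch of an AUCell-style score for one cell. Genes are ranked by expression, only the top `auc_threshold` fraction of the ranking is considered, and the area under the regulon's recovery curve is divided by the product of the cutoff and the regulon size. That normalization is an assumption for illustration; pySCENIC's actual implementation may differ in details. The function name and inputs are hypothetical.

```python
from typing import Sequence, Set


def recovery_auc(ranking: Sequence[str], regulon: Set[str], auc_threshold: float = 0.05) -> float:
    """AUC-style regulon score for ONE cell.

    ranking: genes of that cell sorted from highest to lowest expression.
    Only the top `auc_threshold` fraction of the ranking is considered; the area
    under the recovery curve is divided by cutoff * regulon size (assumed normalization).
    """
    cutoff = max(1, int(len(ranking) * auc_threshold))
    n_genes = len(regulon)
    if n_genes == 0:
        return 0.0
    hits = 0
    area = 0
    for gene in ranking[:cutoff]:
        if gene in regulon:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    return area / (cutoff * n_genes)


ranking = [f"g{i}" for i in range(100)]  # hypothetical cell: g0 is the most expressed gene
compact = {"g0", "g1", "g2", "g3"}       # all regulon genes at the top
spread = {"g0", "g10", "g50", "g90"}     # only one regulon gene near the top
print(recovery_auc(ranking, compact, 0.05), recovery_auc(ranking, compact, 0.2))  # 0.7 0.925
print(recovery_auc(ranking, spread, 0.05))  # 0.25
```

In this toy example the same regulon in the same cell scores 0.7 or 0.925 depending only on the threshold, and a regulon with one of its four genes at the very top scores 0.25, so a raw value around 0.3 is not by itself evidence of a weak regulon.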
Thanks very much for your patience and time.