Closed ivanawinkler closed 3 years ago
Hi @ivanawinkler ,
There are a few things that could be different here. The first thing is maybe your filter settings on the expression matrix -- it's not clear whether you're using the same input for both methods. The other major one is the GRN method: GENIE3 (R) vs GRNBoost2 (python). Here, I see you set the same seed, but these are different algorithms entirely so I wouldn't expect this to give identical results (although generally they will be very similar). Lastly, you have the parameters for the pruning step (prune2df
or runSCENIC_2_createRegulons
)... did you use the same databases, etc?
That said, it does seem very strange that you only find 4 regulons with pySCENIC. It's normal to see a few of these "Less than 80%" warnings, but if it's really excluding most of the TFs here, it could indicate some problem with your data (proper gene names used?).
Thank you very much for you answer.
I use the same input for both implementations (same expression matrix and same databases) and that is what is puzzling me. Since I am using the same expression matrix and databases, gene names should not be an issue (otherwise I guess R would have same issues). In my expression matrix, I use mouse external gene name and databases I downloaded following SCENIC instructions: dbFiles <- c("https://resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-500bp-upstream-7species.mc9nr.feather", "https://resources.aertslab.org/cistarget/databases/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/mm9-tss-centered-10kb-7species.mc9nr.feather")
Hello,
I think I have figured out the issue. As I wrote earlier I used the same count matrix for both analysis, but in rSCENIC I use following filtering steps:
genesKept <- geneFiltering(exprs_matrix, scenicOptions=scenicOptions,
minCountsPerGene=3*.01*ncol(exprs_matrix),
minSamples=ncol(exprs_matrix)*.01)
exprMat_filtered <- exprs_matrix[genesKept, ]
runCorrelation(exprMat_filtered, scenicOptions)
exprMat_filtered <- log2(exprMat_filtered+1)
as recommened in tutorial, while in pySCENIC I just passed unfiltered matrix (same one as the one I start in R).
If I use the aforementioned R code to generate the matrix and the feed it to pySCENIC I get comparable results.
Best,
Ivana
Hello,
thank you for the great tool.
I firstly implemented R version of SCENIC for one of our smaller datasets and we got some very interesting results. So I wanted to analyze our big datasets using pySCENIC since it is much faster in comparison to R version. However, when I tested my python script with my dataset I got a big difference in results between R and python version. R SCENIC in the end determined the activity scores for 381 transcription factors while pySCENIC only determined scores for 4 transcription factors. Looking at the log files, the majority of transcription factors was excluded during the pruning step (with warning message "Less than 80% of the genes in some_gene could be mapped to *. feather".)
I have tried using Docker image of pySCENIC just to exclude that it's an issue with dask version. I got the same result.
I am wondering whether there are some differences in the default settings between R and python version which could explain the difference in the result?
For scRNA-seq standards I have a small dataset of 163 cells of oocytes and granulosa cells.
My R code is:
My python code:
Thank you for your help.
Best regards,
Ivana