aertslab / SCENIC

SCENIC is an R package to infer Gene Regulatory Networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

pySCENIC cannot reproduce AUCell + tSNE projection from a tutorial #388

Open ninhleba opened 1 year ago

ninhleba commented 1 year ago

Discussed in https://github.com/aertslab/SCENIC/discussions/386

Originally posted by **ninhleba** March 20, 2023

Hi, I've been testing pySCENIC by following this tutorial: http://htmlpreview.github.io/?https://github.com/aertslab/SCENICprotocol/blob/master/notebooks/SCENIC%20Protocol%20-%20Case%20study%20-%20Cancer%20data%20sets.html. The dataset I tried was the first one (Accession ID: GSE115978, Cancer type: SKCM). My results were consistent with the tutorial until the AUCell + tSNE projection, which did not separate cells by cell type at all, in contrast with the corresponding plot in the tutorial.

![tsne - GSE115978 - AUCell+tSNE](https://user-images.githubusercontent.com/86318293/226442886-ba26333e-542a-4d4f-a50a-ecc171422125.svg)

The previous steps (grn, ctx and aucell) ran smoothly, and the clustermap based on the binarized AUCell matrix I obtained also suggests that a tSNE computed from AUCell scores should be able to separate at least half of the cell types.

![clustermap - GSE115978](https://user-images.githubusercontent.com/86318293/226441981-c97565e8-884c-4872-8f3d-2bad19218937.png)
![legend - GSE115978 - cell_type_colors](https://user-images.githubusercontent.com/86318293/226442106-ede41316-eed1-4b9d-820f-fe374a6c74f4.svg)
![legend - GSE115978 - on_off](https://user-images.githubusercontent.com/86318293/226442111-4ac96225-81a3-492c-9b91-1902dc657a17.svg)

I understand that the nature of the algorithm makes the outputs of different runs vary slightly from one another, but I don't think they should differ by this much. I also don't think the error stems from tSNE itself, because the PCA + tSNE projection looks the way it is supposed to.

![tsne - GSE115978 - PCA+tSNE](https://user-images.githubusercontent.com/86318293/226442677-9ae58a58-8ade-44e2-9ef2-4d2b866478a9.svg)

I would really appreciate it if someone could share their thoughts on this.
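One way to put a number on how much two runs differ is to compare the regulons they produce. The following is only a rough sketch in plain Python, not part of pySCENIC: the function names `jaccard` and `compare_regulons` are hypothetical, and the regulon and gene names are toy values invented for illustration. It assumes each run has already been converted into a dictionary mapping a regulon name to its target genes.

```python
def jaccard(a, b):
    """Jaccard index of two collections (1.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_regulons(run_a, run_b):
    """Compare two runs, each given as {regulon name: iterable of target genes}.

    Returns the overlap of the regulon names themselves, plus the overlap of
    the target genes for every regulon that appears in both runs.
    """
    shared = sorted(set(run_a) & set(run_b))
    return {
        "regulon_overlap": jaccard(run_a, run_b),
        "target_overlap": {
            name: jaccard(run_a[name], run_b[name]) for name in shared
        },
    }


# Toy input (made up for illustration, not taken from GSE115978).
run_1 = {"MITF(+)": ["PMEL", "TYR", "MLANA"], "SOX10(+)": ["PLP1", "S100B"]}
run_2 = {"MITF(+)": ["PMEL", "TYR", "DCT"], "IRF4(+)": ["CD79A"]}

report = compare_regulons(run_1, run_2)
print(report)
```

Low overlap between a local run and a reference run would point at the upstream steps rather than at the projection; high overlap would suggest looking at how the projection or its colouring is produced.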
Here are the databases that I used:

- Human TFs: https://github.com/aertslab/pySCENIC/blob/master/resources/lambert2018.txt
- Ranking databases: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/
  + hg19-500bp-upstream-10species.mc9nr.genes_vs_motifs.rankings.feather
  + hg19-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather
  + hg19-tss-centered-10kb-10species.mc9nr.genes_vs_motifs.rankings.feather
- Motifs annotation: https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl

Here is my session info:

```
-----
anndata 0.8.0
cytoolz 0.12.1
matplotlib 3.7.1
numpy 1.23.5
pandas 1.5.3
pyscenic 0.12.1
scanpy 1.9.3
seaborn 0.12.2
session_info 1.0.0
-----
PIL 9.4.0
appnope 0.1.2
asttokens NA
attr 22.1.0
backcall 0.2.0
boltons NA
cffi 1.15.1
cloudpickle 2.2.1
comm 0.1.2
ctxcore 0.2.0
cycler 0.10.0
cython_runtime NA
dask 2023.3.0
dateutil 2.8.2
debugpy 1.5.1
decorator 5.1.1
defusedxml 0.7.1
entrypoints 0.4
executing 0.8.3
frozendict 2.3.5
fsspec 2023.3.0
h5py 3.8.0
ipykernel 6.19.2
ipython_genutils 0.2.0
jedi 0.18.1
jinja2 3.1.2
joblib 1.2.0
jupyter_server 1.23.4
kiwisolver 1.4.4
llvmlite 0.39.1
loompy 3.0.7
lxml 4.9.1
markupsafe 2.1.1
matplotlib_inline 0.1.6
mpl_toolkits NA
natsort 8.3.1
networkx 3.0
numba 0.56.4
numexpr 2.8.4
numpy_groupies 0.9.20
openpyxl 3.1.1
packaging 22.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 2.5.2
prompt_toolkit 3.0.36
psutil 5.9.0
ptyprocess 0.7.0
pure_eval 0.2.2
pyarrow 11.0.0
pycparser 2.21
pydev_ipython NA
pydevconsole NA
pydevd 2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.11.2
pyparsing 3.0.9
pytz 2022.7
scipy 1.10.1
setuptools 65.6.3
six 1.16.0
sklearn 1.2.2
stack_data 0.2.0
statsmodels 0.13.5
tblib 1.7.0
threadpoolctl 3.1.0
tlz 0.12.1
toolz 0.12.0
tornado 6.2
tqdm 4.65.0
traitlets 5.7.1
typing_extensions NA
wcwidth 0.2.5
yaml 6.0
zmq 23.2.0
zoneinfo NA
-----
IPython 8.10.0
jupyter_client 7.4.9
jupyter_core 5.2.0
jupyterlab 3.5.3
notebook 6.5.2
-----
Python 3.10.9 (main, Mar 8 2023, 04:44:36) [Clang 14.0.6 ]
macOS-10.16-x86_64-i386-64bit
-----
Session information updated at 2023-03-20 14:01
```
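Since the clustermap separates cell types while the AUCell + tSNE plot does not, one possibility worth ruling out (it is not confirmed as the cause here) is that the cell-type labels used to colour the plot are in a different order than the rows of the AUCell matrix. The sketch below is a plain-Python check under that assumption. The helper names `check_alignment` and `labels_in_matrix_order` are hypothetical, and the cell IDs are toy values; in practice the inputs might be something like the row index of the AUCell matrix and the index of the annotation table.

```python
def check_alignment(matrix_ids, annotation_ids):
    """Report whether two sequences of cell IDs match in content and order."""
    matrix_ids = list(matrix_ids)
    annotation_ids = list(annotation_ids)
    matrix_set, annotation_set = set(matrix_ids), set(annotation_ids)
    return {
        "same_order": matrix_ids == annotation_ids,
        "missing_from_annotation": sorted(matrix_set - annotation_set),
        "missing_from_matrix": sorted(annotation_set - matrix_set),
    }


def labels_in_matrix_order(matrix_ids, cell_type_by_id):
    """Look up each cell's label by ID so labels follow the matrix row order.

    Cells without an annotation get None instead of a silently shifted label.
    """
    return [cell_type_by_id.get(cell_id) for cell_id in matrix_ids]


# Toy input (made up for illustration).
auc_cells = ["c3", "c1", "c2"]  # row order of the score matrix
cell_types = {"c1": "T cell", "c2": "B cell", "c3": "Malignant"}

result = check_alignment(auc_cells, cell_types)
labels = labels_in_matrix_order(auc_cells, cell_types)
print(result)
print(labels)
```

If `same_order` is false, colouring by position would scramble the labels even when the embedding itself is fine, so looking labels up by cell ID is the safer choice.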
TMBJ-lab commented 1 year ago

Have you solved this problem yet?