Originally posted by **ninhleba** March 20, 2023
Hi,
I've been testing pySCENIC by following this tutorial: http://htmlpreview.github.io/?https://github.com/aertslab/SCENICprotocol/blob/master/notebooks/SCENIC%20Protocol%20-%20Case%20study%20-%20Cancer%20data%20sets.html. The dataset I tried was the first one (Accession ID: GSE115978, Cancer type: SKCM). Everything matched the tutorial until the AUCell + tSNE projection, which did not separate cells by cell type at all, unlike the corresponding plot in the tutorial.
![tsne - GSE115978 - AUCell+tSNE](https://user-images.githubusercontent.com/86318293/226442886-ba26333e-542a-4d4f-a50a-ecc171422125.svg)
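One thing worth ruling out when a projection looks this mixed (a hypothesis, not a confirmed cause): if the cell-type labels are attached to the tSNE coordinates by position rather than by cell ID, any difference in row order between the AUCell matrix and the cell annotation table will scramble the colors. The sketch below assumes the embedding is a pandas DataFrame indexed by cell ID and that the annotations sit in a DataFrame with a cell-type column; `attach_labels` and the column name `cell_type` are hypothetical, not part of pySCENIC.

```python
import pandas as pd


def attach_labels(embedding: pd.DataFrame, cell_meta: pd.DataFrame, column: str) -> pd.Series:
    """Return cell_meta[column] ordered like the rows of embedding.

    Labels are matched by cell ID (the DataFrame index), never by position.
    Hypothetical helper for illustration, not a pySCENIC function.
    """
    # Fail loudly if some embedded cells have no annotation at all.
    missing = embedding.index.difference(cell_meta.index)
    if len(missing) > 0:
        raise KeyError(
            f"{len(missing)} embedded cells have no annotation, e.g. {list(missing[:3])}"
        )
    # Label-based lookup: the result follows the embedding's row order.
    return cell_meta.loc[embedding.index, column]


# Toy example: embedding rows are in a different order than the metadata rows.
embedding = pd.DataFrame({"tSNE1": [0.1, 5.0], "tSNE2": [0.2, 5.1]}, index=["cell_B", "cell_A"])
cell_meta = pd.DataFrame({"cell_type": ["T cell", "Malignant"]}, index=["cell_A", "cell_B"])
labels = attach_labels(embedding, cell_meta, "cell_type")
print(labels.tolist())  # ['Malignant', 'T cell']
```

Using `labels` as the color vector keeps each point's color tied to its own cell, whatever order the AUCell matrix was read back in.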
The previous steps (grn, ctx and aucell) ran smoothly, and the clustermap I obtained from the binarized AUCell matrix suggests that a tSNE computed from the AUCell scores should be able to separate at least half of the cell types.
![clustermap - GSE115978](https://user-images.githubusercontent.com/86318293/226441981-c97565e8-884c-4872-8f3d-2bad19218937.png)
![legend - GSE115978 - cell_type_colors](https://user-images.githubusercontent.com/86318293/226442106-ede41316-eed1-4b9d-820f-fe374a6c74f4.svg)
![legend - GSE115978 - on_off](https://user-images.githubusercontent.com/86318293/226442111-4ac96225-81a3-492c-9b91-1902dc657a17.svg)
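To turn the visual impression from the clustermap into a number, one rough check (not a step from the tutorial) is a nearest-centroid test in binarized regulon space: for each cell type, what fraction of its cells lies closest to that type's mean on/off profile? High values suggest the regulon activity separates that type from the others, independently of tSNE. The sketch assumes the binarized matrix is a cells × regulons pandas DataFrame of 0/1 values and that the labels are a Series indexed by the same cell IDs; `centroid_recovery` is a hypothetical helper. Because the centroids are computed from the same cells being scored, the numbers are optimistic.

```python
import numpy as np
import pandas as pd


def centroid_recovery(bin_mtx: pd.DataFrame, cell_types: pd.Series) -> pd.Series:
    """Per cell type, the fraction of cells whose nearest centroid is their own type.

    bin_mtx: cells x regulons, 0/1 values. cell_types: label per cell ID.
    Hypothetical helper; scores are optimistic (centroids come from the same cells).
    """
    # Match labels to matrix rows by cell ID.
    labels = cell_types.reindex(bin_mtx.index)
    if labels.isna().any():
        raise ValueError("some cells in the matrix have no cell-type label")
    labels = labels.astype(str)  # plain strings: no empty groups from unused categories
    # Mean on/off profile per cell type (rows: cell types, columns: regulons).
    centroids = bin_mtx.groupby(labels).mean()
    x = bin_mtx.to_numpy(dtype=float)
    c = centroids.to_numpy(dtype=float)
    # Squared Euclidean distances via |x|^2 - 2 x.c + |c|^2 (cells x cell types).
    dists = (x**2).sum(axis=1)[:, None] - 2.0 * x @ c.T + (c**2).sum(axis=1)[None, :]
    predicted = centroids.index.to_numpy()[np.argmin(dists, axis=1)]
    hits = pd.Series(predicted == labels.to_numpy(), index=bin_mtx.index)
    return hits.groupby(labels).mean()


# Toy data (hypothetical regulons and cell types, for illustration only).
bin_mtx = pd.DataFrame(
    {"regulon_1": [1, 1, 0, 0], "regulon_2": [0, 0, 1, 1]},
    index=["c1", "c2", "c3", "c4"],
)
cell_types = pd.Series({"c3": "type_B", "c4": "type_B", "c1": "type_A", "c2": "type_A"})
print(centroid_recovery(bin_mtx, cell_types))
```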
I understand that the stochastic nature of the algorithm makes the outputs of different runs vary slightly, but I don't think they should differ this much. I also don't think the problem comes from tSNE itself, because the PCA + tSNE projection looks as expected.
![tsne - GSE115978 - PCA+tSNE](https://user-images.githubusercontent.com/86318293/226442677-9ae58a58-8ade-44e2-9ef2-4d2b866478a9.svg)
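To put a number on how much two runs actually differ, one option is to compare the regulons they produce: which regulons appear in only one run, and how much the target-gene sets of shared regulons overlap (Jaccard index). The sketch assumes each run's regulons have already been turned into a plain dict mapping regulon name to a set of target genes, for example `{r.name: set(r.genes) for r in regulons}` if the regulon objects expose `name` and `genes` as ctxcore's `GeneSignature` does (an assumption worth checking against the installed version). `compare_runs` is a hypothetical helper.

```python
def compare_runs(
    run_a: dict[str, set[str]], run_b: dict[str, set[str]]
) -> tuple[dict[str, float], list[str], list[str]]:
    """Compare regulons from two runs (hypothetical helper).

    Returns (Jaccard index of target genes per shared regulon,
             regulons only in run_a, regulons only in run_b).
    """
    shared = sorted(run_a.keys() & run_b.keys())
    jaccard = {}
    for name in shared:
        a, b = run_a[name], run_b[name]
        union = a | b
        # Two empty target sets are treated as identical.
        jaccard[name] = len(a & b) / len(union) if union else 1.0
    only_a = sorted(run_a.keys() - run_b.keys())
    only_b = sorted(run_b.keys() - run_a.keys())
    return jaccard, only_a, only_b


# Toy regulons (hypothetical names and targets, for illustration only).
run_1 = {"TF1(+)": {"G1", "G2", "G3"}, "TF2(+)": {"G4"}}
run_2 = {"TF1(+)": {"G1", "G2"}, "TF3(+)": {"G5"}}
jaccard, only_1, only_2 = compare_runs(run_1, run_2)
print(jaccard, only_1, only_2)  # {'TF1(+)': 0.6666666666666666} ['TF2(+)'] ['TF3(+)']
```

If the regulons that drive the clustermap overlap strongly between runs, that would suggest the difference arises downstream of grn/ctx (in the AUCell scores, the embedding, or the plotting) rather than from run-to-run variation.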
I would really appreciate it if someone could share their thoughts on this.
Here are the databases that I used:
- Human TFs: https://github.com/aertslab/pySCENIC/blob/master/resources/lambert2018.txt
- Ranking databases: https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/
  + hg19-500bp-upstream-10species.mc9nr.genes_vs_motifs.rankings.feather
  + hg19-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather
  + hg19-tss-centered-10kb-10species.mc9nr.genes_vs_motifs.rankings.feather
- Motifs annotation: https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl
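A quick sanity check on the inputs listed above is to confirm that the gene symbols in the TF list actually occur in the expression matrix, since TFs whose symbols are absent from the matrix cannot act as regulators in the GRN step. The sketch assumes `lambert2018.txt` holds one TF symbol per line and that the expression matrix's gene names are available as a list; `parse_tf_names` and `tf_coverage` are hypothetical helpers.

```python
from collections.abc import Iterable


def parse_tf_names(lines: Iterable[str]) -> set[str]:
    """Read TF symbols, one per line; blank lines and surrounding whitespace are ignored."""
    return {line.strip() for line in lines if line.strip()}


def tf_coverage(tf_names: set[str], matrix_genes: Iterable[str]) -> tuple[list[str], list[str]]:
    """Split TFs into those present in and absent from the expression matrix."""
    genes = set(matrix_genes)
    present = sorted(tf_names & genes)
    absent = sorted(tf_names - genes)
    return present, absent


# With the real file this would be, e.g.:
#   with open("lambert2018.txt") as fh:
#       tfs = parse_tf_names(fh)
tfs = parse_tf_names(["SOX10\n", "MITF\n", "\n", "SPI1 \n"])
present, absent = tf_coverage(tfs, ["MITF", "SOX10", "PMEL"])
print(f"{len(present)} of {len(tfs)} TFs found; absent: {absent}")  # 2 of 3 TFs found; absent: ['SPI1']
```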
Here is my session info:
```
-----
anndata 0.8.0
cytoolz 0.12.1
matplotlib 3.7.1
numpy 1.23.5
pandas 1.5.3
pyscenic 0.12.1
scanpy 1.9.3
seaborn 0.12.2
session_info 1.0.0
-----
PIL 9.4.0
appnope 0.1.2
asttokens NA
attr 22.1.0
backcall 0.2.0
boltons NA
cffi 1.15.1
cloudpickle 2.2.1
comm 0.1.2
ctxcore 0.2.0
cycler 0.10.0
cython_runtime NA
dask 2023.3.0
dateutil 2.8.2
debugpy 1.5.1
decorator 5.1.1
defusedxml 0.7.1
entrypoints 0.4
executing 0.8.3
frozendict 2.3.5
fsspec 2023.3.0
h5py 3.8.0
ipykernel 6.19.2
ipython_genutils 0.2.0
jedi 0.18.1
jinja2 3.1.2
joblib 1.2.0
jupyter_server 1.23.4
kiwisolver 1.4.4
llvmlite 0.39.1
loompy 3.0.7
lxml 4.9.1
markupsafe 2.1.1
matplotlib_inline 0.1.6
mpl_toolkits NA
natsort 8.3.1
networkx 3.0
numba 0.56.4
numexpr 2.8.4
numpy_groupies 0.9.20
openpyxl 3.1.1
packaging 22.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 2.5.2
prompt_toolkit 3.0.36
psutil 5.9.0
ptyprocess 0.7.0
pure_eval 0.2.2
pyarrow 11.0.0
pycparser 2.21
pydev_ipython NA
pydevconsole NA
pydevd 2.6.0
pydevd_concurrency_analyser NA
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.11.2
pyparsing 3.0.9
pytz 2022.7
scipy 1.10.1
setuptools 65.6.3
six 1.16.0
sklearn 1.2.2
stack_data 0.2.0
statsmodels 0.13.5
tblib 1.7.0
threadpoolctl 3.1.0
tlz 0.12.1
toolz 0.12.0
tornado 6.2
tqdm 4.65.0
traitlets 5.7.1
typing_extensions NA
wcwidth 0.2.5
yaml 6.0
zmq 23.2.0
zoneinfo NA
-----
IPython 8.10.0
jupyter_client 7.4.9
jupyter_core 5.2.0
jupyterlab 3.5.3
notebook 6.5.2
-----
Python 3.10.9 (main, Mar 8 2023, 04:44:36) [Clang 14.0.6 ]
macOS-10.16-x86_64-i386-64bit
-----
Session information updated at 2023-03-20 14:01
```
Discussed in https://github.com/aertslab/SCENIC/discussions/386