Dear Julien,
Could you check that the gene symbols in your expression matrix are unique? Just to make sure that this is not causing your issue. Thanks for your help.
Kindest regards, Bram
Dear @bramvds ,
Thanks for your reply. I just checked, all gene symbols in my expression matrix are unique.
Best, Julien
Hi Julien,
Is there something special about JUN expression across samples/cells? It is reported by GENIE3/GRNBoost2 as a TF and/or target gene, but it does not appear in the gene-gene correlation matrix derived from the single-cell expression matrix. This is the cause of the error you get.
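If it helps, a quick check along these lines (just a sketch; it assumes your expression matrix is a cells x genes CSV such as exp.csv) would show whether JUN is present and whether its values look unusual:

import pandas as pd

exp = pd.read_csv("exp.csv", index_col=0)   # cells x genes; transpose if genes are rows
print("JUN" in exp.columns)                 # is the gene present at all?
print(exp["JUN"].describe())                # all-zero, constant, or NaN values would be suspicious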
Kindest regards, Bram
Hi Bram,
Thanks for your reply. I just checked, there is nothing special about JUN expression, at least as far as I can tell. Would it help if I could send you my two input files (expression matrix and "adjacencies.tsv" from the grn step) so that you can see for yourself?
Best, Julien
Hi Bram,
I have tried the same commands with another dataset to verify that the error wasn't due to this specific matrix. In fact, I get exactly the same error, except that this time the problematic TF is not "JUN" but "SELENOP". I notice that "JUN" and "SELENOP" are in each case the first entry of the "target" column in the adjacencies.tsv output from the grn step. Could it be an issue with the structure of the matrix (i.e. the naming or indexing of columns)?
Thanks for your help,
Julien
Hi Julien,
Just to be sure, could you check your adjacencies file (i.e. the output from the GRN step)? The extension of the file needs to match its format (if fields are separated by commas it should be '.csv'; if the separator is a tab, it should be '.tsv'). Moreover, the file should contain a header as the first line:
TF,target,importance
ZNF286B,ZNF286A,210.82799147360737
ZNF286A,ZNF286B,147.96375430051933
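A quick way to verify both (just a sketch; the file name is an example) is to read the file back with the separator implied by its extension and inspect the columns:

import pandas as pd

path = "adjacencies.tsv"                        # or "adjacencies.csv" for a comma-separated file
sep = "\t" if path.endswith(".tsv") else ","    # the CLI infers the separator from the extension
adj = pd.read_csv(path, sep=sep)
print(adj.columns.tolist())                     # expected: ['TF', 'target', 'importance']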
Kindest regards, Bram
Hi Bram,
Thanks for your reply. I have just checked, my file is "adjacencies.tsv" and it is correctly tab-separated:
head adjacencies.tsv
TF	target	importance
MAF	SELENOP	183.99765918421974
MAF	RNASE1	175.62542977180237
Anyway, I just managed to run the whole pipeline from Python in Jupyter, so I guess we can leave this problem unsolved, especially if I'm the only person who has encountered it.
Thank you again for all your time!
Best,
Julien
I think I've figured out what happened here after running into a similar issue recently. If there are genes present in the network output (adjacencies) that are missing from the gene expression matrix, then this KeyError will occur. This could happen, for instance, if some further filtering was done after running GRNBoost2, or if the wrong expression matrix was given in the ctx step.
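A rough way to check for this (just a sketch; the file names are examples and it assumes a cells x genes CSV expression matrix):

import pandas as pd

adj = pd.read_csv("adjacencies.tsv", sep="\t")
exp = pd.read_csv("exp.csv", index_col=0)         # cells x genes
adj_genes = set(adj["TF"]) | set(adj["target"])
missing = sorted(adj_genes - set(exp.columns))
print(len(missing), "adjacency genes missing from the expression matrix:", missing[:10])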
Hi,
I am trying to run pyscenic ctx on the output of arboreto_with_multiprocessing.py, and I am getting an error that looks related to this one (I'm not sure whether it is).
My commands are as follows:
python arboreto_with_multiprocessing.py data/merged_all_analysed.loom resources/tfs_list/lambert2018.txt --output results/adjacencies.csv --num_workers 20
pyscenic ctx -o results/reg.csv --annotations_fname resources/motif_annotation/motifs-v9-nr.hgnc-m0.001-o0.0.tbl --num_workers 24 --expression_mtx_fname data/merged_all_analysed.loom --cell_id_attribute CellID --gene_attribute Gene results/adjacencies_arboreto.csv resources/cistarget/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather resources/cistarget/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather
I am using the same input expression matrix for both commands.
The error I get is as follows:
Traceback (most recent call last):
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 421, in main
    args.func(args)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 140, in prune_targets_command
    modules = adjacencies2modules(args)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 109, in adjacencies2modules
    keep_only_activating=(args.all_modules != "yes"))
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 268, in modules_from_adjacencies
    rho_threshold=rho_threshold, mask_dropouts=rho_mask_dropouts)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py", line 132, in add_correlation
    genes = list(set(adjacencies[COLUMN_NAME_TF]).union(set(adjacencies[COLUMN_NAME_TARGET])))
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pandas/core/frame.py", line 2995, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'TF'
Do you have any suggestions as to how to fix this?
Best, Lucy
Conda environment:
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_llvm conda-forge
arboreto 0.1.5 pypi_0 pypi
attrs 19.3.0 pypi_0 pypi
bokeh 2.0.1 py37hc8dfbb8_0 conda-forge
boltons 20.1.0 pypi_0 pypi
ca-certificates 2020.4.5.1 hecc5488_0 conda-forge
certifi 2020.4.5.1 py37hc8dfbb8_0 conda-forge
click 7.1.2 pyh9f0ad1d_0 conda-forge
cloudpickle 1.4.1 py_0 conda-forge
cytoolz 0.10.1 py37h516909a_0 conda-forge
dask 1.0.0 py_1 conda-forge
dask-core 1.0.0 py_0 conda-forge
decorator 4.4.2 pypi_0 pypi
dill 0.3.1.1 pypi_0 pypi
distributed 1.28.1 py37_0 conda-forge
freetype 2.10.2 he06d7ca_0 conda-forge
frozendict 1.2 pypi_0 pypi
h5py 2.10.0 pypi_0 pypi
heapdict 1.0.1 py_0 conda-forge
interlap 0.2.6 pypi_0 pypi
jinja2 2.11.2 pyh9f0ad1d_0 conda-forge
joblib 0.15.1 pypi_0 pypi
jpeg 9d h516909a_0 conda-forge
ld_impl_linux-64 2.34 h53a641e_4 conda-forge
libblas 3.8.0 16_openblas conda-forge
libcblas 3.8.0 16_openblas conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc-ng 9.2.0 h24d8f2e_2 conda-forge
libgfortran-ng 7.5.0 hdf63c60_6 conda-forge
liblapack 3.8.0 16_openblas conda-forge
libopenblas 0.3.9 h5ec1e0e_0 conda-forge
libpng 1.6.37 hed695b0_1 conda-forge
libstdcxx-ng 9.2.0 hdf63c60_2 conda-forge
libtiff 4.1.0 hc7e4089_6 conda-forge
libwebp-base 1.1.0 h516909a_3 conda-forge
llvm-openmp 10.0.0 hc9558a2_0 conda-forge
llvmlite 0.32.1 pypi_0 pypi
locket 0.2.0 py_2 conda-forge
loompy 3.0.6 pypi_0 pypi
lz4-c 1.9.2 he1b5a44_1 conda-forge
markupsafe 1.1.1 py37h8f50634_1 conda-forge
msgpack-python 0.6.2 py37hc9558a2_0 conda-forge
multiprocessing-on-dill 3.5.0a4 pypi_0 pypi
ncurses 6.1 hf484d3e_1002 conda-forge
networkx 2.4 pypi_0 pypi
numba 0.49.1 pypi_0 pypi
numpy 1.18.4 py37h8960a57_0 conda-forge
numpy-groupies 0+unknown pypi_0 pypi
olefile 0.46 py_0 conda-forge
openssl 1.1.1g h516909a_0 conda-forge
packaging 20.4 pyh9f0ad1d_0 conda-forge
pandas 0.25.3 py37hb3f55d8_0 conda-forge
partd 1.1.0 py_0 conda-forge
pillow 7.1.2 py37h718be6c_0 conda-forge
pip 20.1.1 py_1 conda-forge
psutil 5.7.0 py37h8f50634_1 conda-forge
pyarrow 0.16.0 pypi_0 pypi
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyscenic 0.10.2 pypi_0 pypi
python 3.7.6 cpython_h8356626_6 conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.7 1_cp37m conda-forge
pytz 2020.1 pyh9f0ad1d_0 conda-forge
pyyaml 5.3.1 py37h8f50634_0 conda-forge
readline 8.0 hf8c457e_0 conda-forge
scikit-learn 0.23.1 pypi_0 pypi
scipy 1.4.1 pypi_0 pypi
setuptools 47.1.1 py37hc8dfbb8_0 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sortedcontainers 2.1.0 py_0 conda-forge
sqlite 3.30.1 hcee41ef_0 conda-forge
tbb 2020.0.133 pypi_0 pypi
tblib 1.6.0 py_0 conda-forge
threadpoolctl 2.1.0 pypi_0 pypi
tk 8.6.10 hed695b0_0 conda-forge
toolz 0.10.0 py_0 conda-forge
tornado 6.0.4 py37h8f50634_1 conda-forge
tqdm 4.46.1 pypi_0 pypi
typing_extensions 3.7.4.2 py_0 conda-forge
umap-learn 0.4.3 pypi_0 pypi
wheel 0.34.2 py_1 conda-forge
xz 5.2.5 h516909a_0 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
zict 2.0.0 py_0 conda-forge
zlib 1.2.11 h516909a_1006 conda-forge
zstd 1.4.4 h6597ccf_3 conda-forge
@lc822 , what is the header of your results/adjacencies.csv file? It should be something like this (although this example is tab-delimited, not comma-separated):
TF target importance
SPI1 TYROBP 58.97375087447331
RPL35 RPS18 58.142358119139345
RPS4X RPL30 57.76453883874825
...
Yes, it looks like that:
TF target importance
ZBTB32 IFNG 569.8034320534202
YBX1 RPS2 357.7026283177716
ZBTB32 SEC61G 316.593626747196
ZBTB32 SEC61B 309.9030666846719
...
It appears to be tab delimited.
Hi @cflerin,
Just an observation, without knowing anything about the code implementation (I'm part of a team working alongside @lc822): could the error be related to the header being used within the pandas hash function?
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'TF'
The reason I ask is that the error is a KeyError for "TF", but if I look at the tab-delimited file above, TF is part of the header and not a gene.
Hi @lc822 , @Acribbs ,
Indeed, it seems like pandas is looking for a gene named "TF", which should be part of the header.
Could you try renaming the file to end with .tsv if it's really tab-separated? If your file is actually tab-delimited but named with a .csv extension, this will cause an issue with the file delimiter detection, which is based on the file extension.
This actually seems to be a bug in the arboreto script: it always uses tab as the separator, even though you requested comma-separated output, so the ctx step is looking for commas. I'll make a fix for this.
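In the meantime, a possible workaround (just a sketch; the paths are examples) is to re-save the adjacencies under a .tsv name so the extension matches the actual tab separator:

import pandas as pd

adj = pd.read_csv("results/adjacencies.csv", sep="\t")        # the file is tab-delimited despite the .csv name
print(adj.columns.tolist())                                   # should be ['TF', 'target', 'importance']
adj.to_csv("results/adjacencies.tsv", sep="\t", index=False)  # then pass the .tsv file to pyscenic ctx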
Hi, I'm trying to run the pyscenic CLI. I have already managed to run the grn step and got an output "adjacencies.tsv", but when I proceed to the ctx step I get a KeyError. My command is the following:
pyscenic ctx --mode dask_multiprocessing --annotations_fname $RESOURCES_FOLDER"motifs-v9-nr.hgnc-m0.001-o0.0.tbl" --num_workers 8 --output $DATA_FOLDER"regulons.csv" --expression_mtx_fname $DATA_FOLDER"exp.csv" $DATA_FOLDER"adjacencies.tsv" $DATABASE_FOLDER"hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather" $DATABASE_FOLDER"hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather"
And the error message I get is this one:
2019-10-25 11:44:51,593 - pyscenic.cli.pyscenic - INFO - Creating modules.
2019-10-25 11:44:53,560 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.
2019-10-25 11:45:02,490 - pyscenic.utils - INFO - Calculating Pearson correlations.
2019-10-25 11:45:02,490 - pyscenic.utils - WARNING - Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI. Dropout masking is currently set to [False].
Traceback (most recent call last):
  File "/home/julien/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'JUN'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/julien/anaconda3/bin/pyscenic", line 10, in <module>
    sys.exit(main())
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 408, in main
args.func(args)
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 133, in prune_targets_command
modules = adjacencies2modules(args)
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 102, in adjacencies2modules
keep_only_activating=(args.all_modules != "yes"))
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/utils.py", line 265, in modules_from_adjacencies
rho_threshold=rho_threshold, mask_dropouts=rho_mask_dropouts)
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/utils.py", line 136, in add_correlation
rhos = np.array([corr_mtx[s2][s1] for s1, s2 in zip(adjacencies.TF, adjacencies.target)])
File "/home/julien/anaconda3/lib/python3.7/site-packages/pyscenic/utils.py", line 136, in
rhos = np.array([corr_mtx[s2][s1] for s1, s2 in zip(adjacencies.TF, adjacencies.target)])
File "/home/julien/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2995, in getitem
indexer = self.columns.get_loc(key)
File "/home/julien/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'JUN'
I wondered if it was an issue with the pandas version (I had 0.23.4), so I upgraded to the latest one (0.25.2), but I still get the same error.
Thank you for your help! Best, Julien