Closed lucygarner closed 4 years ago
Dear Lucy,
I presume the 'regulons.csv' file is the table with enriched motifs? If that is the case all information generated by the Python notebook is available to you. From the enriched motif table, you can generate the regulons using a Python notebook and continue from there.
Yes, you should use the '.loom' extension for your output files in the CLI.
Hope this helps.
Kindest regards, Bram
Thank you - yes "regulons.csv" is the output from pyscenic ctx. When you say "generate the regulons using a Python notebook", how would you do this?
Best wishes, Lucy
Hi Bram,
I tried to use the .loom extension in the command line but I got an "unknown file format" error. How do I set the options correctly and which files should be output in .loom format?
Best wishes, Lucy
Dear Lucy,
loom file format is supported (1) as input to the grn step, (2) to supply an expression matrix for the ctx step when using adjacencies as input; and (3) as input and output of the aucell step. WHen using loom format for aucell the final loom file outputted by this step will contain expression matrix and aucell information.
Kindest regards, Bram
Thanks Bram. And how can I generate the regulons from the enriched motif table using a Python notebook, as you suggested above?
Dear Lucy,
Regulons can easily be created from this list of enriched motifs via pyscenic.transform.df2regulons
.
import pickle
from pyscenic.utils import load_motifs
from pyscenic.transform import df2regulons
MOTIFS_FNAME = "<dummy>"
REGULONS_DAT_FNAME = "<dummy>"
df_motifs = load_motifs(MOTIFS_FNAME)
regulons = df2regulons(df_motifs)
# Pickle these regulons.
with open(REGULONS_DAT_FNAME, 'wb') as f:
pickle.dump(regulons, f)
You can also use the following auxilliary function to carefully select the enriched motifs that contribute to the regulons (for human ranking databases):
import os, glob, re, pickle
from functools import partial
from collections import OrderedDict
import operator as op
from cytoolz import compose
import pandas as pd
import seaborn as sns
import numpy as np
from pyscenic.utils import load_motifs
from pyscenic.transform import df2regulons
def derive_regulons(motifs, db_names=('hg19-tss-centered-10kb-10species',
'hg19-500bp-upstream-10species',
'hg19-tss-centered-5kb-10species')):
motifs.columns = motifs.columns.droplevel(0)
def contains(*elems):
def f(context):
return any(elem in context for elem in elems)
return f
# For the creation of regulons we only keep the 10-species databases and the activating modules. We also remove the
# enriched motifs for the modules that were created using the method 'weight>50.0%' (because these modules are not part
# of the default settings of modules_from_adjacencies anymore.
motifs = motifs[
np.fromiter(map(compose(op.not_, contains('weight>50.0%')), motifs.Context), dtype=np.bool) & \
np.fromiter(map(contains(*db_names), motifs.Context), dtype=np.bool) & \
np.fromiter(map(contains('activating'), motifs.Context), dtype=np.bool)]
# We build regulons only using enriched motifs with a NES of 3.0 or higher; we take only directly annotated TFs or TF annotated
# for an orthologous gene into account; and we only keep regulons with at least 10 genes.
regulons = list(filter(lambda r: len(r) >= 10, df2regulons(motifs[(motifs['NES'] >= 3.0)
& ((motifs['Annotation'] == 'gene is directly annotated')
| (motifs['Annotation'].str.startswith('gene is orthologous to')
& motifs['Annotation'].str.endswith('which is directly annotated for motif')))
])))
# Rename regulons, i.e. remove suffix.
return list(map(lambda r: r.rename(r.transcription_factor), regulons))
For this function to work you need to install the cytoolz
package first (pip install cytoolz
). And this is how to use the function:
MOTIFS_FNAME = "<dummy>"
REGULONS_DAT_FNAME = "<dummy>"
df_motifs = load_motifs(MOTIFS_FNAME)
regulons = derive_regulons(df_motifs)
# Pickle these regulons.
with open(REGULONS_DAT_FNAME, 'wb') as f:
pickle.dump(regulons, f)
Kindest regards, Bram
Hi Bram,
I got the following error when I tried to use loom format for the output to the final step (AUCell).
pyscenic aucell -o output/auc_mtx.loom --num_workers 12 --cell_id_attribute CellID --gene_attribute Gene data/merged_all.loom output/regulons.csv
Traceback (most recent call last): File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/bin/pyscenic", line 10, in
sys.exit(main()) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 408, in main args.func(args) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 200, in aucell_command append_auc_mtx(args.output.name, auc_mtx, signatures, args.num_workers) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/utils.py", line 275, in append_auc_mtx ds.ca[ATTRIBUTE_NAME_REGULONS_AUC] = create_structure_array(auc_mtx) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/loompy/attribute_manager.py", line 129, in setitem return self.setattr(name, val) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/loompy/attribute_manager.py", line 149, in setattr values = loompy.normalize_attr_values(val, compare_loom_spec_version(self.ds._file, "3.0.0") >= 0) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/loompy/utils.py", line 27, in compare_loom_spec_version vf = int("".join(get_loom_spec_version(f).split("."))) AttributeError: 'numpy.ndarray' object has no attribute 'split'
What does this mean?
Best wishes, Lucy
Dear Lucy,
A quick inspection of the stacktrace makes me believe this might have something to do with the latest version of loompy. A quick test would be to install pyscenic in a new anaconda/miniconda environment [https://docs.conda.io/en/latest/miniconda.html]:
conda create -n pyscenic_test python=3.7
conda activate pyscenic_test
pip install loompy==2.0.17
pip install pyscenic
Hope this helps for now.
Thank you, this seems to be working now.
Hi @lc822 were you able to produce plots like here: https://rawcdn.githack.com/aertslab/SCENIC/0a4c96ed8d930edd8868f07428090f9dae264705/inst/doc/SCENIC_Running.html
I was able to import to R and everything. However, without scenicObject, I am unable to perform plots and visualization.
Hi @KabitaBaral1,
Unfortunately I was not able to generate these plots from the CLI results. @bramvds, is there a tutorial that could help us with this, or some commands that would convert the CLI results into the same format as the pySCENIC results?
Best, Lucy
Hello @bramvds,
I am running into the same error as Lucy when running AUCell:
AttributeError: 'numpy.ndarray' object has no attribute 'split’
However I am using the Docker image through singularity so cannot update pyscenic or loompy to fix the error. Any advice?
Thank you!
Hi @acjmmartin , @lc822 ,
I suspect the source of this error is that your loom file was created with loompy v2, and you're now running the AUCell step with loompy v3 (pySCENIC switched to loompy 3). Trying to append v3 data into a v2 loom might cause this. You could try using an older version (pySCENIC 0.9.18
was the last version to use the old loompy 2.0.17
), which should be easily accomplished with Docker/Singularity.
To check which loompy version was used to create your file, look for the LOOM_SPEC_VERSION
attribute in ds.attrs
. It is present in loompy 3 files but not those created with loompy 2:
ds = loompy.connect('test.loom', mode='r', validate=False)
If ds.attrs
includes LOOM_SPEC_VERSION
, this is a loompy v3 file:
ds.attrs.keys()
# ['CreationDate', 'LOOM_SPEC_VERSION', 'MetaData', 'last_modified']
and you can print the exact version with:
df.attrs.LOOM_SPEC_VERSION
# '3.0.0'
(don't forget to close the file with ds.close()
).
@cflerin Yes, using pySCENIC 0.9.18 worked!! Thanks so much!!
Great, glad to hear that fixed it!
Hi @acjmmartin , @lc822 ,
I suspect the source of this error is that your loom file was created with loompy v2, and you're now running the AUCell step with loompy v3 (pySCENIC switched to loompy 3). Trying to append v3 data into a v2 loom might cause this. You could try using an older version (pySCENIC
0.9.18
was the last version to use the old loompy2.0.17
), which should be easily accomplished with Docker/Singularity.
Thanks! For me just switching to loompy '2.0.17' worked. I am still using the newer version of pySCENIC '0.10.3'.
The AttributeError: 'numpy.ndarray' object has no attribute 'split'
error continues to be an issue for us as well.
The loom file looks like a v3 file, so could there be another issue not addressed by https://github.com/aertslab/pySCENIC/issues/105#issuecomment-665873648?
loom = lp.connect( f_loom_path_unfilt, mode='r+', validate=False )
loom.attrs.keys()
# ['CreationDate', 'LOOM_SPEC_VERSION']
loom.attrs.LOOM_SPEC_VERSION
# 3.0.0
Thanks for your help! We ♥ the level of support!
Hi @acjmmartin , @lc822 , I suspect the source of this error is that your loom file was created with loompy v2, and you're now running the AUCell step with loompy v3 (pySCENIC switched to loompy 3). Trying to append v3 data into a v2 loom might cause this. You could try using an older version (pySCENIC
0.9.18
was the last version to use the old loompy2.0.17
), which should be easily accomplished with Docker/Singularity.Thanks! For me just switching to loompy '2.0.17' worked. I am still using the newer version of pySCENIC '0.10.3'.
Yes, for me too. Thanks all for a lively discussion and trouble shooting. Just fyi, I installed pyscenic just now using
conda create -y -n pyscenic python=3.7
conda activate pyscenic
conda install -y numpy
conda install -y -c anaconda cytoolz
pip install pyscenic
according to the latest manual (Jun 02, 2021). I still had the described issue.
Now aucell ran after running pip install loompy==2.0.17
.
Hi,
Unfortunately I can't get pySCENIC to work in Python, so I have been using the command line tool.
I have two questions:
I have three output files from the command line version - "merged_all_expr_mat.adjacencies.tsv", "regulons.csv" and "auc_mtx.csv". Do these contain all of the information that you get from the Python pipeline?
Is it possible to output .loom files from the command line version? I would like to import the pySCENIC results into R as discussed in this tutorial: https://rawcdn.githack.com/aertslab/SCENIC/0a4c96ed8d930edd8868f07428090f9dae264705/inst/doc/importing_pySCENIC.html
This would allow me to produce the plots shown here: https://rawcdn.githack.com/aertslab/SCENIC/0a4c96ed8d930edd8868f07428090f9dae264705/inst/doc/SCENIC_Running.html
Many thanks, Lucy