aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0
420 stars 179 forks source link

Output .loom files from command line version? #105

Closed lucygarner closed 4 years ago

lucygarner commented 4 years ago

Hi,

Unfortunately I can't get pySCENIC to work in Python, so I have been using the command line tool.

I have two questions:

  1. I have three output files from the command line version - "merged_all_expr_mat.adjacencies.tsv", "regulons.csv" and "auc_mtx.csv". Do these contain all of the information that you get from the Python pipeline?

  2. Is it possible to output .loom files from the command line version? I would like to import the pySCENIC results into R as discussed in this tutorial: https://rawcdn.githack.com/aertslab/SCENIC/0a4c96ed8d930edd8868f07428090f9dae264705/inst/doc/importing_pySCENIC.html

This would allow me to produce the plots shown here: https://rawcdn.githack.com/aertslab/SCENIC/0a4c96ed8d930edd8868f07428090f9dae264705/inst/doc/SCENIC_Running.html

Many thanks, Lucy

bramvds commented 4 years ago

Dear Lucy,

  1. I presume the 'regulons.csv' file is the table with enriched motifs? If that is the case all information generated by the Python notebook is available to you. From the enriched motif table, you can generate the regulons using a Python notebook and continue from there.

  2. Yes, you should use the '.loom' extension for your output files in the CLI.

Hope this helps.

Kindest regards, Bram

lucygarner commented 4 years ago

Thank you - yes "regulons.csv" is the output from pyscenic ctx. When you say "generate the regulons using a Python notebook", how would you do this?

Best wishes, Lucy

lucygarner commented 4 years ago

Hi Bram,

I tried to use the .loom extension in the command line but I got an "unknown file format" error. How do I set the options correctly and which files should be output in .loom format?

Best wishes, Lucy

bramvds commented 4 years ago

Dear Lucy,

loom file format is supported (1) as input to the grn step, (2) to supply an expression matrix for the ctx step when using adjacencies as input; and (3) as input and output of the aucell step. WHen using loom format for aucell the final loom file outputted by this step will contain expression matrix and aucell information.

Kindest regards, Bram

lucygarner commented 4 years ago

Thanks Bram. And how can I generate the regulons from the enriched motif table using a Python notebook, as you suggested above?

bramvds commented 4 years ago

Dear Lucy,

Regulons can easily be created from this list of enriched motifs via pyscenic.transform.df2regulons.

import pickle
from pyscenic.utils import load_motifs
from pyscenic.transform import df2regulons

MOTIFS_FNAME = "<dummy>"
REGULONS_DAT_FNAME = "<dummy>"

df_motifs = load_motifs(MOTIFS_FNAME)
regulons = df2regulons(df_motifs)
# Pickle these regulons.
with open(REGULONS_DAT_FNAME, 'wb') as f:
    pickle.dump(regulons, f)

You can also use the following auxilliary function to carefully select the enriched motifs that contribute to the regulons (for human ranking databases):

import os, glob, re, pickle
from functools import partial
from collections import OrderedDict
import operator as op
from cytoolz import compose

import pandas as pd
import seaborn as sns
import numpy as np
from pyscenic.utils import load_motifs
from pyscenic.transform import df2regulons

def derive_regulons(motifs, db_names=('hg19-tss-centered-10kb-10species', 
                                 'hg19-500bp-upstream-10species', 
                                 'hg19-tss-centered-5kb-10species')):
    motifs.columns = motifs.columns.droplevel(0)

    def contains(*elems):
        def f(context):
            return any(elem in context for elem in elems)
        return f

    # For the creation of regulons we only keep the 10-species databases and the activating modules. We also remove the
    # enriched motifs for the modules that were created using the method 'weight>50.0%' (because these modules are not part
    # of the default settings of modules_from_adjacencies anymore.
    motifs = motifs[
        np.fromiter(map(compose(op.not_, contains('weight>50.0%')), motifs.Context), dtype=np.bool) & \
        np.fromiter(map(contains(*db_names), motifs.Context), dtype=np.bool) & \
        np.fromiter(map(contains('activating'), motifs.Context), dtype=np.bool)]

    # We build regulons only using enriched motifs with a NES of 3.0 or higher; we take only directly annotated TFs or TF annotated
    # for an orthologous gene into account; and we only keep regulons with at least 10 genes.
    regulons = list(filter(lambda r: len(r) >= 10, df2regulons(motifs[(motifs['NES'] >= 3.0) 
                                                                      & ((motifs['Annotation'] == 'gene is directly annotated')
                                                                        | (motifs['Annotation'].str.startswith('gene is orthologous to')
                                                                           & motifs['Annotation'].str.endswith('which is directly annotated for motif')))
                                                                     ])))

    # Rename regulons, i.e. remove suffix.
    return list(map(lambda r: r.rename(r.transcription_factor), regulons))

For this function to work you need to install the cytoolz package first (pip install cytoolz). And this is how to use the function:

MOTIFS_FNAME = "<dummy>"
REGULONS_DAT_FNAME = "<dummy>"

df_motifs = load_motifs(MOTIFS_FNAME)
regulons = derive_regulons(df_motifs)
# Pickle these regulons.
with open(REGULONS_DAT_FNAME, 'wb') as f:
    pickle.dump(regulons, f)

Kindest regards, Bram

lucygarner commented 4 years ago

Hi Bram,

I got the following error when I tried to use loom format for the output to the final step (AUCell).

pyscenic aucell -o output/auc_mtx.loom --num_workers 12 --cell_id_attribute CellID --gene_attribute Gene data/merged_all.loom output/regulons.csv

Traceback (most recent call last): File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/bin/pyscenic", line 10, in sys.exit(main()) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 408, in main args.func(args) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 200, in aucell_command append_auc_mtx(args.output.name, auc_mtx, signatures, args.num_workers) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/utils.py", line 275, in append_auc_mtx ds.ca[ATTRIBUTE_NAME_REGULONS_AUC] = create_structure_array(auc_mtx) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/loompy/attribute_manager.py", line 129, in setitem return self.setattr(name, val) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/loompy/attribute_manager.py", line 149, in setattr values = loompy.normalize_attr_values(val, compare_loom_spec_version(self.ds._file, "3.0.0") >= 0) File "/t1-data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.7/site-packages/loompy/utils.py", line 27, in compare_loom_spec_version vf = int("".join(get_loom_spec_version(f).split("."))) AttributeError: 'numpy.ndarray' object has no attribute 'split'

What does this mean?

Best wishes, Lucy

bramvds commented 4 years ago

Dear Lucy,

A quick inspection of the stacktrace makes me believe this might have something to do with the latest version of loompy. A quick test would be to install pyscenic in a new anaconda/miniconda environment [https://docs.conda.io/en/latest/miniconda.html]:

conda create -n pyscenic_test python=3.7
conda activate pyscenic_test
pip install loompy==2.0.17
pip install pyscenic

Hope this helps for now.

lucygarner commented 4 years ago

Thank you, this seems to be working now.

KabitaBaral1 commented 4 years ago

Hi @lc822 were you able to produce plots like here: https://rawcdn.githack.com/aertslab/SCENIC/0a4c96ed8d930edd8868f07428090f9dae264705/inst/doc/SCENIC_Running.html

I was able to import to R and everything. However, without scenicObject, I am unable to perform plots and visualization.

lucygarner commented 4 years ago

Hi @KabitaBaral1,

Unfortunately I was not able to generate these plots from the CLI results. @bramvds, is there a tutorial that could help us with this, or some commands that would convert the CLI results into the same format as the pySCENIC results?

Best, Lucy

acjmmartin commented 4 years ago

Hello @bramvds, I am running into the same error as Lucy when running AUCell: AttributeError: 'numpy.ndarray' object has no attribute 'split’ However I am using the Docker image through singularity so cannot update pyscenic or loompy to fix the error. Any advice?

Thank you!

cflerin commented 4 years ago

Hi @acjmmartin , @lc822 ,

I suspect the source of this error is that your loom file was created with loompy v2, and you're now running the AUCell step with loompy v3 (pySCENIC switched to loompy 3). Trying to append v3 data into a v2 loom might cause this. You could try using an older version (pySCENIC 0.9.18 was the last version to use the old loompy 2.0.17), which should be easily accomplished with Docker/Singularity.

cflerin commented 4 years ago

To check which loompy version was used to create your file, look for the LOOM_SPEC_VERSION attribute in ds.attrs. It is present in loompy 3 files but not those created with loompy 2:

ds = loompy.connect('test.loom', mode='r', validate=False)

If ds.attrs includes LOOM_SPEC_VERSION, this is a loompy v3 file:

ds.attrs.keys()
# ['CreationDate', 'LOOM_SPEC_VERSION', 'MetaData', 'last_modified']

and you can print the exact version with:

df.attrs.LOOM_SPEC_VERSION
# '3.0.0'

(don't forget to close the file with ds.close()).

acjmmartin commented 4 years ago

@cflerin Yes, using pySCENIC 0.9.18 worked!! Thanks so much!!

cflerin commented 4 years ago

Great, glad to hear that fixed it!

Manikgarg commented 4 years ago

Hi @acjmmartin , @lc822 ,

I suspect the source of this error is that your loom file was created with loompy v2, and you're now running the AUCell step with loompy v3 (pySCENIC switched to loompy 3). Trying to append v3 data into a v2 loom might cause this. You could try using an older version (pySCENIC 0.9.18 was the last version to use the old loompy 2.0.17), which should be easily accomplished with Docker/Singularity.

Thanks! For me just switching to loompy '2.0.17' worked. I am still using the newer version of pySCENIC '0.10.3'.

jpcartailler commented 3 years ago

The AttributeError: 'numpy.ndarray' object has no attribute 'split' error continues to be an issue for us as well.

The loom file looks like a v3 file, so could there be another issue not addressed by https://github.com/aertslab/pySCENIC/issues/105#issuecomment-665873648?

loom = lp.connect( f_loom_path_unfilt, mode='r+', validate=False )
loom.attrs.keys()
# ['CreationDate', 'LOOM_SPEC_VERSION']
loom.attrs.LOOM_SPEC_VERSION
# 3.0.0

Thanks for your help! We ♥ the level of support!

Jintram commented 3 years ago

Hi @acjmmartin , @lc822 , I suspect the source of this error is that your loom file was created with loompy v2, and you're now running the AUCell step with loompy v3 (pySCENIC switched to loompy 3). Trying to append v3 data into a v2 loom might cause this. You could try using an older version (pySCENIC 0.9.18 was the last version to use the old loompy 2.0.17), which should be easily accomplished with Docker/Singularity.

Thanks! For me just switching to loompy '2.0.17' worked. I am still using the newer version of pySCENIC '0.10.3'.

Yes, for me too. Thanks all for a lively discussion and trouble shooting. Just fyi, I installed pyscenic just now using

conda create -y -n pyscenic python=3.7
conda activate pyscenic
conda install -y numpy
conda install -y -c anaconda cytoolz
pip install pyscenic

according to the latest manual (Jun 02, 2021). I still had the described issue.

Now aucell ran after running pip install loompy==2.0.17.