This package enables network-based protein activity estimation in Python. It also provides interfaces for scanpy (single-cell RNASeq analysis in Python). Functions are partly ported from the R packages viper and NaRnEA.
The user-friendly documentation is available at: https://alevax.github.io/pyviper/index.html.
pyviper depends on the following packages:
scanpy for the single-cell pipeline
pandas and anndata for data computing and storage
numpy and scipy for scientific computation
joblib for parallel computing
tqdm for progress bars

If you are using a version of scanpy <1.9.3, it is also advisable to downgrade pandas to >=1.3.0 and <2.0, due to a scanpy incompatibility.
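The version constraint above can be checked before installing. The following is a minimal sketch, not part of pyviper: the helper names `parse_version` and `pandas_needs_downgrade` are hypothetical, and the parser only handles plain dotted versions.

```python
import re


def parse_version(text):
    """Turn a dotted version such as '1.9.3' or '2.0.0rc1' into a 3-integer tuple."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '1.9' to (1, 9, 0) so comparisons are uniform.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def pandas_needs_downgrade(scanpy_version, pandas_version):
    """True when scanpy is older than 1.9.3 and pandas is outside >=1.3.0, <2.0."""
    old_scanpy = parse_version(scanpy_version) < (1, 9, 3)
    pandas_ok = (1, 3, 0) <= parse_version(pandas_version) < (2, 0, 0)
    return old_scanpy and not pandas_ok
```

The installed versions can be read with the standard library, for example `importlib.metadata.version("scanpy")` and `importlib.metadata.version("pandas")`, and passed to `pandas_needs_downgrade`.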
Install from PyPI:
pip install viper-in-python

Or install from a local clone:
git clone https://github.com/alevax/pyviper/
cd pyviper
pip install -e .
import pandas as pd
import anndata
import pyviper
# Load sample data
ges = anndata.read_text("test/unit_tests/test_1/test_1_inputs/LNCaPWT_gExpr_GES.tsv").T
# Load network
network = pyviper.load.msigdb_regulon("h")
# Translate sample data from ensembl to gene names
pyviper.pp.translate(ges, desired_format="human_symbol")
# Filter targets in the interactome
network.filter_targets(ges.var_names)
# Compute regulon activities with aREA
activity = pyviper.viper(gex_data=ges, interactome=network, enrichment="area")
print(activity.to_df())
# Compute regulon activities with NaRnEA
activity = pyviper.viper(gex_data=ges, interactome=network, enrichment="narnea", eset_filter=False)
print(activity.to_df())
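The target-filtering step in the example above restricts the network to genes that are present in the expression data. The sketch below illustrates that idea in plain Python. It is not pyviper's implementation: the dictionary layout, the gene names and the helper `filter_regulon_targets` are hypothetical, and dropping regulators that end up with no targets is an assumption made for this illustration.

```python
# Hypothetical toy regulons: each regulator maps to its list of target genes.
regulons = {
    "TF1": ["GENE_A", "GENE_B", "GENE_X"],
    "TF2": ["GENE_B", "GENE_C"],
    "TF3": ["GENE_X"],
}

# Hypothetical set of genes measured in the expression matrix.
measured_genes = {"GENE_A", "GENE_B", "GENE_C"}


def filter_regulon_targets(regulons, measured_genes):
    """Keep only targets that were measured; drop regulators left with none."""
    measured = set(measured_genes)
    filtered = {}
    for regulator, targets in regulons.items():
        kept = [target for target in targets if target in measured]
        if kept:  # assumption: an empty regulon is not useful for enrichment
            filtered[regulator] = kept
    return filtered


filtered = filter_regulon_targets(regulons, measured_genes)
```

Here `TF1` loses `GENE_X`, `TF2` is unchanged, and `TF3` is removed because none of its targets were measured.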
The main functions available from pyviper are:
pyviper.viper: the main function for Virtual Inference of Protein Activity by Enriched Regulon Analysis (VIPER). It supports two enrichment algorithms, aREA and (matrix-)NaRnEA (see below).
pyviper.aREA: computes aREA (analytic rank-based enrichment analysis) and meta-aREA.
pyviper.NaRnEA: computes matrix-NaRnEA, a vectorized implementation of NaRnEA.
pyviper.pp.translate: translates gene identifiers between species (i.e. mouse and human) and between ensembl, entrez and gene symbols.
pyviper.tl.path_enr: computes pathway enrichment.

Other notable functions include:
pyviper.tl.OncoMatch: computes OncoMatch, an algorithm that assesses the conservation of MR protein activity between two sets of samples (e.g. to validate GEMMs as effective models of human samples).
pyviper.pp.stouffer: computes signatures on a cluster-by-cluster basis using a cluster integration method for pathway enrichment.
pyviper.pp.viper_similarity: computes the similarity between VIPER signatures.
pyviper.pp.repr_metacells: computes representative metacells (e.g. for ARACNe) using our method to maximize unique sample usage and minimize resampling (users can specify depth, percent data usage, etc.).
pyviper.pp.repr_subsample: selects a representative subsample of the data using our method to ensure a widely distributed sampling.

Additionally, the following submodules are available:
pyviper.load: submodule containing several utility functions useful for different analyses, including load_msigdb_regulon, load_TFs, etc.
pyviper.pl: submodule containing pyviper wrappers for scanpy plotting.
pyviper.tl: submodule containing pyviper wrappers for scanpy data transformation.
pyviper.config: submodule allowing users to specify the current species and file paths for regulators.

Last, a new Interactome class allows users to load and interrogate ARACNe- and SCENIC-inferred gene regulatory networks.
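The cluster-by-cluster integration performed by pyviper.pp.stouffer, listed above, is named after Stouffer's method for combining z-scores. The sketch below shows the unweighted form of that method with numpy, where n z-scores are summed and divided by the square root of n. It is an illustration under stated assumptions, not pyviper's code: the function name `stouffer_by_cluster` and its arguments are hypothetical, and whether pyviper applies weights or additional steps is not stated here.

```python
import numpy as np


def stouffer_by_cluster(scores, labels):
    """Combine per-sample z-scores into one signature per cluster.

    scores: array-like of shape (n_samples, n_features) holding z-scores.
    labels: array-like of length n_samples with one cluster label per sample.
    Returns a dict mapping each cluster label to a combined z-score vector.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    combined = {}
    for cluster in np.unique(labels):
        block = scores[labels == cluster]
        # Unweighted Stouffer: sum of z-scores divided by sqrt(number of samples).
        combined[cluster.item()] = block.sum(axis=0) / np.sqrt(block.shape[0])
    return combined
```

For example, calling `stouffer_by_cluster([[1, 2], [3, 4], [2, 0]], ["a", "a", "b"])` combines the first two rows into the signature for cluster "a" and leaves the single sample in cluster "b" unchanged.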
Please report any issues you experience through this repository's "Issues" page.
For any other information or queries, please write to Alessandro Vasciaveo (av2729@cumc.columbia.edu).
pyviper is distributed under an MIT License (see LICENSE).
If you use pyVIPER in a publication, please cite our work:
Wang, A.L.E., Lin, Z., Zanella, L., Vlahos, L., Girotto, M.A., Zafar, A., ... & Vasciaveo, A. (2024). pyVIPER: A fast and scalable Python package for rank-based enrichment analysis of single-cell RNASeq data. bioRxiv, 2024-08. doi: https://doi.org/10.1101/2024.08.25.609585.
Manuscript in review