The current version of scopen is 0.1.7.
scopen has been tested on the following operating systems:
macOS Big Sur (11.4)
Linux (4.18.0)
scopen has been tested with Python 3.6, 3.7, 3.8, and 3.9.
We recommend using Miniconda to set up the environment.
scopen requires the following packages:
numpy (>=1.20.3)
scipy (>=1.6.3)
h5py (>=3.2.1)
pandas (>=1.2.4)
PyTables (>=3.6.1)
matplotlib (>=3.4.2)
scikit-learn (>=0.24.2)
kneed (>=0.7.0)
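You may want to confirm that an existing environment meets the minimum versions listed above. A small stdlib-only sketch like the following can help. It makes three assumptions:

- Python 3.8 or newer, because it relies on `importlib.metadata`.
- PyTables is installed under its PyPI distribution name `tables`.
- The helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical and are not part of scopen.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from this README, keyed by pip distribution name.
# Assumption: PyTables is distributed on PyPI as "tables".
REQUIREMENTS = {
    "numpy": "1.20.3",
    "scipy": "1.6.3",
    "h5py": "3.2.1",
    "pandas": "1.2.4",
    "tables": "3.6.1",
    "matplotlib": "3.4.2",
    "scikit-learn": "0.24.2",
    "kneed": "0.7.0",
}


def version_tuple(v):
    """Turn '1.20.3' into (1, 20, 3); stop at the first non-numeric piece."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions after padding with zeros, so '1.20' equals '1.20.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements=REQUIREMENTS):
    """Return {package: 'ok' | 'missing' | 'too old (...)'} for each requirement."""
    report = {}
    for pkg, minimum in requirements.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[pkg] = "ok"
        else:
            report[pkg] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for pkg, status in check_requirements().items():
        print(f"{pkg:>14}: {status}")
```

Run it with the Python interpreter of the activated environment. Any package reported as missing or too old can then be installed or upgraded with pip.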
The easiest way to install scopen and the required packages is with pip:
pip install scopen
The installation will take ~20 seconds.
Here we describe how to run scopen.
scopen performs imputation and dimensionality reduction on a peak-by-cell
matrix and accepts several input formats. The simplest is a text file in which
each row represents a peak and each column represents a cell.
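scOpen's model selection is based on the fitting error of NMF (non-negative matrix factorization). The sketch below illustrates how a single factorization yields both an imputed matrix and low-dimensional matrices for peaks and cells. Note the following before relying on it:

- It uses numpy and textbook Lee-Seung multiplicative updates.
- It is not scOpen's actual model; scOpen's preprocessing, regularization, and solver may differ.
- The mapping of `W`, `H`, and `W @ H` onto scOpen's peak, cell, and imputed outputs is our illustrative assumption.
- The function name `nmf` is hypothetical.

```python
import numpy as np


def nmf(X, rank, n_iter=200, seed=0, eps=1e-10):
    """Plain NMF, X ~= W @ H, via Lee-Seung multiplicative updates (Frobenius loss).

    X: non-negative (peaks x cells) matrix.
    Returns W (peaks x rank), H (rank x cells) and the final Frobenius error.
    Illustration only; not scOpen's own model.
    """
    rng = np.random.default_rng(seed)
    n_peaks, n_cells = X.shape
    W = rng.random((n_peaks, rank))
    H = rng.random((rank, n_cells))
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative and do not
        # increase ||X - W H||_F (up to the small eps added to avoid 0/0).
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    error = np.linalg.norm(X - W @ H, "fro")
    return W, H, error


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy binary peak-by-cell matrix: 50 peaks, 20 cells, about 10% accessible.
    X = (rng.random((50, 20)) < 0.1).astype(float)
    W, H, err = nmf(X, rank=3)
    imputed = W @ H  # same shape as X (cf. the imputed matrix scOpen writes)
    # W: one row per peak; H: one column per cell.
    print(imputed.shape, W.shape, H.shape, round(float(err), 3))
```

Repeating the fit for several values of `rank` and recording `err` gives an error-versus-rank curve. Under the assumptions above, this is the same kind of curve scOpen uses when `--estimate_rank` is set.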
We provide example data in the demo folder: a peak-by-cell count matrix
from human hematopoietic cells.
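To prepare your own data in the dense text format, a stdlib sketch like the one below writes and reads a peak-by-cell matrix. The exact layout is our assumption, so compare it with the first lines of demo/TagCount.txt before relying on it. We assume:

- tab-separated values;
- a header row of cell barcodes, preceded by an empty corner cell;
- one row per peak, starting with the peak name.

The peak names, barcodes, and function names are hypothetical.

```python
import csv
import os
import tempfile


def write_dense_matrix(path, peaks, cells, counts):
    """Write a peak-by-cell count matrix as tab-separated text.

    Assumed layout: header = empty corner cell + cell barcodes;
    each following row = peak name + integer counts for every cell.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow([""] + list(cells))
        for peak, row in zip(peaks, counts):
            writer.writerow([peak] + [int(c) for c in row])


def read_dense_matrix(path):
    """Read the same assumed layout back into (peaks, cells, counts)."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        cells = next(reader)[1:]
        peaks, counts = [], []
        for row in reader:
            peaks.append(row[0])
            counts.append([int(x) for x in row[1:]])
    return peaks, cells, counts


if __name__ == "__main__":
    # Hypothetical peaks and barcodes, for illustration only.
    peaks = ["chr1:1000-1500", "chr1:2000-2600", "chr2:500-900"]
    cells = ["cell_1", "cell_2", "cell_3", "cell_4"]
    counts = [[0, 2, 0, 1], [1, 0, 0, 0], [0, 0, 3, 1]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_TagCount.txt")
        write_dense_matrix(path, peaks, cells, counts)
        assert read_dense_matrix(path) == (peaks, cells, counts)
        with open(path) as fh:
            print(fh.read())
```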
First, decompress the file:
cd demo
gzip -d TagCount.txt.gz
Run scopen with the following command:
scopen --input TagCount.txt --input_format dense --output_dir ./ --output_prefix scOpen --output_format dense --verbose 0 --estimate_rank --nc 4
--input_format
: input format; dense means a plain-text matrix file is expected
--output_dir
: directory where all output files are saved (./ is the current directory)
--output_prefix
: prefix for the output file names
--verbose
: verbosity level
--estimate_rank
: select the number of ranks (dimensions) automatically
--nc
: number of CPU cores to use
For more information, run:
scopen --help
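If you drive scopen from a Python pipeline, you can assemble the command line above as an argument list and hand it to `subprocess`. The sketch below uses only the options shown in the example command; the helper names `scopen_command` and `run_scopen` are hypothetical. The defaults reproduce the demo call, and the full option list is available from scopen --help.

```python
import shlex
import subprocess


def scopen_command(input_file, output_dir="./", prefix="scOpen",
                   input_format="dense", output_format="dense",
                   verbose=0, estimate_rank=True, n_cores=4):
    """Build the argument list for the example scopen call in this README."""
    cmd = ["scopen",
           "--input", input_file,
           "--input_format", input_format,
           "--output_dir", output_dir,
           "--output_prefix", prefix,
           "--output_format", output_format,
           "--verbose", str(verbose)]
    if estimate_rank:
        cmd.append("--estimate_rank")
    cmd += ["--nc", str(n_cores)]
    return cmd


def run_scopen(**kwargs):
    """Run scopen and raise CalledProcessError if it exits with an error."""
    return subprocess.run(scopen_command(**kwargs), check=True)


if __name__ == "__main__":
    # Print the shell form instead of running (the demo takes ~18 minutes).
    print(shlex.join(scopen_command("TagCount.txt")))
```

Passing a list rather than a single string avoids shell quoting problems with file names. Call `run_scopen(input_file="TagCount.txt")` only when the input file is present.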
The expected running time is ~18 minutes.
When the command finishes, you will find five output files in the current directory:
scOpen.txt: an imputed matrix with the same dimensions as the input. It can be
used for downstream analysis, such as peak-to-peak co-accessibility prediction.
scOpen_barcodes.txt: a low-dimensional matrix for cells. The number of dimensions is determined by the --estimate_rank option.
It can be used as a dimensionality-reduced matrix for clustering and visualization.
scOpen_peaks.txt: a low-dimensional matrix for peaks.
scOpen_error.pdf: a line plot showing the model selection process, where the x-axis represents ranks (or dimensions)
and the y-axis the NMF fitting error. scOpen selects the best model by identifying the elbow point of this curve.
scOpen_error.txt: a text file containing the data for the above curve.
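The elbow selection described above can be illustrated with a short stdlib sketch of one common heuristic:

1. Rescale ranks and errors to [0, 1].
2. Draw the chord from the first to the last point.
3. Pick the rank whose point lies farthest below that chord.

This is similar in spirit to the Kneedle method implemented by kneed, which scopen lists as a dependency. We do not claim it reproduces scOpen's exact choice. The function name `elbow_rank` is hypothetical, and the error values in the usage example are made up.

```python
def elbow_rank(ranks, errors):
    """Return the rank at the elbow of a decreasing error curve.

    Assumes ranks are sorted ascending and errors decrease from the first
    to the last rank. The vertical gap below the normalized chord y = 1 - x
    is proportional to the perpendicular distance, so its argmax is the elbow.
    """
    if len(ranks) != len(errors) or len(ranks) < 3:
        raise ValueError("need at least three (rank, error) pairs")
    x0, x1 = min(ranks), max(ranks)
    y0, y1 = min(errors), max(errors)
    if x0 == x1 or y0 == y1:
        raise ValueError("ranks and errors must each span a non-zero range")
    best_rank, best_gap = ranks[0], float("-inf")
    for r, e in zip(ranks, errors):
        x = (r - x0) / (x1 - x0)  # normalized rank
        y = (e - y0) / (y1 - y0)  # normalized error
        gap = 1.0 - x - y         # how far the point sits below the chord
        if gap > best_gap:
            best_rank, best_gap = r, gap
    return best_rank


if __name__ == "__main__":
    # Made-up error curve for illustration only.
    ranks = [1, 2, 3, 4, 5, 6]
    errors = [100, 60, 40, 35, 33, 32]
    print(elbow_rank(ranks, errors))  # → 3
```

With kneed installed, `KneeLocator(ranks, errors, curve="convex", direction="decreasing").knee` gives a comparable answer for curves of this shape.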
As described above, scopen also supports the following input formats.
scOpen is implemented in Python, while many popular tools for scATAC-seq analysis, such as
Signac, are developed in R.
If you work primarily in R, we also provide a tutorial
here showing how to use scopen
as a dimensionality reduction method in R to analyze scATAC-seq data
from a human peripheral blood mononuclear cell (PBMC) dataset.
Python is gaining popularity in single-cell data analysis.
Two examples are scanpy (for scRNA-seq) and episcanpy (for single-cell epigenomic data, e.g., scATAC-seq).
To make scopen usable in this context, we provide a Jupyter notebook
showing how to combine scOpen and (epi)scanpy to analyze scATAC-seq data.
For reproducibility, we provide all scripts and data here.