Single cell transcript quantification tool
Scasa is a single cell transcript quantification tool tailored for single cell RNA-Sequencing data. The software covers the steps from pseudo-alignment to quantification. See the scasa wiki for more details on scasa.
If you are using Scasa in your research, please cite:
Lu Pan, Huy Q Dinh, Yudi Pawitan, Trung Nghia Vu, Isoform-level quantification for single-cell RNA sequencing, Bioinformatics, 2021, btab807, https://doi.org/10.1093/bioinformatics/btab807
Scasa is currently available for Linux only. The software is precompiled for Linux, and installation takes less than two minutes. Please follow the instructions below to install scasa:
The following dependency packages are needed to run the quick tutorial for scasa:
# R packages needed (scasa installs them automatically, provided that you have permission to install R packages into the default R library directory):
library(GenomicFeatures)
library(Biostrings)
library(polyester)
library(foreach)
library(doParallel)
library(data.table)
library(plyr)
After dependency packages are installed:
Download scasa from our GitHub release page and untar the downloaded file:
wget https://github.com/eudoraleer/scasa/releases/download/scasa.v1.0.1/scasa_v1.0.1.tar.gz
tar -xzvf scasa_v1.0.1.tar.gz
Add the scasa folder to the PATH environment variable:
export PATH=$PWD/scasa:$PATH
Now you are ready to use scasa!
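To confirm that the export took effect in the current shell, a small check such as the one below can help. It is a minimal sketch: the helper name `check_on_path` is hypothetical and not part of scasa, and it relies only on the standard `command -v` builtin.

```shell
# check_on_path: hypothetical helper, not shipped with scasa.
# Reports whether a command can be resolved through PATH.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
        return 0
    fi
    echo "$1 not found on PATH" >&2
    return 1
}

# Check the scasa executable; "|| true" keeps the shell session alive on failure.
check_on_path scasa || true
```

If scasa is reported as not found, re-run the export line from the directory that contains the extracted scasa folder.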
After installation, test scasa by typing scasa --help in the terminal to see a list of available commands. For a detailed list of scasa options, visit our wiki page.
Download our Test_Dataset (200 cells) and unzip it:
wget https://www.dropbox.com/s/gsi8x4fshbn0p11/Test_Dataset.tar.gz
tar xvzf Test_Dataset.tar.gz
Download the cDNA fasta of hg38: refMrna
Enter the following command to start the analysis (set a higher number of threads for faster processing):
cd Test_Dataset
scasa --fastq Sample_01_S1_L001_R1_001.fastq,Sample_01_S1_L001_R2_001.fastq \
--ref <hg38_ref_file_path> \
--whitelist <test_dataset_whitelist_path> \
--nthreads 4
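The `--fastq` option in the command above takes the R1 and R2 files as a single comma-separated value. When file names follow the same `_R1_`/`_R2_` pattern as the test data, that value can be derived from the R1 name alone. The helper below is a sketch with a hypothetical name (`fastq_pair_arg`); it only builds the string, does not call scasa, and assumes the name contains the `_R1_` tag once.

```shell
# fastq_pair_arg: hypothetical helper, not part of scasa.
# Given an R1 file name, print "R1,R2" by swapping the first "_R1_" for "_R2_".
fastq_pair_arg() {
    r1="$1"
    case "$r1" in
        *_R1_*) ;;
        *) echo "no _R1_ tag in: $r1" >&2; return 1 ;;
    esac
    # Text before the first "_R1_", then "_R2_", then the text after it.
    r2="${r1%%_R1_*}_R2_${r1#*_R1_}"
    printf '%s,%s\n' "$r1" "$r2"
}

fastq_pair_arg Sample_01_S1_L001_R1_001.fastq
# prints: Sample_01_S1_L001_R1_001.fastq,Sample_01_S1_L001_R2_001.fastq
```

The printed value can then be passed to `--fastq` as in the command above.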
After the analysis completes, scasa will have generated a project output directory named <SCASA_project_name_timestamp> with the following sub-directories:
<SCASA_project_name_timestamp>/
├── LOG/
├── 0PRESETS/
├── 1ALIGN/
└── 2QUANT/
    ├── <sample_1_quantification_output>/
    │   ├── scasa_isoform_expression.txt
    │   └── scasa_gene_expression.txt
    └── ..
Isoform and gene expression output can be found under the 2QUANT/ directory in the output folder:
cd <SCASA_project_name_timestamp>/2QUANT/<sample_1_quantification_output>/
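Once inside that directory, a quick look at the two tables confirms that quantification produced output. The column layout of these files is not described here, so the sketch below only reports the line count and the first few lines of each file; `preview_table` is a hypothetical helper name, not a scasa command.

```shell
# preview_table: hypothetical helper, not part of scasa.
# Prints the number of lines and the first three lines of a text file.
preview_table() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
    echo "$1: $(wc -l < "$1" | tr -d ' ') lines"
    head -n 3 "$1"
}

# Run from inside the sample's quantification output directory.
for f in scasa_isoform_expression.txt scasa_gene_expression.txt; do
    preview_table "$f" || true
done
```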
Now you have learnt how to run scasa!
In the scasa study, we used the RefSeq hg38 annotation for all analyses. This annotation does not contain the transcripts of the mitochondrial chromosome (chM), which some analyses require. We therefore added the chM transcripts from the GRCh38 annotation to build a new reference for scasa.
The annotation can be downloaded here: Annotation data for running scasa (alevin mapper) using refseq hg38 with chM. The version for Homo_sapiens.GRCh38.106 of ENSEMBL can be found in the folder Anno. To get results that include chM, replace the default reference data in scasa with these annotation files; see the folder Anno for details.
Note that we have not tested the performance of scasa with other annotation systems such as ENCODE/ENSEMBL.
For those who want to run scasa with a new annotation, the scripts in the aux folder help generate the Xmatrix and the other necessary reference data. See How-to-run-Scasa-for-a-new-annotation for further instructions.
On Linux CentOS 7, we tested thread counts from 1 to 64 on both a small simulated dataset (200 cells, the dataset from Step 1, Quick tutorial on scasa) and a larger simulated dataset (3955 cells). The runtimes for both simulated datasets, in hours, are given below:
##################################################################
# 1. Download scasa:
##################################################################
wget https://github.com/eudoraleer/scasa/releases/download/scasa.v1.0.0/scasa_v1.0.0.tar.gz
tar -xzvf scasa_v1.0.0.tar.gz
export PATH=$PWD/scasa:$PATH
##################################################################
# 2. Download salmon alevin:
##################################################################
wget https://github.com/COMBINE-lab/salmon/releases/download/v1.4.0/salmon-1.4.0_linux_x86_64.tar.gz
tar -xzvf salmon-1.4.0_linux_x86_64.tar.gz
export PATH=$PWD/salmon-latest_linux_x86_64/bin:$PATH
export LD_LIBRARY_PATH=$PWD/salmon-latest_linux_x86_64/lib:$LD_LIBRARY_PATH
##################################################################
# 3. Download UCSC hg38 cDNA fasta reference:
##################################################################
mkdir Annotation
cd Annotation
wget https://www.dropbox.com/s/xoa6yl562a5lv35/refMrna.fa.gz
refPath=$PWD/refMrna.fa.gz
cd ..
##################################################################
# 4. Download test dataset:
##################################################################
wget https://www.dropbox.com/s/gsi8x4fshbn0p11/Test_Dataset.tar.gz
tar xvzf Test_Dataset.tar.gz
cd Test_Dataset
##################################################################
# 5. Run scasa:
##################################################################
scasa --fastq Sample_01_S1_L001_R1_001.fastq,Sample_01_S1_L001_R2_001.fastq \
--ref $refPath \
--whitelist Sample_01_Whitelist.txt \
--nthreads 2 \
--out Scasa_out
##################################################################
# DONE!
##################################################################
Using Docker avoids installation issues with the dependent tools and the environment. Users need to edit docker_params.sh to set the paths to the input and output folders and the other scasa parameter settings. The script below shows an example of running Docker on the test data above.
#1) Pull the docker of scasa to use:
sudo docker pull nghiavtr/scasa:v1.0.1
#2) Download test dataset:
wget https://www.dropbox.com/s/gsi8x4fshbn0p11/Test_Dataset.tar.gz
tar xvzf Test_Dataset.tar.gz
#3) Download UCSC hg38 cDNA fasta reference:
wget https://www.dropbox.com/s/xoa6yl562a5lv35/refMrna.fa.gz
#4) Download runScasaDocker.sh:
wget https://raw.githubusercontent.com/eudoraleer/scasa/main/docker/runScasaDocker.sh
#5) Download docker_params.sh:
wget https://raw.githubusercontent.com/eudoraleer/scasa/main/docker/docker_params.sh
#6) Replace "/path/to" in docker_params.sh with the current path
# Please revise the paths in docker_params.sh to match your project
istr="/path/to"
ostr="$PWD"
sed -i -e "s#${istr}#${ostr}#g" docker_params.sh
#7) Run scasa using docker with the parameter settings in docker_params.sh
sudo bash runScasaDocker.sh -param docker_params.sh
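If the run in step 7 fails on a missing path, one thing worth checking is whether any `/path/to` placeholder survived the substitution in step 6. The sketch below counts the lines that still contain it; `remaining_placeholders` is a hypothetical helper name, and the check assumes the placeholder text is exactly `/path/to`, as in step 6.

```shell
# remaining_placeholders: hypothetical helper, not part of scasa.
# Prints how many lines of a file still contain the "/path/to" placeholder.
remaining_placeholders() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 2; }
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c '/path/to' "$1" || true
}

if [ -f docker_params.sh ]; then
    n=$(remaining_placeholders docker_params.sh)
    echo "docker_params.sh: $n line(s) still contain /path/to"
fi
```

A count of 0 means every placeholder was replaced; any other value points to lines that still need editing by hand.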
##################################################################
# 1. Download scasa:
##################################################################
wget https://github.com/eudoraleer/scasa/releases/download/scasa.v1.0.0/scasa_v1.0.0.tar.gz
tar -xzvf scasa_v1.0.0.tar.gz
export PATH=$PWD/scasa:$PATH
##################################################################
# 2. Download salmon alevin:
##################################################################
wget https://github.com/COMBINE-lab/salmon/releases/download/v1.4.0/salmon-1.4.0_linux_x86_64.tar.gz
tar -xzvf salmon-1.4.0_linux_x86_64.tar.gz
export PATH=$PWD/salmon-latest_linux_x86_64/bin:$PATH
export LD_LIBRARY_PATH=$PWD/salmon-latest_linux_x86_64/lib:$LD_LIBRARY_PATH
##################################################################
# 3. Download UCSC hg38 cDNA fasta reference:
##################################################################
mkdir Annotation
cd Annotation
wget https://www.dropbox.com/s/xoa6yl562a5lv35/refMrna.fa.gz
refPath=$PWD/refMrna.fa.gz
cd ..
##################################################################
# 4. Download the CITE-seq RNA samples:
##################################################################
mkdir CiteSeqData
InputDir=$PWD/CiteSeqData
cd CiteSeqData
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR875/003/SRR8758323/SRR8758323_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR875/003/SRR8758323/SRR8758323_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR875/004/SRR8758324/SRR8758324_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR875/004/SRR8758324/SRR8758324_2.fastq.gz
cat SRR8758323_1.fastq.gz SRR8758324_1.fastq.gz > HBMC_Stuart2019_RNA_L001_R1_001.fastq.gz
cat SRR8758323_2.fastq.gz SRR8758324_2.fastq.gz > HBMC_Stuart2019_RNA_L001_R2_001.fastq.gz
rm SRR8758323_1.fastq.gz SRR8758323_2.fastq.gz SRR8758324_1.fastq.gz SRR8758324_2.fastq.gz
##################################################################
# 5. Run scasa:
##################################################################
threadNum=16
scasa --in $InputDir --fastq HBMC_Stuart2019_RNA_L001_R1_001.fastq.gz,HBMC_Stuart2019_RNA_L001_R2_001.fastq.gz --ref $refPath --cellthreshold 35000 --tech 10xv2 --nthreads $threadNum --out Scasa_out
#################################################################
# DONE!
##################################################################
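Step 4 of the script above merges the two sequencing runs with plain `cat` on the compressed files. This works because the gzip format allows several compressed members in one file, and decompressing the result yields the concatenation of the originals. The toy example below illustrates this on throwaway files (not the CITE-seq data); the file names are made up for the demonstration.

```shell
# Toy demonstration on temporary files, one 4-line FASTQ record each.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/a.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/b.fastq.gz"

# Concatenate the compressed files directly, as in step 4.
cat "$workdir/a.fastq.gz" "$workdir/b.fastq.gz" > "$workdir/merged.fastq.gz"

# gzip -t verifies integrity; the merged file should hold 8 lines (2 reads x 4 lines).
gzip -t "$workdir/merged.fastq.gz" && echo "merged file is a valid gzip stream"
merged_lines=$(gzip -dc "$workdir/merged.fastq.gz" | wc -l | tr -d ' ')
echo "lines in merged file: $merged_lines"
```

The same `gzip -t` check can be run on the merged HBMC files before the originals are deleted, since a truncated download would otherwise go unnoticed until the scasa run.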