YY-TMU / InPACT

MIT License

InPACT

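InPACT is a computational method designed to identify and quantify intronic polyadenylation (IPA) sites by examining contextual sequence patterns and RNA-seq read alignments. InPACT includes the following parts: identifying IPA sites, inferring novel transcripts, and calculating the usage of IPA sites.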

Installation

InPACT consists of both Python and Bash scripts. A conda virtual environment can be created using the provided environment.yml file.

  1. Clone the repository:

    git clone https://github.com/YY-TMU/InPACT.git
  2. Create the environment:

    conda env create -f environment.yml
    conda activate InPACT
  3. Installation takes about 5 to 8 minutes. If it was successful, the InPACT command is available:

    InPACT -h
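
Beyond `InPACT -h`, it can be useful to confirm that all three commands used in this README resolve inside the activated environment. The following is a minimal sketch, not part of InPACT itself; the helper name `find_commands` is hypothetical, and the command names are simply those that appear in the usage examples in this README.

```python
import shutil

# Command names taken from the usage examples in this README.
INPACT_COMMANDS = ("InPACT", "InPACT_transcript", "InPACT_quantify")


def find_commands(names=INPACT_COMMANDS):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Run this after `conda activate InPACT`.
    for name, path in find_commands().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

If any command is reported as not found, the environment is probably not activated or the installation did not complete.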

Usage

1. Identify IPA sites

Based on the human reference genome (GRCh38), we provide an annotation of potential IPA sites, predicted by the sequence module, that can be used directly.

The RefSeq annotation file for GRCh38 can be downloaded from the following link.

A test file can be downloaded from the following link.

The following options are available in this part:

Command

InPACT -i sample.bam -a RefSeq.gtf -s InPACT_polyAsites.hg38.saf -P 5 

2. Infer novel transcripts

To assemble novel transcripts, a reference genome in FASTA format and a reference gene annotation in GTF format are required.

Command

InPACT_transcript --predict_terminal predict.result.txt --annotated_gtf RefSeq.gtf --fa_path genome.fa --save_gtf merged.gtf

3. Calculate usage of IPA sites

Salmon is used to index and quantify the transcriptome, and then the usage is calculated.
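
The Salmon commands themselves are not listed in this README. The sketch below shows one way the step could be scripted, under several assumptions: transcript sequences are extracted from `merged.gtf` with `gffread` (a tool not mentioned in this README), the reads are paired-end FASTQ files, and all output file and directory names are placeholders. The function `salmon_commands` is hypothetical and only builds the command lines; it does not run them.

```python
import shlex


def salmon_commands(merged_gtf, genome_fa, reads_1, reads_2, threads=5,
                    transcripts_fa="merged.transcripts.fa",
                    index_dir="salmon_index", out_dir="salmon_out"):
    """Return argv lists: extract transcript sequences, index them, quantify reads."""
    return [
        # Assumption: gffread writes the spliced sequence of every transcript in the GTF.
        ["gffread", "-w", transcripts_fa, "-g", genome_fa, merged_gtf],
        # Build the Salmon index from the transcript FASTA.
        ["salmon", "index", "-t", transcripts_fa, "-i", index_dir],
        # -l A lets Salmon infer the library type; results land in <out_dir>/quant.sf.
        ["salmon", "quant", "-i", index_dir, "-l", "A",
         "-1", reads_1, "-2", reads_2, "-p", str(threads), "-o", out_dir],
    ]


if __name__ == "__main__":
    # Print the commands for inspection; pass each list to subprocess.run to execute.
    for argv in salmon_commands("merged.gtf", "genome.fa",
                                "sample_1.fastq.gz", "sample_2.fastq.gz"):
        print(shlex.join(argv))
```

The resulting `quant.sf` is the file passed to `--transcript_tpm` in the command that follows.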

Command

InPACT_quantify --transcript_tpm quant.sf --annotation_file merged.gtf --ipa_info predict.result.txt --save_file ipa_usage.txt

InPACT takes about an hour to run the test file using five cores. The final output format is as follows:

| Column | Description |
| --- | --- |
| Terminal exon | Intronic terminal exons for IPA sites |
| IPA type | Type of IPA sites (skipped or composite) |
| Gene | Gene symbols |
| Upstream coordinate | The 3’ end of the predicted terminal exon’s upstream exon |
| PolyAsite | IPA sites |
| IPA usage | PAU estimate |
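
To give a feel for what an IPA usage value can represent, here is a small illustrative sketch. It assumes, without confirmation from this README, that usage is the TPM of the IPA isoform divided by the summed TPM of all isoforms of the same gene; the formula actually used by `InPACT_quantify` may differ. The functions `read_tpm` and `ipa_usage`, the transcript-to-gene mapping, and the transcript names are all hypothetical. Only the `Name` and `TPM` columns of Salmon's `quant.sf` are relied on.

```python
import csv
import io
from collections import defaultdict


def read_tpm(quant_sf_text):
    """Parse Salmon quant.sf text (tab-separated, with Name and TPM columns)."""
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


def ipa_usage(tpm, transcript_to_gene, ipa_transcripts):
    """Assumed definition: usage = TPM of the IPA isoform / total TPM of its gene."""
    gene_total = defaultdict(float)
    for tx, value in tpm.items():
        if tx in transcript_to_gene:
            gene_total[transcript_to_gene[tx]] += value
    usage = {}
    for tx in ipa_transcripts:
        total = gene_total[transcript_to_gene[tx]]
        # Genes with no expression get no estimate rather than a division by zero.
        usage[tx] = tpm.get(tx, 0.0) / total if total > 0 else None
    return usage


# Toy example with one gene and two isoforms (made-up values).
quant = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "txA\t1000\t800\t30.0\t120\n"
    "txA_ipa\t600\t400\t10.0\t40\n"
)
tx2gene = {"txA": "GENE1", "txA_ipa": "GENE1"}
print(ipa_usage(read_tpm(quant), tx2gene, ["txA_ipa"]))  # {'txA_ipa': 0.25}
```

In this toy case the IPA isoform contributes 10 of the gene's 40 TPM, giving a usage of 0.25.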