DIPAN is designed to identify neoantigens originating from intronic polyadenylation (IPA) events detected in tumor transcriptomes. These IPA-derived neoantigens have the potential to be presented by the MHC I molecules.
DIPAN incorporates Python and Bash scripts. To set up the required environment, use the provided environment.yml
file to create a conda virtual environment.
Clone the repository:
git clone https://github.com/YY-TMU/DIPAN.git
Create the environment:
conda env create -f environment.yml
conda activate DIPAN
netMHCpan and OptiType
netMHCpan can only be acquired through the official website and must be added to the environment variable after installation.
OptiType relies on Python 2.7. Due to compatibility issues with other scripts, it should be manually installed according to following instruction.
The following options are avaliable:
We provided the annotation file of IPAFinder for GRCh38, along with normal proteome amino acid sequences in annotated_file
directory, and we recommend that users utilize it directly. The annotation file for GRCh38 of RefSeq can be downloaded from UCSC. It should be noted that the annotation file of IPAFinder must match the GTF file; if the GTF file is changed, the annotation file should be adjusted accordingly for IPAFinder.
DIPAN offers two options for users. If HLA-I typing information is unavailable, users should provide bam_fq_input
, optitype_script
and optitype_config
. OptiType is used to calculate HLA typing. bam_fq_input
should include paths to BAM files and their corresponding FASTQ files. Alternatively, if HLA typing is already known, users should provide bam_hla_input
, which includes paths to BAM files along with their associated HLA typing information.
DIPAN could be tested using recommended files.
DIPAN.sh -a <IPAFinder_anno.txt> -f <bam_fq_input.txt> -n <Normal_proteome.fa> -g <refseq.gtf> -G <genome.fa> -o <output_dir> -optitype_script <OptiTypePipeline.py> -optitype_config <optitype.config>
bam_fq_input.txt contains paths of BAM file and related FASTQ file, as shown below:
Tumor /path/tumor.sorted.bam /path/tumor_1.fq,/path/tumor_2.fq
Normal /path/normal.sorted.bam
DIPAN.sh -a <IPAFinder_anno.txt> -b <bam_hla_input.txt> -n <Normal_proteome.fa> -g <refseq.gtf> -G <genome.fa> -o <output_dir>
bam_hla_input.txt contains paths of BAM file and related HLA typing information, as shown below:
Tumor /path/tumor.sorted.bam HLA-A*01:01,HLA-B*44:02,HLA-C*06:02
Normal /path/normal.sorted.bam
We collected 730 normal samples from TCGA and curated a list of IPA-derived peptides found in these normal samples. This list, along with the normal human proteome provided, can serve as a control when normal RNA-seq data are unavailable. When matched samples are missing, set matched_normal to False.
The final output includes filtered neoantigens from tumor samples, and the output consists of the following columns: Column | Description |
---|---|
SYMBOL | Gene symbol |
Terminal_exon | Genomic location of corresponding terminal exon of IPA isoform |
IPAtype | Type of terminal exon (Skipped or Composite) |
IPUI | Abundance of IPA events |
HLA | HLA-I typing |
Peptide | Amino acid sequence of the potential IPA-derived neoantigens |
%Rank | Rank of the predicted binding score compared to a set of random natural peptides. This measure is not affected by inherent bias of certain molecules towards higher or lower mean predicted affinities. |