MIT License

PING (Pushing Immunogenetics to the Next Generation)

An R-based bioinformatic pipeline to determine killer-cell immunoglobulin-like receptor (KIR) copy number and high-resolution genotypes from short-read sequencing data.

Data compatibility

Paired-end KIR targeted sequencing data

Setting up pipeline

Download the pipeline code and move into its directory with the following commands:

git clone https://github.com/Hollenbach-lab/PING.git
cd ./PING

Setting up container

To ensure a reliable run, we have containerized the pipeline with Singularity (tested on version 3.11.4). You can check whether Singularity is installed on your system by running singularity --version. If it is not, please install it by following the official Singularity installation guide.
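The installation check described above can be scripted. The following is a minimal sketch; the helper name have_command is made up for illustration and is not part of PING.

```shell
# have_command is a hypothetical helper, not part of PING.
# It returns 0 if the named command is found on PATH, 1 otherwise.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

if have_command singularity; then
  # Print the installed version (PING was tested on 3.11.4).
  singularity --version
else
  echo "singularity not found on PATH; install it before running PING" >&2
fi
```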

Once Singularity has been installed, obtain the image using one of the following commands:

Running PING

From within the PING directory, run the entire pipeline with the following command:

singularity exec --bind <fastq_location> ping.sif Rscript PING_run.R \
  --fqDirectory <fastq_location> \
  --resultsDirectory <output_location> \
  --fastqPattern <fastq_pattern> \
  --threads <number_of_threads>
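
As a rough sketch, the command above can be filled in from shell variables and printed for review before it is run. Every value below (the paths, the pattern string and the thread count) is a hypothetical placeholder, and ping_cmd is an illustrative helper name, not something PING provides.

```shell
# Hypothetical placeholder values; replace them with your own settings.
FQ_DIR=/data/kir_fastq
RESULTS_DIR=ping_results
FASTQ_PATTERN=fastq
THREADS=4

# ping_cmd (illustrative name) prints the assembled command for review.
# Remove the leading "echo" to actually launch the pipeline.
ping_cmd() {
  echo singularity exec --bind "$FQ_DIR" ping.sif Rscript PING_run.R \
    --fqDirectory "$FQ_DIR" \
    --resultsDirectory "$RESULTS_DIR" \
    --fastqPattern "$FASTQ_PATTERN" \
    --threads "$THREADS"
}

ping_cmd
```

Printing the command first makes it easy to confirm that the bound directory and the --fqDirectory value point to the same location.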

Listed below are the arguments needed to run PING:

Running included test data

We have included 10 test sequences in the test_sequence/ directory to run through the pipeline. These samples are meant to verify that everything was installed correctly. Run the following command to execute the test:

singularity exec ping.sif Rscript PING_run.R \
  --fqDirectory test_sequence \
  --resultsDirectory test_sequence_output

PING output

Copy number output can be found at [resultsDirectory]/predictedCopyNumberFrame.csv

Genotype output can be found at [resultsDirectory]/finalAlleleCalls.csv

Aligned SNP tables can be found in [resultsDirectory]/alignmentFiles/[sampleID]/iterAlign/

Copy number graphs can be found in [resultsDirectory]/copyPlots/
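
After a run, a quick check that the locations listed above exist can catch an incomplete run early. This is a minimal sketch: check_ping_outputs is a hypothetical helper, it only tests for presence, and it does not inspect the contents of any file.

```shell
# check_ping_outputs is a hypothetical helper, not part of PING.
# It returns 0 if the expected output files and directories exist in the
# given results directory, and 1 if any of them is missing.
check_ping_outputs() {
  results_dir=$1
  missing=0
  for f in predictedCopyNumberFrame.csv finalAlleleCalls.csv; do
    if [ ! -f "$results_dir/$f" ]; then
      echo "missing file: $results_dir/$f" >&2
      missing=1
    fi
  done
  for d in alignmentFiles copyPlots; do
    if [ ! -d "$results_dir/$d" ]; then
      echo "missing directory: $results_dir/$d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the output of the included test run.
if check_ping_outputs test_sequence_output; then
  echo "all expected PING outputs are present"
fi
```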

Unresolved genotypes

If PING is unable to perfectly match the aligned SNPs to known KIR allele sequences, it produces an unresolved call.

Unresolved genotype information can be found in [resultsDirectory]/iterAlleleCalls.csv, where the closest allele match is recorded along with the mismatched SNP information in the following format: [closest_matched_allele]$[exon]_[position].[nucleotide]

Here, closest_matched_allele is the allele that best matches the aligned SNPs, and nucleotide is the mismatched nucleotide found at the indicated exon and position within that exon. Multiple mismatched SNPs are joined with the ^ symbol.
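
To make the format concrete, here is a minimal sketch that splits one unresolved call into its parts. It assumes that the allele name appears once before the $ and that each ^-separated entry after it has the form [exon]_[position].[nucleotide]. The function name parse_unresolved and the example string are made up for illustration and are not taken from real PING output.

```shell
# parse_unresolved is a hypothetical helper, not part of PING.
# It prints one tab-separated line per mismatched SNP:
#   allele, exon, position, nucleotide
parse_unresolved() {
  call=$1
  sep='$'
  case $call in
    *"$sep"*) ;;                              # mismatch information present
    *) printf '%s\n' "$call"; return 0 ;;     # no "$": print the call as-is
  esac
  allele=${call%%"$sep"*}                     # text before the first "$"
  rest=${call#*"$sep"}                        # text after the first "$"
  while [ -n "$rest" ]; do
    snp=${rest%%"^"*}                         # next SNP entry
    case $rest in
      *"^"*) rest=${rest#*"^"} ;;             # more entries remain
      *) rest= ;;                             # that was the last entry
    esac
    exon=${snp%%_*}
    posnuc=${snp#*_}
    position=${posnuc%%.*}
    nucleotide=${posnuc#*.}
    printf '%s\t%s\t%s\t%s\n' "$allele" "$exon" "$position" "$nucleotide"
  done
}

# Made-up example call with two mismatched SNPs.
parse_unresolved 'KIR3DL1*001$E3_45.A^E5_12.G'
```

If the entries in your iterAlleleCalls.csv are laid out differently from the assumption above, the splitting logic will need to be adjusted accordingly.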

Troubleshooting

Please save a copy of your terminal output and contact us through GitHub or by email at wesley.marin@ucsf.edu or rayo.suseno@ucsf.edu.

Citations

Please cite:

PING (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008904)

Marin WM, Dandekar R, Augusto DG, Yusufali T, Heyn B, et al. (2021) High-throughput Interpretation of Killer-cell Immunoglobulin-like Receptor Short-read Sequencing Data with PING. PLOS Computational Biology 17(8): e1008904. https://doi.org/10.1371/journal.pcbi.1008904

IPD-KIR (https://www.ebi.ac.uk/ipd/kir/)

Robinson J, Waller MJ, Stoehr P, Marsh SGE. IPD - the Immuno Polymorphism Database. Nucleic Acids Research. 2005, 33:D523-526.

Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)

Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.