[[https://github.com/Zilong-Li/vcfpp/actions/workflows/linux.yml/badge.svg]] [[https://github.com/Zilong-Li/vcfpp/actions/workflows/mac.yml/badge.svg]] [[https://zilongli.org/proj/vcfpp/index.html][https://img.shields.io/badge/Documentation-latest-blue.svg]] [[https://github.com/Zilong-Li/vcfpp/releases/latest][https://img.shields.io/github/v/release/Zilong-Li/vcfpp.svg]] [[https://img.shields.io/github/license/Zilong-Li/vcfpp?style=plastic.svg]]
This project introduces vcfpp (vcf plus plus), a single C++ file as interface to the basic =htslib=, which can be easily included in a C++ program for scripting high-performance genomic analyses. Check out [[https://github.com/Zilong-Li/vcfppR][vcfppR]] as an example of how vcfpp.h can facilitate rapidly writing high-performance R package.
Features:
single file to be easily included and compiled
easy and safe [[https://zilongli.org/proj/vcfpp/index.html][API]] to use.
objects are RAII. no worry about allocate and free memory.
the full functionalities of =htslib=, e.g. supports of compressed VCF/BCF and URL link as filename.
compatible with C++11 and later
Table of Contents :TOC:QUOTE:
[[#installation][Installation]]
[[#usage][Usage]]
[[#working-with-r][Working with R]]
[[#working-with-python][Working with Python]]
[[#command-line-tools][Command Line Tools]]
Installation
The documentation of API is [[https://zilongli.org/proj/vcfpp/index.html][here.]]
** Reading VCF
In this example, we count the number of heterozygous genotypes for each sample in all records. You can paste the example code into a =example.cpp= file and compile it by =g++ example.cpp -std=c++11 -O3 -Wall -I. -lhts=. You can replace =-I.= with =-I/my/writable/path/= if you put =vcfpp.h= there.
using namespace std;
using namespace vcfpp;
int main(int argc, char argv[])
{
// read data from 1000 genomes server
BcfReader vcf("https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chr21.filtered.SNV_INDEL_SV_phased_panel.vcf.gz");
BcfRecord var(vcf.header); // construct a variant record
vector
** Genotypes Coding
There are 3 types used for genotypes, ie. =vector
*** Code genotypes with missing allele as heterozygous.
If you use =vector
*** Code missing allele as -9
If this default behavior for =vector
** Writing VCF
There are many ways for writing the VCF/BCF file.
*** Use an empty template
Here we construct an initial BCF with header using VCF4.3 specification. Next we add meta data in the header and write out variant record given a string.
BcfWriter bw("out.bcf.gz", "VCF4.3"); bw.header.addFORMAT("GT", "1", "String", "Genotype"); bw.header.addINFO("AF", "A", "Float", "Estimated allele frequency in the range (0,1)"); bw.header.addContig("chr20"); // add chromosome for (auto& s : {"id01", "id02", "id03"}) bw.header.addSample(s); // add 3 samples bw.writeLine("chr20\t2006060\trs146931526\tG\tC\t100\tPASS\tAF=0.000998403\tGT\t1|0\t1|1\t0|0");
*** Use an existing VCF as template
In this example, we first read VCF file =test/test-vcf-read.vcf.gz=. Second, we construct a variant record and update the record with the input VCF. Third, we construct a BcfWriter object using the meta data in the header of the input VCF, writing out the header and the modified variant record.
BcfReader br("test/test.vcf.gz"); BcfWriter bw("out.vcf.gz", br.header); BcfRecord var(br.header); br.getNextVariant(var); bw.writeHeader(); var.setPOS(100001); // update the POS of the variant bw.writeRecord(var);
In some cases where we want to modify the samples of a template VCF, we can associate the =BcfWriter= and =BcfRecord= with another modifiable header instead of the non-modifiable header as the above.
BcfReader br("test/test.vcf.gz"); BcfWriter bw("out.vcf.gz"); // copy header of template vcf and restrict to target sample bw.copyHeader("test/test.vcf.gz", "HG00097"); BcfRecord var(bw.header); br.getNextVariant(var); var.setPOS(100001); bw.writeRecord(var); // output only sample HG00097
** Variants Operation
All variants related API can be found [[https://zilongli.org/proj/vcfpp/classvcfpp_1_1_bcf_record][BcfRecord]]. The commonly used are listed below.
BcfReader vcf("bcf.gz"); // construct a vcf reader
BcfRecord var(vcf.header); // construct an empty variant record associated with vcf header
vcf.getNextVariant(var) // get next variant
vector
** Header Operation
All variants related API can be found in [[https://zilongli.org/proj/vcfpp/classvcfpp_1_1_bcf_header][BcfHeader]].
Examples of vcfpp working with R are in folder [[Rcpp]] and https://github.com/Zilong-Li/vcfppR.
Examples of vcfpp working with Python are in folder [[Pybind11]].
Find more useful command line tools in folder [[tools]].