UVC is a very accurate and reasonably fast somatic variant caller. The executable uvc1 in the bin directory takes one BAM file as input and generates one block-gzipped VCF file as output. The script uvcTN.sh in the bin directory takes two BAM files corresponding to tumor and normal as input and generate two block-gzipped VCF files (tumor-variant VCF and normal-filtered VCF) as output.
UVC requires BASH 4.0+ (4.0 is the minimum version required) and a compiler that supports the C++14 standard (g++ 6.3.0 is the minimum version recommended). The Makefile in this directory compiles with g++, but the Makefile can be easily modified to use another compiler instead of g++ (for example, clang). To install from scratch, please run: (./install-dependencies.sh && make clean && make all -j4 && make deploy). Please note that ./install-dependencies.sh requires bzip2 to decompress the downloaded files with the (.tar.bz2) extension. UVC depends on git 2.12+, htslib 1.6+ and bcftools 1.6+ (lower versions of htslib and bcftools may also work, but are not tested). If these two dependencies were already installed, then install-dependencies.sh may not be necessary. For trouble-shooting with the installation of htslib and bcftools, please check their official repositories at https://github.com/samtools/htslib and https://github.com/samtools/bcftools. More specifically, if any error message containing "error while loading shared libraries" pops up, please use the command (./configure --disable-plugins --disable-bz2 --disable-lzma --disable-libcurl --disable-s3 --disable-largefile) to build the corresponding htslib required by UVC first, then build UVC. Although not required, it is highly recommmended that bcftools is installed at a system location (a location that can be found in the PATH environment variable). The UVC binary uses multi-threading efficiently for up to 16 threads. After reaching 16 threads, adding more threads no longer significantly reduces wall-clock runtime. However, more efficient speed-up can still be gained by runing with GNU parallel or qsub to use one job per chromosome.
In total, the installation should take about 5 minutes.
The script uvcTN.sh in the bin directory is used for analyzing tumor-normal pairs. Run uvcTN.sh without any command-line argument will display its usage help. The usage help for uvcTN.sh refers to the executable uvc1, which performs the actual variant calling. The executable uvc1 can perform each of the following tasks:
For UMI (unique molecular identifier, a.k.a. molecular barcode) to be detected, the read name (QNAME) in the input BAM file should be in the format of originalName#UMI. For example, the UMI-labeled read name can be
It is recommended to manually check the following outlier variant candidates if very high precision is required:
If manual check is still too labor-intensive, then it is recommended to keep such outlier variant candidate if the candidate
Otherwise, it is recommended to reject such variant candidate.
In fact, uvc1 and some other executables all generated the same output given the same input. Their differences are as follows.
The diagnostic runtime error messages provided by uvc-2-omp-thread-ubsan.debug and/or uvc-3-ubsan.debug (UBsans) can provide additional debug information to the code maintainer(s). UBsans were not compiled by default due to slow compiling time. If uvc-2-omp-thread-asan.debug and uvc-3-asan.debug did not provide any useful stack-trace, then please compile UBsans and run UBsans instead of Asans to generate some useful error message.
All bug reports, feature requests, and ideas for improvement are welcome (although not all of them may be addressed in time)!
The environment variable ONE_STEP_UMI_STRUCT has special meaning to UVC. Please make sure that ONE_STEP_UMI_STRUCT is either not set (e.g., by using the unset shell command) or set to the empty string before running UVC. The python script extract-barcodes.py is obsolete and is replaced by debarcode. Compared with extract-barcodes.py, debarcode generates equivalent output but consumes only 40% of its runtime. The outputs of these two programs may not be the same in compressed space but are exactly the same in decompressed space. The script bin/uvcnorm.sh can be used for normalizing variants. By default, the normalization generates one SNV record per position and one InDel record per position. The script bin/uvcSurrogateAlign.sh is still under development and should be be used.
For more information, please check the wiki.
Xiaofei Zhao, Allison C Hu, Sizhen Wang, Xiaoyue Wang, Calling small variants using universality with Bayes-factor-adjusted odds ratios, Briefings in Bioinformatics, 2021;, bbab458, https://doi.org/10.1093/bib/bbab458
Calling small variants with universality and Bayesian-frequentist hybridism Xiaofei Zhao, Allison Hu, Sizhen Wang, Xiaoyue Wang bioRxiv 2020.08.23.263749; doi: https://doi.org/10.1101/2020.08.23.263749
[1]赵霄飞, 王思振. 一种基于高通量测序的基因变异检测方法:, CN111243664A[P]. 2020.
x43zhao AT uwaterloo DOT ca (first email to be reached at)
cndfeifei AT gmail DOT com (if the above email does not work)
xiaofei DOT zhao AT genetronhealth DOT com (if the above email does not work)