Illumina / hap.py

Haplotype VCF comparison tools
Other
406 stars 125 forks source link

Usage with different VCF callers #147

Open stanleyrc opened 3 years ago

stanleyrc commented 3 years ago

I have been trying to run hap.py in order to benchmark using HG002. I have an ensemble of 8 different callers (breakdancer, cnvkit, cnvnator, delly, gridss, lumpy, manta, and wham) but I have not been very successful with any of them. I have been successful in running one of the manta output vcffiles but it only comes back with 5 true positives and 179 fp which seems unlikely. Also, with lumpy I only get 28 TP and and 552 FP. With Gridss, I get a blocksplit it does not run.

Unfortunately, Gridss does not output genotype. Is there a way I can benchmark the gridss vcf using hap.py. I have included the error below with the first header line and vcf record in the vcf file. Has anyone used gridss output successfully with hap.py and what did they do to make it work?

hap.py /data/clarkesr/benchmarking/AshkenazimTrio/HG002_NA24385_son/latest/GRCh38/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz PASSONLY.vcf.gz -f /data/clarkesr/benchmarking/AshkenazimTrio/HG002_NA24385_son/latest/GRCh38/HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed -o HG002_GRIDSS_happy -r /data/clarkesr/ref/genome.fa Hap.py [W] overlapping records at chr6:29747433 for sample 0 [W] Variants that overlap on the reference allele: 5 [I] Total VCF records: 4048342 [I] Non-reference VCF records: 4048342 [I] Total VCF records: 127070 [I] Non-reference VCF records: 0 2021-07-20 11:57:06,692 WARNING Blocksplit returned no blocks. This can happen when an input contains no valid variants. 2021-07-20 11:57:22,016 WARNING No calls for location chr1 in query! ...

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG002

chr1 54775 gridss0fb_45o C CTTCTTTC[chr1:54783[ 1578.56 PASS AS=1;ASC=1X;ASQ=621.85;ASRP=1;ASSR=41;BA=0;BANRP=0;BANRPQ=0;BANSR=0;BANSRQ=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm0-167330,asm0-200;BEIDH=0,0;BEIDL=0,0;BMQ=57.91;BMQN=37;BMQX=60;BQ=222.14;BSC=11;BSCQ=222.14;BUM=0;BUMQ=0;BVF=5;CAS=0;CASQ=0;CQ=925.7;EVENT=gridss0fb_45;IC=18;IHOMPOS=0,0;IQ=506.71;MATEID=gridss0fb_45h;MQ=48.45;MQN=25;MQX=60;RAS=1;RASQ=450;REF=96;REFPAIR=38;RP=0;RPQ=0;SB=0.46052632;SC=131M1X;SR=0;SRQ=0;SVLEN=0;SVTYPE=BND;VF=26;SIMPLE_TYPE=INS GT:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF .:621.85:1:41:0:0:0:0:0:0:0:222.14:11:222.14:0:0:5:0:18:506.71:1578.56:450:96:38:0:0:0:0:26

Thanks for your help!

nate-d-olson commented 3 years ago

hap.py wasn't developed for benchmarking SVs, you might want to try Truvari (https://github.com/spiralgenetics/truvari/wiki) or Svanalyzer (https://svanalyzer.readthedocs.io/en/latest/) as they were developed specifically for benchmarking SVs.