Illumina / hap.py

Haplotype VCF comparison tools

scmp-distance engine #61

Open virenar opened 5 years ago

virenar commented 5 years ago

I am trying local, distance-based match counting with the scmp-distance engine, but I get the following error.

USAGE

/opt/hap.py/bin/hap.py tmp/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf.vcf.gz tmp/S1.genome.vcf.gz -R tmp/target_regions_merge.bed -o tmp/hap_test5 -r tmp/hg19.fa --engine scmp-distance 

ERROR

[I] Total VCF records:         3775119
[I] Non-reference VCF records: 3775119
[I] Total VCF records:         13317
[I] Non-reference VCF records: 361
2018-10-01 17:45:41,260 ERROR    Exception when running scmp: Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output:  / Warning: trying to combine "GQ" tag definitions of different types
Warning: trying to combine "AD" tag definitions of different lengths
Incorrect number of AD fields (2) at chr1:21889635, cannot merge.

2018-10-01 17:45:41,261 ERROR    ------------------------------------------------------------
2018-10-01 17:45:41,264 ERROR    Traceback (most recent call last):
2018-10-01 17:45:41,264 ERROR      File "/opt/hap.py/lib/python27/Haplo/scmp.py", line 46, in runSCmp
2018-10-01 17:45:41,266 ERROR        runBcftools(*vargs)
2018-10-01 17:45:41,266 ERROR      File "/opt/hap.py/lib/python27/Tools/bcftools.py", line 50, in runBcftools
2018-10-01 17:45:41,267 ERROR        ". Return code was %i, output: %s / %s \n" % (rc, o, e))
2018-10-01 17:45:41,267 ERROR    Exception: Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output:  / Warning: trying to combine "GQ" tag definitions of different typesWarning: trying to combine "AD" tag definitions of different lengthsIncorrect number of AD fields (2) at chr1:21889635, cannot merge. 
2018-10-01 17:45:41,267 ERROR    ------------------------------------------------------------
2018-10-01 17:45:41,273 ERROR    Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output:  / Warning: trying to combine "GQ" tag definitions of different types
Warning: trying to combine "AD" tag definitions of different lengths
Incorrect number of AD fields (2) at chr1:21889635, cannot merge.

2018-10-01 17:45:41,273 ERROR    Traceback (most recent call last):
2018-10-01 17:45:41,273 ERROR      File "/opt/hap.py/bin/hap.py", line 511, in <module>
2018-10-01 17:45:41,274 ERROR        main()
2018-10-01 17:45:41,274 ERROR      File "/opt/hap.py/bin/hap.py", line 459, in main
2018-10-01 17:45:41,274 ERROR        tempfiles += Haplo.scmp.runSCmp(args.vcf1, args.vcf2, output_name, args)
2018-10-01 17:45:41,274 ERROR      File "/opt/hap.py/lib/python27/Haplo/scmp.py", line 46, in runSCmp
2018-10-01 17:45:41,274 ERROR        runBcftools(*vargs)
2018-10-01 17:45:41,274 ERROR      File "/opt/hap.py/lib/python27/Tools/bcftools.py", line 50, in runBcftools
2018-10-01 17:45:41,274 ERROR        ". Return code was %i, output: %s / %s \n" % (rc, o, e))
2018-10-01 17:45:41,275 ERROR    Exception: Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output:  / Warning: trying to combine "GQ" tag definitions of different typesWarning: trying to combine "AD" tag definitions of different lengthsIncorrect number of AD fields (2) at chr1:21889635, cannot merge. 

I even tried removing the AD field from the VCF, but the problem persists.

I also tried merging the VCFs myself. bcftools printed warnings but was able to merge the files:

root@4b74e02afe41:/# /opt/hap.py/bin/bcftools merge tmp/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf.vcf.gz tmp/S1.genome.vcf.gz -O v -o m.vcf
Warning: trying to combine "AD" tag definitions of different lengths
Warning: trying to combine "GQ" tag definitions of different types
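
One way to narrow down the "Incorrect number of AD fields" message is to list the records whose AD entry does not have one value per allele. The following is a minimal Python 3 sketch, not part of hap.py or bcftools; the function names are hypothetical, and it assumes AD is meant to carry one value per allele (REF plus each ALT), which may not match how either VCF header declares AD.

```python
import gzip
import sys


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def find_ad_mismatches(lines):
    """Yield (chrom, pos, sample_index, n_ad, n_alleles) for every sample
    whose AD value count differs from the number of alleles at that record.

    Assumption: AD should hold one value per allele (REF + each ALT).
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # no FORMAT/sample columns to inspect
        alts = [] if cols[4] == "." else cols[4].split(",")
        n_alleles = 1 + len(alts)
        keys = cols[8].split(":")
        if "AD" not in keys:
            continue
        ad_idx = keys.index("AD")
        for sample_idx, sample in enumerate(cols[9:]):
            fields = sample.split(":")
            if ad_idx >= len(fields) or fields[ad_idx] == ".":
                continue  # AD missing for this sample
            n_ad = len(fields[ad_idx].split(","))
            if n_ad != n_alleles:
                yield cols[0], int(cols[1]), sample_idx, n_ad, n_alleles


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as handle:
        for hit in find_ad_mismatches(handle):
            print("%s:%d sample=%d AD values=%d alleles=%d" % hit)
```

Running it on each input VCF in turn would show whether the position reported by bcftools (chr1:21889635) is an isolated record or one of many.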

Yiming-Shen commented 5 years ago

I have a similar issue using the default xcmp engine.

I use Docker to compare my two VCFs:

docker run -it -v `pwd`:/data/ pkrusche/hap.py /opt/hap.py/bin/hap.py /data/true.vcf.gz /data/query.vcf.gz -r /data/reference/hg19.fa -o /data/test

and I got many messages similar to this one:

2019-06-27 12:19:42,498 ERROR Exception when running <function preprocessWrapper at 0x7fdcecd55140>:
2019-06-27 12:19:42,498 ERROR ------------------------------------------------------------
2019-06-27 12:19:42,499 ERROR Traceback (most recent call last):
2019-06-27 12:19:42,500 ERROR File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2019-06-27 12:19:42,501 ERROR return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2019-06-27 12:19:42,502 ERROR File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 92, in preprocessWrapper
2019-06-27 12:19:42,503 ERROR runBcftools("index", tf.name)
2019-06-27 12:19:42,504 ERROR File "/opt/hap.py/lib/python27/Tools/bcftools.py", line 50, in runBcftools
2019-06-27 12:19:42,505 ERROR ". Return code was %i, output: %s / %s \n" % (rc, o, e))
2019-06-27 12:19:42,506 ERROR Exception: Error running BCFTOOLS; please check if your file has issues using vcfcheck. Return code was 255, output: / [E::get_intv] failed to parse TBX_VCF, was wrong -p [type] used?The offending line was: " N 433.73 . . GT:AD:ADO:DP:GQ:PL 1/0:11,12:0:23:99:672"[E::hts_idx_push] unsorted positions on sequence #1: 52110700 followed by 1index: failed to create index for "/tmp/input.chr14:52096904-63255061hhlXlN.prep.vcf.gz"

Then I checked both files using vcfcheck and no issues came up:

[W] overlapping records at chr1:14466469 for sample 0
[W] Variants that overlap on the reference allele: 258
[I] Total VCF records: 1147227
[I] Non-reference VCF records: 1147227
[I] X chromosome appears to not be haploid -- assuming this is a female sample

I also tried sorting both VCFs, but I still got the same error messages. Any suggestions would be appreciated. Thanks in advance.
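
The index error above mentions both an unparseable line and unsorted positions. As a rough independent check, the following Python 3 sketch scans VCF text lines for records with an empty CHROM, a non-numeric POS, or out-of-order positions. The function name is hypothetical and this is not how hap.py or bcftools validates files. Note also that the file that failed to index is a temporary file written during hap.py preprocessing, so the input VCFs may pass this check even when that file is broken.

```python
def find_order_problems(lines):
    """Yield (line_number, reason) for malformed or out-of-order VCF records.

    Checks only what a position index needs: a non-empty CHROM, an integer
    POS, non-decreasing POS within a contig, and contiguous contig blocks.
    """
    last_chrom, last_pos = None, 0
    seen_chroms = set()
    for line_no, line in enumerate(lines, 1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or not cols[0] or not cols[1].isdigit():
            yield line_no, "malformed record"
            continue
        chrom, pos = cols[0], int(cols[1])
        if chrom != last_chrom:
            if chrom in seen_chroms:
                # the contig appeared earlier, then other contigs intervened
                yield line_no, "contig not contiguous"
            seen_chroms.add(chrom)
        elif pos < last_pos:
            yield line_no, "unsorted position"
        last_chrom, last_pos = chrom, pos
```

To use it on a compressed file, the lines can be read with `gzip.open(path, "rt")` and passed directly to the function.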