Illumina / hap.py

Haplotype VCF comparison tools

quantify returned non-zero exit status -11 #40

Open ifiddes-10x-zz opened 6 years ago

ifiddes-10x-zz commented 6 years ago

log:

2018-02-14 09:30:18,372 ERROR    Command 'quantify /tmp/hap.py.result.jryogM.vcf.gz -o /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/HG2.happy.roc.tsv -r /mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa --threads 8 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -v /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/HG2.happy.vcf.gz --roc-regions '*'' returned non-zero exit status -11
2018-02-14 09:30:18,372 ERROR    Traceback (most recent call last):
2018-02-14 09:30:18,372 ERROR      File "build/bin/hap.py", line 511, in <module>
2018-02-14 09:30:18,373 ERROR        main()
2018-02-14 09:30:18,373 ERROR      File "build/bin/hap.py", line 496, in main
2018-02-14 09:30:18,373 ERROR        qfy.quantify(args)
2018-02-14 09:30:18,373 ERROR      File "/mnt/home/ian/hap.py/build/bin/qfy.py", line 129, in quantify
2018-02-14 09:30:18,374 ERROR        strat_fixchr=args.strat_fixchr)
2018-02-14 09:30:18,374 ERROR      File "/mnt/home/ian/hap.py/build/lib/python27/Haplo/quantify.py", line 178, in run_quantify
2018-02-14 09:30:18,375 ERROR        subprocess.check_call(run_str, shell=True, stdout=tfo, stderr=tfe)
2018-02-14 09:30:18,375 ERROR      File "/mnt/home/ian/anaconda2/lib/python2.7/subprocess.py", line 186, in check_call
2018-02-14 09:30:18,377 ERROR        raise CalledProcessError(retcode, cmd)
2018-02-14 09:30:18,377 ERROR    CalledProcessError: Command 'quantify /tmp/hap.py.result.jryogM.vcf.gz -o /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/HG2.happy.roc.tsv -r /mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa --threads 8 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -v /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/HG2.happy.vcf.gz --roc-regions '*'' returned non-zero exit status -11

stdout:

bespin1 Wed Feb 14 09:27 hap.py $ build/bin/hap.py \
> /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/svanalyzer_union_171212_v0.5.0_annotated.with_chr.vcf \
> /mnt/yard2/ian/structural_variants/longranger2.0-vcfs/HG2_combined.sorted.vcf.gz \
> -r /mnt/opt/refdata_new/hg19-2.0.0/fasta/genome.fa \
> -o /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/HG2.happy \
> --roc QUAL -V \
> --logfile /mnt/yard2/ian/structural_variants/CHM1_CHM13_analysis/happy.log \
> --threads 8
[W] overlapping records at chr1:1285401 for sample 1
[W] variant at chr1:79708114 has more than one base of reference padding
[W] Variants that have >1 base of reference padding: 142
[W] Variants that overlap on the reference allele: 2765
[I] Total VCF records:         74216
[I] Non-reference VCF records: 66671
[W] Symbolic / SV ALT alleles at chr1:649703
[W] overlapping records at chr1:829171 for sample 0
[W] variant at chr1:20164759 has more than one base of reference padding
[W] Variants that have >1 base of reference padding: 253
[W] Variants that overlap on the reference allele: 291
[W] Variants that have symbolic ALT alleles: 6177
[I] Total VCF records:         32009
[I] Non-reference VCF records: 31880

Any idea what is going on? Since /tmp/hap.py.result.jryogM.vcf.gz is a temporary file I can't check on it after the run. Is there a way to retain intermediates?

pkrusche commented 6 years ago

I think it's probably worth trying to retrieve the intermediate file. Hap.py has two command line options for doing that: --keep-scratch will not delete intermediate files, and --scratch-prefix <DIR> will write all temp files into <DIR>.
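
As a minimal sketch of a rerun that keeps the intermediates, the two options can be appended to an existing argument list. The helper name with_scratch_options, the input file names and the scratch directory are hypothetical; only --keep-scratch and --scratch-prefix come from the comment above.

```python
def with_scratch_options(hap_py_args, scratch_dir):
    """Return a copy of a hap.py argument list that keeps intermediate files.

    hap_py_args is the original command as a list of strings; scratch_dir is
    a hypothetical directory chosen by the user for the temp files.
    """
    return list(hap_py_args) + ["--keep-scratch", "--scratch-prefix", scratch_dir]


# Hypothetical inputs, for illustration only.
original = ["hap.py", "truth.vcf.gz", "query.vcf.gz", "-r", "genome.fa", "-o", "out"]
print(" ".join(with_scratch_options(original, "happy_scratch")))
```

The resulting list can be passed to subprocess.check_call as-is; the intermediate VCF that quantify failed on should then remain in the scratch directory for inspection.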

This type of error may be caused by incomplete header information, a bcf record that isn't consistent with the header, or too many format fields (htslib is gradually getting better at detecting these). One way to check whether this is the cause is to remove the info and format fields (bcftools annotate -x INFO,^FORMAT/GT), and/or to remove homref records from the query, which are ignored anyway (see #37).
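
For context on the exit status in the log: Python's subprocess module reports a negative return code when the child process was killed by a signal, so -11 means signal 11, which is SIGSEGV (a segmentation fault) on Linux. The sketch below decodes such codes; the function name describe_returncode is made up for illustration, and it assumes Python 3.5 or later on a POSIX system rather than the Python 2.7 shown in the traceback.

```python
import signal


def describe_returncode(returncode):
    """Describe a return code as reported by Python's subprocess module."""
    if returncode == 0:
        return "success"
    if returncode < 0:
        # On POSIX, a negative value -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {} ({})".format(-returncode, name)
    # Positive values are ordinary error exit codes chosen by the program.
    return "exited with error status {}".format(returncode)


print(describe_returncode(-11))
print(describe_returncode(1))
```

A status of -11 therefore points at a crash inside quantify itself, while the status 1 in the later reports is a regular error exit, usually with a message on stderr.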

nannabarnkob commented 6 years ago

Hi pkrusche

I am experiencing a similar problem, so hopefully this was solved or maybe you have ideas for what to do.

log:

[I] Total VCF records:         4049512
[I] Non-reference VCF records: 4049512
Contig chr1 is not known
2018-04-16 11:31:53,610 ERROR    Command 'gvcf2bed /home/projects/dp_00005/data/nanbar/happy_results/truth.ppdsGNyX.vcf.gz -r human_g1k_v37_decoy.fasta -o /home/projects/dp_00005/data/nanbar/happy_results/tmpMGfLKa.bed -T /home/projects/dp_00005/data/references/NA12878/ConfidentRegions.bed.gz' returned non-zero exit status 1
2018-04-16 11:31:53,610 ERROR    Traceback (most recent call last):
2018-04-16 11:31:53,610 ERROR      File "/services/tools/hap.py/0.3.10/bin/hap.py", line 511, in <module>
2018-04-16 11:31:53,612 ERROR        main()
2018-04-16 11:31:53,612 ERROR      File "/services/tools/hap.py/0.3.10/bin/hap.py", line 304, in main
2018-04-16 11:31:53,612 ERROR        conf_temp = Haplo.gvcf2bed.gvcf2bed(args.vcf1, args.ref, args.fp_bedfile, args.scratch_prefix)
2018-04-16 11:31:53,612 ERROR      File "/services/tools/hap.py/0.3.10/lib/python27/Haplo/gvcf2bed.py", line 39, in gvcf2bed
2018-04-16 11:31:53,615 ERROR        subprocess.check_call(cmdline, shell=True)
2018-04-16 11:31:53,615 ERROR      File "/services/tools/anaconda2/4.0.0/lib/python2.7/subprocess.py", line 186, in check_call
2018-04-16 11:31:53,620 ERROR        raise CalledProcessError(retcode, cmd)
2018-04-16 11:31:53,620 ERROR    CalledProcessError: Command 'gvcf2bed /home/projects/dp_00005/data/nanbar/happy_results/truth.ppdsGNyX.vcf.gz -r human_g1k_v37_decoy.fasta -o /home/projects/dp_00005/data/nanbar/happy_results/tmpMGfLKa.bed -T /home/projects/dp_00005/data/references/NA12878/ConfidentRegions.bed.gz' returned non-zero exit status 1

I tried removing info and format fields from my own VCF as you mentioned in your previous answer. My VCF files were generated using the hg37 decoy from Broad's resource bundle, so my VCF does not have the 'chr' prefix in front of the chromosome number. I tried adding this manually but it doesn't help. Should this be a problem? I tried keeping the intermediate files but they don't seem to give any useful information; I only have an empty .bed file and the VCF for the truth set with indices.

Thanks for your help!

pkrusche commented 6 years ago

Could it be that you're using a reference Fasta that uses numeric chromosome names (no chr) but some inputs (truth files) which do?

Hap.py will only add a chr prefix and not remove it, and the only files it can do this for are the truth and query VCF files, not the reference fasta. When the reference fasta file has numeric chromosome names then all the inputs must also have these.

When changing chromosome names in a VCF file, it is best to change both the names for each record (also note that MT and chrM may have slightly different names) and also the contig entries in the VCF header. Htslib needs these to match and will fail otherwise.
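
One way to spot such a mismatch before running hap.py is to compare the contig names declared in the VCF header with the sequence names in the reference index. The following is a rough sketch, not part of hap.py: the function names are made up, the header and .fai lines are invented sample data, and it only checks header contigs, not the CHROM column of each record.

```python
import re


def vcf_header_contigs(header_lines):
    """Return contig IDs declared in ##contig=<ID=...> VCF header lines."""
    contigs = []
    for line in header_lines:
        match = re.match(r"##contig=<ID=([^,>]+)", line)
        if match:
            contigs.append(match.group(1))
    return contigs


def fai_contigs(fai_lines):
    """Return sequence names from a .fai index (first tab-separated column)."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def contigs_missing_from_reference(vcf_contigs, ref_contigs):
    """Return VCF contig names that do not occur in the reference."""
    ref = set(ref_contigs)
    return [name for name in vcf_contigs if name not in ref]


# Invented sample data: a 'chr'-prefixed VCF header against a numeric reference.
header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=249250621>",
    "##contig=<ID=chrM,length=16571>",
]
fai = ["1\t249250621\t52\t60\t61", "MT\t16569\t253404903\t60\t61"]

print(contigs_missing_from_reference(vcf_header_contigs(header), fai_contigs(fai)))
```

Any name printed here is declared in the VCF but absent from the reference, which is the situation behind a message like "Contig chr1 is not known". In practice the header lines would come from the VCF (for example via bcftools view -h) and the index lines from the .fai file next to the reference fasta.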

kkapuria3 commented 6 years ago

All of my input files have numeric chromosome names. It still breaks. Do we have a solution for this?

artitandon commented 5 years ago

Hi, I ran the following command and got an error:

/cadappl/hap.py/bin/hap.py BenchmarkData/HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-1phased.chr.vcf.gz HG002/AH8VC6ADXX/HG002.AH8VC6ADXX.gatk4.norm.filter.sort.vcf --pass-only -r /home/p2010-217-gpfs/Arti/VariantAnalysis/Datasets/hg19_bwa_0.7.17/hg19.fa -f BenchmarkData/HG002_GRCh10X-SOLID_CHROM1-22_v.3.3.2_highconf_noinconsistent.bed -o HG002-happy -V --keep-scratch --scratch-prefix tempHapy

2019-01-14 11:25:29,318 WARNING  No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2019-01-14 11:25:29,332 WARNING  No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
[I] Total VCF records:         3608925
[I] Non-reference VCF records: 3608925
[W] overlapping records at chr3:27621138 for sample 0
[W] Variants that overlap on the reference allele: 12
[I] Total VCF records:         4999293
[I] Non-reference VCF records: 4999293
2019-01-14 12:44:39,222 ERROR    [stderr] regex_error
2019-01-14 12:44:39,226 ERROR    Command 'quantify /home/p2010-217-gpfs/Arti/VariantAnalysis/Datasets/GIAB/AzhkenazimTrio/WGS/HiSeq300X/tempHapy/hap.py.result.OwatQJ.vcf.gz -o HG002-happy.roc.tsv -sis/Datasets/hg19_bwa_0.7.17/hg19.fa --threads 128 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -v HG002-happy.vcf.gz/VariantAnalysis/Datasets/GIAB/AzhkenazimTrio/WGS/HiSeq300X/tempHapy/tmp4CgkVd.bed' -R 'CONF:BenchmarkData/HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-22_v.3.3.2_highconf_noirned non-zero exit status 1
2019-01-14 12:44:39,226 ERROR    Traceback (most recent call last):
2019-01-14 12:44:39,227 ERROR      File "/cadappl/hap.py/bin/hap.py", line 511, in <module>
2019-01-14 12:44:39,229 ERROR        main()
2019-01-14 12:44:39,229 ERROR      File "/cadappl/hap.py/bin/hap.py", line 496, in main
2019-01-14 12:44:39,229 ERROR        qfy.quantify(args)
2019-01-14 12:44:39,230 ERROR      File "/home/cadappl/cadappl_linux_i386/hap.py/bin/qfy.py", line 129, in quantify
2019-01-14 12:44:39,231 ERROR        strat_fixchr=args.strat_fixchr)
2019-01-14 12:44:39,232 ERROR      File "/home/cadappl/cadappl_linux_i386/hap.py/lib/python27/Haplo/quantify.py", line 177, in run_quantify
2019-01-14 12:44:39,234 ERROR        subprocess.check_call(run_str, shell=True, stdout=tfo, stderr=tfe)
2019-01-14 12:44:39,234 ERROR      File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
2019-01-14 12:44:39,235 ERROR        raise CalledProcessError(retcode, cmd)
2019-01-14 12:44:39,235 ERROR    CalledProcessError: Command 'quantify /home/p2010-217-gpfs/Arti/VariantAnalysis/Datasets/GIAB/AzhkenazimTrio/WGS/HiSeq300X/tempHapy/hap.py.result.OwatQJ.vcf.gz -o Hfs/Arti/VariantAnalysis/Datasets/hg19_bwa_0.7.17/hg19.fa --threads 128 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -/p2010-217-gpfs/Arti/VariantAnalysis/Datasets/GIAB/AzhkenazimTrio/WGS/HiSeq300X/tempHapy/tmp4CgkVd.bed' -R 'CONF:BenchmarkData/HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-22_oc-regions '*'' returned non-zero exit status 1

What is causing this error and how do I fix it?