Open ifiddes-10x-zz opened 6 years ago
I think it's probably worth trying to retrieve the intermediate file. Hap.py has two command line options for doing that: --keep-scratch will not delete intermediate files, and --scratch-prefix <DIR> will write all temp files into <DIR>.
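For anyone scripting their runs, here is a minimal sketch of how the two scratch options could be added to a hap.py call from Python. The helper name build_happy_command and all file names are made up for illustration; only the flags themselves (-r, -o, --keep-scratch, --scratch-prefix) come from this thread.

```python
import shlex


def build_happy_command(truth_vcf, query_vcf, reference, out_prefix, scratch_dir):
    """Return a hap.py argument list that keeps intermediates in scratch_dir.

    Hypothetical helper: it only assembles the arguments, it does not run hap.py.
    """
    return [
        "hap.py", truth_vcf, query_vcf,
        "-r", reference,
        "-o", out_prefix,
        "--keep-scratch",                 # do not delete intermediate files
        "--scratch-prefix", scratch_dir,  # write all temp files into this directory
    ]


# Placeholder file names, not taken from any real run.
cmd = build_happy_command(
    "truth.vcf.gz", "query.vcf.gz", "ref.fa", "happy-out", "happy-scratch"
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to subprocess.check_call(cmd) without shell=True, which avoids quoting problems with unusual paths.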
This type of error may be caused by incomplete header information, a BCF record that isn't consistent with the header, or too many format fields (htslib is incrementally getting better at finding these). One way to check whether this is the cause is to remove the INFO and FORMAT fields (bcftools annotate -x INFO,^FORMAT/GT), and/or to remove homref records from the query, which are ignored anyway (see #37).
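To make the idea concrete, here is a rough pure-Python sketch of what that stripping does to a single VCF data line: INFO is blanked, only GT is kept per sample, and records where every sample is homref are dropped. This is an illustration for small test files, not a replacement for bcftools; the function names are made up, and it leaves the VCF header untouched.

```python
def is_homref(gt):
    """True if every allele in a GT string such as '0/0' or '0|0' is the reference."""
    alleles = gt.replace("|", "/").split("/")
    return all(allele == "0" for allele in alleles)


def simplify_record(line):
    """Blank INFO and keep only GT for one tab-separated VCF data line.

    Returns None when every sample is homref, so the caller can skip the record.
    Records without sample columns or without a GT key are returned unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return "\t".join(fields)  # sites-only record, nothing to strip
    keys = fields[8].split(":")
    if "GT" not in keys:
        return "\t".join(fields)
    gt_index = keys.index("GT")
    genotypes = [sample.split(":")[gt_index] for sample in fields[9:]]
    if all(is_homref(gt) for gt in genotypes):
        return None
    fields[7] = "."        # drop all INFO annotations
    fields[8] = "GT"       # keep only the genotype key
    fields[9:] = genotypes
    return "\t".join(fields)


het = "1\t100\t.\tA\tG\t50\tPASS\tDP=10\tGT:DP\t0/1:10"
ref = "1\t200\t.\tA\t.\t50\tPASS\tDP=10\tGT:DP\t0/0:10"
print(simplify_record(het))
print(simplify_record(ref))
```

If hap.py runs cleanly on the simplified file, the problem is likely in one of the removed fields or in the header definitions for them.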
Hi pkrusche
I am experiencing a similar problem, so hopefully this was solved or maybe you have ideas for what to do.
log:
[I] Total VCF records: 4049512
[I] Non-reference VCF records: 4049512
Contig chr1 is not known
2018-04-16 11:31:53,610 ERROR Command 'gvcf2bed /home/projects/dp_00005/data/nanbar/happy_results/truth.ppdsGNyX.vcf.gz -r human_g1k_v37_decoy.fasta -o /home/projects/dp_00005/data/nanbar/happy_results/tmpMGfLKa.bed -T /home/projects/dp_00005/data/references/NA12878/ConfidentRegions.bed.gz' returned non-zero exit status 1
2018-04-16 11:31:53,610 ERROR Traceback (most recent call last):
2018-04-16 11:31:53,610 ERROR File "/services/tools/hap.py/0.3.10/bin/hap.py", line 511, in <module>
2018-04-16 11:31:53,612 ERROR main()
2018-04-16 11:31:53,612 ERROR File "/services/tools/hap.py/0.3.10/bin/hap.py", line 304, in main
2018-04-16 11:31:53,612 ERROR conf_temp = Haplo.gvcf2bed.gvcf2bed(args.vcf1, args.ref, args.fp_bedfile, args.scratch_prefix)
2018-04-16 11:31:53,612 ERROR File "/services/tools/hap.py/0.3.10/lib/python27/Haplo/gvcf2bed.py", line 39, in gvcf2bed
2018-04-16 11:31:53,615 ERROR subprocess.check_call(cmdline, shell=True)
2018-04-16 11:31:53,615 ERROR File "/services/tools/anaconda2/4.0.0/lib/python2.7/subprocess.py", line 186, in check_call
2018-04-16 11:31:53,620 ERROR raise CalledProcessError(retcode, cmd)
2018-04-16 11:31:53,620 ERROR CalledProcessError: Command 'gvcf2bed /home/projects/dp_00005/data/nanbar/happy_results/truth.ppdsGNyX.vcf.gz -r human_g1k_v37_decoy.fasta -o /home/projects/dp_00005/data/nanbar/happy_results/tmpMGfLKa.bed -T /home/projects/dp_00005/data/references/NA12878/ConfidentRegions.bed.gz' returned non-zero exit status 1
I tried removing the info and format fields from my own VCF as you mentioned in your previous answer. My VCF files were generated using the hg37 decoy from Broad's resource bundle, so my VCF does not have the prefix 'chr' in front of the chromosome number. I tried adding this manually but it doesn't help. Should this be a problem? I tried keeping the intermediate files but they don't seem to give any useful information; I only have an empty .bed file and the VCF for the truth set with indices.
Thanks for your help!
Could it be that you're using a reference Fasta that uses numeric chromosome names (no chr) but some inputs (truth files) which do?
Hap.py will only add a chr prefix and not remove it, and the only files it can do this for are the truth and query VCF files, not the reference fasta. When the reference fasta file has numeric chromosome names, all the inputs must also have these.
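A quick way to spot this kind of mismatch before running hap.py is to compare the contig names declared in a VCF header with the names in the reference's .fai index. The sketch below works on lists of text lines so it stays self-contained; the function names and the example lines are made up, and it assumes the VCF header actually carries ##contig entries.

```python
import re

CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def fai_contigs(fai_lines):
    """Contig names from a FASTA index: the first tab-separated column of each line."""
    return {line.split("\t", 1)[0].strip() for line in fai_lines if line.strip()}


def vcf_header_contigs(vcf_lines):
    """Contig names declared in the ##contig lines of a VCF header."""
    names = set()
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # reached the #CHROM line or the first record
        match = CONTIG_ID.match(line)
        if match:
            names.add(match.group(1))
    return names


def unknown_contigs(vcf_lines, fai_lines):
    """Contigs the VCF header declares that the reference index does not contain."""
    return sorted(vcf_header_contigs(vcf_lines) - fai_contigs(fai_lines))


# Made-up example: numeric reference, truth VCF with one chr-prefixed contig.
fai = ["1\t249250621\t52\t60\t61\n", "2\t243199373\t253404903\t60\t61\n"]
vcf = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=chr1,length=249250621>\n",
    "##contig=<ID=2,length=243199373>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
print(unknown_contigs(vcf, fai))
```

Any name this reports is a candidate for a "Contig ... is not known" style failure like the one in the log above.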
When changing chromosome names in a VCF file, it is best to change both the names for each record (also note that MT and chrM may have slightly different names) and the contig entries in the VCF header. Htslib needs these to match and will fail otherwise.
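As an illustration of changing both places at once, here is a small sketch that rewrites the ##contig header entries and the CHROM column of every record using the same mapping. It operates on uncompressed text lines; the function name and the example mapping are hypothetical, and whether a given rename (for example chrM to MT) is appropriate depends on the reference you are matching.

```python
import re

HEADER_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def rename_contigs(vcf_lines, mapping):
    """Yield VCF lines with contig names replaced in both header and records.

    Names that are not in the mapping are left unchanged.
    """
    for line in vcf_lines:
        if line.startswith("##contig="):
            # Header entry: swap the ID, keep length and other attributes.
            yield HEADER_ID.sub(
                lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)), line
            )
        elif line.startswith("#"):
            yield line  # other header lines pass through untouched
        else:
            chrom, sep, rest = line.partition("\t")
            yield mapping.get(chrom, chrom) + sep + rest


# Made-up example lines and mapping.
lines = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=chr1,length=249250621>\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tG\n",
    "chrM\t5\t.\tC\tT\n",
]
renamed = list(rename_contigs(lines, {"chr1": "1", "chrM": "MT"}))
print("".join(renamed), end="")
```

After renaming, the file needs to be compressed and indexed again before hap.py can use it.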
All of my input files have numeric chromosome names. It still breaks. Do we have a solution for this?
Hi, I ran the following command and get an error:
/cadappl/hap.py/bin/hap.py BenchmarkData/HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-1phased.chr.vcf.gz HG002/AH8VC6ADXX/HG002.AH8VC6ADXX.gatk4.norm.filter.sort.vcf --pass-only -r /home/p2010-217-gpfs/Arti/VariantAnalysis/Datasets/hg19_bwa_0.7.17/hg19.fa -f BenchmarkData/HG002_GRCh10X-SOLID_CHROM1-22_v.3.3.2_highconf_noinconsistent.bed -o HG002-happy -V --keep-scratch --scratch-prefix tempHapy
2019-01-14 11:25:29,318 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
2019-01-14 11:25:29,332 WARNING No reference file found at default locations. You can set the environment variable 'HGREF' or 'HG19' to point to a suitable Fasta file.
[I] Total VCF records: 3608925
[I] Non-reference VCF records: 3608925
[W] overlapping records at chr3:27621138 for sample 0
[W] Variants that overlap on the reference allele: 12
[I] Total VCF records: 4999293
[I] Non-reference VCF records: 4999293
2019-01-14 12:44:39,222 ERROR [stderr] regex_error
2019-01-14 12:44:39,226 ERROR Command 'quantify /home/p2010-217-gpfs/Arti/VariantAnalysis/Datasets/GIAB/AzhkenazimTrio/WGS/HiSeq300X/tempHapy/hap.py.result.OwatQJ.vcf.gz -o HG002-happy.roc.tsv -sis/Datasets/hg19_bwa_0.7.17/hg19.fa --threads 128 --output-vtc 0 --output-rocs 1 --type xcmp --qq IQQ --qq-header QUAL --roc-delta 0.500000 --clean-info 1 --fix-chr-regions 0 -v HG002-happy.vcf.gz/VariantAnalysis/Datasets/GIAB/AzhkenazimTrio/WGS/HiSeq300X/tempHapy/tmp4CgkVd.bed' -R 'CONF:BenchmarkData/HG002_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-22_v.3.3.2_highconf_noirned non-zero exit status 1
2019-01-14 12:44:39,226 ERROR Traceback (most recent call last):
2019-01-14 12:44:39,227 ERROR File "/cadappl/hap.py/bin/hap.py", line 511, in
What is causing this error and how do I fix it?
log:
stdout:
Any idea what is going on? Since /tmp/hap.py.result.jryogM.vcf.gz is a temporary file I can't check on it after the run. Is there a way to retain intermediates?