bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Invalid character '%' in 'FREQ' FORMAT field #1914

Closed: JoKw closed this issue 7 years ago.

JoKw commented 7 years ago

Hi,

When I used the ensemble method, I encountered the following error:

CalledProcessError: Command 'set -o pipefail; bcftools reheader -h /home/test/sample/work/varscan/samplecall-annotated-header.txt /home/test/sample/work/varscan/samplecall.vcf.gz | bcftools view | bgzip -c > /home/test/sample/work/bcbiotx/tmpB22Zux/samplecall-annotated.vcf.gz
[W::vcf_parse] contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse_format] Invalid character '%' in 'FREQ' FORMAT field at 1:11181327
Write failed, wrote -1 instead of 14820 bytes.
' returned non-zero exit status 255

I had a look at the VarScan VCF file, and it seems the error is caused by the "%" character in the FREQ field.

Is there a fix for that? Thanks.

chapmanb commented 7 years ago

Sorry about the issue. It looks like there is something in the VarScan output file that bcftools does not like. Would you be able to share the output of:

tabix /home/test/sample/work/varscan/samplecall.vcf.gz 1:11181327-11181327

and we can determine how best to work around it? Thanks much for the help debugging.

JoKw commented 7 years ago

1 11166713 . T C 0 PASS ADP=168;HET=1;HOM=0;NC=0;WT=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:154:168:168:122:46:27.38%:3.9712E-16:55:54:88:34:34:12
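To see what bcftools is objecting to, the sample column can be paired with the FORMAT keys. This is a minimal sketch: it restores the tab separators that the thread shows as spaces, and it uses Python's float() only as a rough stand-in for htslib's numeric parsing (an assumption, not the same parser). The helper name is_numeric is hypothetical.

```python
# The record from the comment above, with tab separators restored.
record = "\t".join([
    "1", "11166713", ".", "T", "C", "0", "PASS",
    "ADP=168;HET=1;HOM=0;NC=0;WT=0",
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR",
    "0/1:154:168:168:122:46:27.38%:3.9712E-16:55:54:88:34:34:12",
])

cols = record.split("\t")
# Map each FORMAT key (column 9) to its value in the first sample (column 10).
sample = dict(zip(cols[8].split(":"), cols[9].split(":")))


def is_numeric(value: str) -> bool:
    """Hypothetical check: does the value parse as a floating-point number?"""
    try:
        float(value)
        return True
    except ValueError:
        return False


freq = sample["FREQ"]
print(freq, is_numeric(freq))                    # the trailing '%' makes it non-numeric
print(sample["PVAL"], is_numeric(sample["PVAL"]))
```

The PVAL value in scientific notation parses fine; only FREQ carries the percent sign that the error message points at.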

(Sent by email in reply to Brad Chapman's comment of Mon, 24 Apr 2017 at 22:04.)

chapmanb commented 7 years ago

Thanks so much for following up with the problem samples. This helped me identify the underlying issue: bcbio already has code to fix the FREQ field, but it was only applied to tumor/normal samples, not tumor-only ones. I pushed a fix to the latest development version (bcbio_nextgen.py upgrade -u development), which should resolve the problem. You can re-run your current project, but first remove the varscan and checkpoints_parallel directories so bcbio re-runs the VarScan variant calls with the post-calling fix. Hope this gets everything working cleanly for you.
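A minimal sketch of the kind of post-calling FREQ fix described above. It is not bcbio's actual code, and the function names are hypothetical. It assumes FREQ values are percentages with a trailing "%" and that VarScan declares FREQ with Type=String in its header. It strips the "%", converts the value to a fraction (27.38% becomes 0.2738), and declares the field as Float so htslib can parse it. Converting to a fraction rather than keeping the bare percentage is a design choice of this sketch.

```python
from typing import Iterable, TextIO


def fix_freq_header(line: str) -> str:
    """Declare FREQ as Float (assumes VarScan emits Type=String)."""
    if line.startswith("##FORMAT=<ID=FREQ"):
        return line.replace("Type=String", "Type=Float")
    return line


def fix_freq_record(line: str) -> str:
    """Rewrite percentage FREQ values (e.g. '27.38%') as fractions in every sample."""
    if line.startswith("#"):
        return fix_freq_header(line)
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:  # no FORMAT/sample columns to fix
        return line
    keys = cols[8].split(":")
    if "FREQ" not in keys:
        return line
    idx = keys.index("FREQ")
    for i in range(9, len(cols)):  # one column per sample
        vals = cols[i].split(":")
        if idx < len(vals) and vals[idx].endswith("%"):
            pct = float(vals[idx].rstrip("%"))
            vals[idx] = f"{pct / 100:.4g}"  # 4 significant digits
            cols[i] = ":".join(vals)
    return "\t".join(cols) + newline


def fix_stream(src: Iterable[str], dst: TextIO) -> None:
    """Apply the fix to every header and record line of an uncompressed VCF."""
    for line in src:
        dst.write(fix_freq_record(line))
```

One way to apply it, assuming a bgzipped input: decompress with bgzip -dc, pass the lines through fix_stream, and recompress with bgzip -c before running bcftools on the result.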