Closed inti4digbi closed 6 months ago
Thank you for reporting the bug. Could you please share more about your sample file?
A file is treated as a gVCF if its INFO field has a tag formatted as END=123456, which is a signature of a gVCF file. See examples here.
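For illustration, a check of this kind could be sketched as follows. This is only a minimal sketch, not the tool's actual implementation: the helper name `has_end_tag` and the regular expression are hypothetical, and the code assumes tab-separated records with INFO in the 8th column.

```python
import re

# Hypothetical pattern: an END=<integer> tag delimited by ';' or the field boundary,
# so that tags such as BEND=5 or CIEND=0,10 are not mistaken for END.
END_TAG = re.compile(r"(?:^|;)END=\d+(?:;|$)")


def has_end_tag(vcf_line: str) -> bool:
    """Return True if the INFO column of a VCF record carries an END=<int> tag."""
    if vcf_line.startswith("#"):
        return False  # header lines are not records
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # malformed or truncated record
    return bool(END_TAG.search(fields[7]))
```

Any record for which this returns True would trigger a gVCF warning under this assumed logic, whether or not the file really is a gVCF.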
Hi, yes, I can provide the first part of the file, but I am not sure it is needed in this case.
I have checked and, as you said, the variants with an END=123456 tag in the INFO field trigger the error. As far as I know, this field is part of the standard VCF format description, as per https://samtools.github.io/hts-specs/VCFv4.2.pdf section 1.4.1 number 8.
Is there a preferred practice to deal with this?
The VCF v4.3 specs provide a detailed description of the INFO/END field:
END: End reference position (1-based), indicating the variant spans positions POS–END on reference/contig CHROM. Normally... no END INFO field is needed. However when symbolic alleles are used, e.g. in gVCF or structural variants, an explicit END INFO field provides variant span information that is otherwise unknown.
Do you have a non-variant block or a structural variant in your VCF? Could you please share with us the line of the variant with the INFO/END field (without the genotypes)? It would help us understand the issue.
The check on the INFO/END field is a safety check for gVCF. If you are confident that the file is a VCF and has undergone sufficient QC procedures, you can strip the INFO field from your VCF.
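As a narrower alternative to dropping the whole INFO column, one could remove only the END tag. The sketch below is an assumption-laden example, not something the tool provides: `strip_end_tag` is a hypothetical helper, it expects tab-separated records, and it leaves header lines (including any `##INFO=<ID=END,...>` definition) untouched.

```python
def strip_end_tag(vcf_line: str) -> str:
    """Return the record with any END tag removed from the INFO column."""
    if vcf_line.startswith("#"):
        return vcf_line  # headers are passed through unchanged
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return vcf_line  # not a full record; leave it alone
    kept = [
        tag for tag in fields[7].split(";")
        if tag != "END" and not tag.startswith("END=")
    ]
    # An empty INFO column is written as "." in VCF.
    fields[7] = ";".join(kept) if kept else "."
    return "\t".join(fields) + newline
```

If bcftools is available, `bcftools annotate -x INFO/END` should achieve the same thing and also update the header.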
Hi, here is an example
1 944010 rs764300897 GGA G . . END=944012
The END field is mostly used to track the end position on the original annotation file. We can ignore it for this analysis.
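For what it is worth, the END value in the example record is consistent with the spec's definition for precise variants (END = POS + length of REF - 1), so it carries no information beyond POS and REF. A small sketch of that check, using a hypothetical helper `end_matches_ref_span` and splitting on whitespace because the record above was pasted with spaces:

```python
def end_matches_ref_span(record: str) -> bool:
    """Check whether INFO/END equals POS + len(REF) - 1 for one VCF record."""
    _chrom, pos, _id, ref, _alt, _qual, _filter, info = record.split()[:8]
    # Collect key=value INFO tags; flag-style tags without '=' are skipped.
    tags = dict(tag.split("=", 1) for tag in info.split(";") if "=" in tag)
    if "END" not in tags:
        raise ValueError("record has no INFO/END tag")
    return int(tags["END"]) == int(pos) + len(ref) - 1


# Record from this thread: 944010 + len("GGA") - 1 == 944012
print(end_matches_ref_span("1 944010 rs764300897 GGA G . . END=944012"))  # True
```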
Bug
When running the preprocessor or the pipeline, I get a warning that the input file is a gVCF. The input file is a standard VCF file containing array/chip genotypes + imputed markers.
What is the expected behavior? I guess it would be to process the file as a standard VCF file.
What is the motivation / use case for changing the behavior?
Please tell us about your environment:
Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, e.g. stackoverflow, gitter, etc.)