PharmGKB / PharmCAT

The Pharmacogenomic Clinical Annotation Tool
Mozilla Public License 2.0

This error happens when the `FORMAT` column and the sample column in the VCF do not have the same number of elements. #67

Closed: BinglanLi closed this issue 1 year ago

BinglanLi commented 3 years ago

Can the PharmCAT preprocessor or bcftools fix the following formatting issue?

This error happens when the FORMAT column and the sample column in the VCF do not have the same number of elements.

For example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr1    97078987    rs114096998 G   T   .   PASS    .   GT  0/0
chr1    97078987    rs114096998 G   T   .   FAIL    .   GT:GQ   0/0
chr1    97078987    rs114096998 G   T   .   FAIL    .   .   0/0

The first line is correct (one entry in FORMAT, one entry in the sample column). The second line fails because FORMAT has two entries but the sample column has only one. The third line fails because FORMAT has zero entries (`.`) but the sample column has one.
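The rule described above (the number of colon-separated entries in FORMAT must match each sample column) can be checked before a VCF is handed to PharmCAT. Below is a minimal Python sketch, assuming an uncompressed, tab-separated VCF; the function names `format_sample_mismatches` and `find_mismatched_records` are hypothetical and not part of PharmCAT or bcftools. Following the third example, a FORMAT value of `.` is counted as zero entries.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def format_sample_mismatches(vcf_line: str) -> list[int]:
    """Return 0-based indices of sample columns whose number of
    colon-separated entries differs from the FORMAT column."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return []  # no FORMAT/sample columns to compare
    fmt = fields[8]
    # A lone "." in FORMAT is counted as 0 entries, as in the third example.
    n_format = 0 if fmt == "." else len(fmt.split(":"))
    return [
        i
        for i, sample in enumerate(fields[9:])
        if len(sample.split(":")) != n_format
    ]


def find_mismatched_records(lines: Iterable[str]) -> Iterator[tuple[int, list[int]]]:
    """Yield (1-based line number, bad sample indices) for each data record
    whose FORMAT and sample columns disagree in length."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        bad = format_sample_mismatches(line)
        if bad:
            yield lineno, bad


if __name__ == "__main__":
    # The three records from the example, tab-separated as in a real VCF.
    prefix = ["chr1", "97078987", "rs114096998", "G", "T", "."]
    records = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
        "\t".join(prefix + ["PASS", ".", "GT", "0/0"]),
        "\t".join(prefix + ["FAIL", ".", "GT:GQ", "0/0"]),
        "\t".join(prefix + ["FAIL", ".", ".", "0/0"]),
    ]
    for lineno, bad in find_mismatched_records(records):
        print(f"line {lineno}: sample column(s) {bad} do not match FORMAT")
```

The example records in this issue are shown space-aligned; the sketch splits on tabs, as a real VCF requires.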

I'm closing this issue for now. If this is not the case, please reopen with a sample of the VCF data containing the problem. Please make sure the VCF data is anonymized.

FYI, we just released 1.0 and encourage you to update to the latest version.

Originally posted by @markwoon in https://github.com/PharmGKB/PharmCAT/issues/62#issuecomment-937321838

BinglanLi commented 2 years ago

This is an issue with the format of the VCF file.

Case 1

To fix the following line,

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr1    97078987    rs114096998 G   T   .   FAIL    .   GT:GQ   0/0

use the following bcftools command, which removes all FORMAT tags except GT:

bcftools annotate -x FORMAT <input_vcf>
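For illustration only, the effect described for this command (keeping just GT in the FORMAT and sample columns) can be sketched in plain Python. This is not a substitute for bcftools, which also handles compressed and BCF input. Assumptions: the input is an uncompressed, tab-separated VCF; the function name `keep_only_gt` is hypothetical; the `##FORMAT` header lines are left untouched; and records without a GT key (such as the `.` case) are passed through unchanged.

```python
def keep_only_gt(vcf_line: str) -> str:
    """Reduce one tab-separated VCF line to the GT key only.

    Header lines, lines without sample columns, and records whose FORMAT
    has no GT key are returned unchanged (minus the trailing newline).
    """
    line = vcf_line.rstrip("\n")
    fields = line.split("\t")
    if line.startswith("#") or len(fields) < 10:
        return line
    keys = fields[8].split(":")
    if "GT" not in keys:
        return line  # nothing to keep, e.g. FORMAT is "."
    gt_index = keys.index("GT")
    new_samples = []
    for sample in fields[9:]:
        values = sample.split(":")
        # If the sample column is shorter than FORMAT, treat GT as missing (".").
        new_samples.append(values[gt_index] if gt_index < len(values) else ".")
    return "\t".join(fields[:8] + ["GT"] + new_samples)


if __name__ == "__main__":
    # The record from Case 1: FORMAT has GT:GQ, the sample column has one entry.
    record = "\t".join(["chr1", "97078987", "rs114096998", "G", "T", ".",
                        "FAIL", ".", "GT:GQ", "0/0"])
    print(keep_only_gt(record))  # FORMAT becomes "GT", sample stays "0/0"
```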

Case 2

The following case is unlikely, so no solution is presented here.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr1    97078987    rs114096998 G   T   .   FAIL    .   .   0/0

BinglanLi commented 1 year ago

Closing this issue because it is a VCF file format problem and should be addressed outside PharmCAT.