Shicheng-Guo / rbiotools


`\x3b` and `\x3d` in ANNOVAR and bcftools #9

Status: Open. Shicheng-Guo opened this issue 3 years ago.

Shicheng-Guo commented 3 years ago

This is the intended behavior of ANNOVAR; see this GitHub issue for details. In short, `;` and `=` are not allowed inside INFO values in a VCF (`;` separates INFO entries and `=` separates each key from its value), so ANNOVAR encodes them as `\x3b` and `\x3d`, their hexadecimal ASCII codes, to avoid confusing downstream tools such as bcftools.
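To see why the encoding matters, here is a small shell sketch. The INFO string is made up for illustration (the gene names and distances are hypothetical). It counts how many `;`-separated INFO entries a parser would see, first with the escapes left in place and then after decoding them back to `;` and `=`.

```shell
#!/usr/bin/env sh
# Illustrative INFO column with made-up ANNOVAR-style values.
# Single quotes keep \x3b and \x3d as literal text, as they appear in the VCF.
info='Func.refGene=intergenic;Gene.refGene=GENE_A\x3bGENE_B;GeneDetail.refGene=dist\x3d100\x3bdist\x3d200'

# A VCF parser splits the INFO column on ';' into KEY=VALUE entries.
raw_fields=$(printf '%s\n' "$info" | awk -F';' '{ print NF }')

# If the escapes were decoded back to ';' and '=', the same column would
# split into extra entries, e.g. "GENE_B" with no key, and "dist=200",
# which looks like a separate key named "dist".
decoded=$(printf '%s\n' "$info" | sed -e 's/\\x3b/;/g' -e 's/\\x3d/=/g')
decoded_fields=$(printf '%s\n' "$decoded" | awk -F';' '{ print NF }')

echo "encoded: $raw_fields INFO entries"
echo "decoded: $decoded_fields INFO entries"
```

With the escapes intact the column parses as 3 entries; decoded, it becomes 5, two of which no longer belong to the annotation they came from.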

The simplest "fix" is probably to recode these escape sequences with sed before you bgzip the VCF. To replace `\x3b` with `-` and `\x3d` with `:`, the piped command looks like this:

sed -e 's/\\x3b/-/g' -e 's/\\x3d/:/g' myanno.hg19_multianno.vcf | bgzip -c > myanno.hg19_multianno.vcf.gz
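The same recoding can be wrapped as a reusable stdin-to-stdout filter and checked on a single record before running it on a whole file. This is a sketch: the function name `recode_annovar_escapes` is hypothetical (not part of ANNOVAR or bcftools), and the VCF line is made up. The doubled backslash in each sed pattern makes it match the literal four characters `\x3b` or `\x3d`, not a hex-escaped character.

```shell
#!/usr/bin/env sh
# Hypothetical helper: recode ANNOVAR's literal \x3b and \x3d escapes.
# '\\' in the sed pattern matches one literal backslash.
recode_annovar_escapes() {
    sed -e 's/\\x3b/-/g' -e 's/\\x3d/:/g'
}

# A made-up VCF data line (tab-separated); %s keeps the backslashes literal.
line=$(printf '1\t12345\t.\tA\tG\t.\t.\t%s' \
    'Gene.refGene=GENE_A\x3bGENE_B;GeneDetail.refGene=dist\x3d100\x3bdist\x3d200')

recoded=$(printf '%s\n' "$line" | recode_annovar_escapes)
printf '%s\n' "$recoded"

# Sanity check: no literal \x3b or \x3d escapes should remain.
if printf '%s\n' "$recoded" | grep -q '\\x3[bd]'; then
    echo "escapes still present" >&2
fi
```

On a real file the filter slots into the pipeline above, e.g. `recode_annovar_escapes < myanno.hg19_multianno.vcf | bgzip -c > myanno.hg19_multianno.vcf.gz`. Keep in mind that the replacement characters become part of the annotation text, so pick ones that do not already carry meaning in your INFO values.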