Illumina / GTCtoVCF

Script to convert GTC/BPM files to VCF
Apache License 2.0

Empty ALT genotypes? #81

Status: Open. rajwanir opened this issue 2 months ago.

rajwanir commented 2 months ago

Hello @jzieve

I see that some of the records have an empty ALT allele. These could simply be missing calls, but instead of using . or N, the field is just empty. In other records I do see ALT populated with an N. I can fix this by putting . or N in the empty ALT field, but I wanted to check with you whether there is any reason GTCtoVCF might intentionally leave it empty.
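The workaround described above, filling an empty ALT field with a placeholder, could look roughly like this. It is a minimal sketch that assumes an uncompressed, tab-separated VCF where ALT is the fifth column. `fill_empty_alt` is a hypothetical helper and not part of GTCtoVCF. The default placeholder is "." because VCF 4.x uses "." as its missing-value marker.

```python
def fill_empty_alt(vcf_line: str, placeholder: str = ".") -> str:
    """Return vcf_line with an empty ALT column (5th field) replaced by placeholder.

    Header lines (starting with '#') are returned unchanged.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    # Split off the line ending so it can be restored exactly.
    body = vcf_line.rstrip("\r\n")
    line_end = vcf_line[len(body):]
    fields = body.split("\t")
    # Column order: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, samples...
    if len(fields) > 4 and fields[4] == "":
        fields[4] = placeholder
    return "\t".join(fields) + line_end
```

You can apply it line by line, for example over `sys.stdin`. Header lines and records that already have an ALT value pass through unchanged.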

Example records:

chr1	47851	cnvi0146654	C		.	PASS	.	GT:GQ	./.:0
chr1	50251	cnvi0146656	T		.	PASS	.	GT:GQ	./.:0
chr1	51938	cnvi0151530	T	N,A	.	PASS	.	GT:GQ	./.:0
chr1	52651	cnvi0146655	T		.	PASS	.	GT:GQ	./.:0
chr1	55338	cnvi0159124	A	N	.	PASS	.	GT:GQ	./.:0
chr1	64251	cnvi0146663	A		.	PASS	.	GT:GQ	./.:0
chr1	65451	cnvi0147451	A		.	PASS	.	GT:GQ	./.:0
chr1	80386	rs3878915	C	A	.	PASS	.	GT:GQ	./.:0
chr1	82154	rs4477212	A	T,C	.	PASS	.	GT:GQ	1/1:3
chr1	82620	cnvi0052563	A	N	.	PASS	.	GT:GQ	./.:0
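You can list the records whose ALT is truly empty, as opposed to N, by splitting each data line on tabs and checking the fifth column. This is a minimal sketch that assumes tab-separated records in the standard VCF column layout shown above. `find_empty_alt` is a hypothetical helper and not part of GTCtoVCF.

```python
from typing import Iterable, List, Tuple


def find_empty_alt(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (CHROM, POS, ID) for every data record whose ALT column is empty."""
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue  # malformed or truncated record
        chrom, pos, locus_id, _ref, alt = fields[:5]
        if alt == "":
            hits.append((chrom, pos, locus_id))
    return hits
```

For the example records above, this would report the five cnvi loci that have no ALT value. Records with N or a real allele are skipped.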

Thanks in advance for your help.

jzieve commented 2 months ago

Off the top of my head, I don't know of any reason why it should be empty. What product/manifest is this? Some of these appear to be CNV-related targets and could perhaps be filtered out of the VCF via the --filter-loci option. REF/ALT are derived from manifest fields such as SNP, SourceSeq, and RefStrand, together with the bases looked up in the reference genome, so something must be different about these loci.
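To see which loci might be candidates for the filtering suggested above, you could collect the IDs of CNV-looking records from an existing VCF. This is a minimal sketch with two assumptions. First, "CNV-related" is approximated by the "cnvi" ID prefix seen in the example records, which is a heuristic and not a manifest field. Second, `cnv_loci_to_filter` is a hypothetical helper. Check the exact input format that --filter-loci expects against the GTCtoVCF help output, because it is not confirmed here.

```python
from typing import Iterable, List


def cnv_loci_to_filter(
    lines: Iterable[str], prefix: str = "cnvi", empty_alt_only: bool = True
) -> List[str]:
    """Collect IDs of records whose ID starts with prefix.

    With empty_alt_only=True (the default), only records whose ALT column
    is empty are kept. The 'cnvi' prefix is a heuristic taken from the
    example IDs, not something GTCtoVCF defines.
    """
    names = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue  # malformed or truncated record
        locus_id, alt = fields[2], fields[4]
        if locus_id.startswith(prefix) and (alt == "" or not empty_alt_only):
            names.append(locus_id)
    return names
```

The resulting names could be written one per line. Whether that list is accepted directly by --filter-loci is an assumption to verify before relying on it.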