bcgsc / straglr

Tandem repeat expansion detection or genotyping from long-read alignments

VCF output for heterozygotes seems to have incorrect GTs #39

Closed bartcharbon closed 3 days ago

bartcharbon commented 1 month ago

Hi @readmanchiu

Thanks again for implementing the VCF output!

Unfortunately, I've run into an issue with the GTs for heterozygotes: in these cases I seem to get two VCF lines, one with a 0/0 GT and the other with a 1/1 GT.

I would expect:

readmanchiu commented 1 month ago

Hmmm... that's interesting; my test cases all have one locus (chrom, start, end) per line. Is it OK for you to share the output that shows the error?

bartcharbon commented 3 weeks ago

Unfortunately, my current data is patient data that I cannot share.

I will try to reproduce the issue with public data.

bartcharbon commented 2 weeks ago

I've attached the output for a Genome in a Bottle sample. (The VCF file is zipped because GitHub otherwise rejects it.)

The variant at chr13:70139353 illustrates the issue: there are two lines for this position, the first showing a 1/1 GT and the second a 0/0.

test.zip

readmanchiu commented 2 weeks ago

Hmmm... the 2 lines actually have 2 different POS values for the ATXN8OS locus, one at 70139353 and the other at 70139383, so it may not be a bug in the VCF output but something wrong with the genotyping. Do you also have both 70139353 and 70139383 in the TSV output? I can double-check on my end to see if I can reproduce the error on ATXN8OS, but can you also check the ATXN8OS entry in the BED file you provided for the run? (I assume you're running in genotyping mode.)
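For anyone hitting the same symptom, a quick way to tell a true duplicate from two nearby records is to list VCF lines on the same chromosome whose POS values are close together. The following is only a minimal sketch in plain Python, not part of straglr; the function names and the 50 bp window are assumptions chosen for illustration, and it assumes a single-sample VCF with GT in the FORMAT column.

```python
import gzip
from collections import defaultdict


def read_vcf_records(path):
    """Yield (chrom, pos, gt) for each data line of a single-sample VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos = fields[0], int(fields[1])
            gt = None
            # Columns 9 and 10 are FORMAT and the first sample, if present.
            if len(fields) > 9:
                keys = fields[8].split(":")
                values = fields[9].split(":")
                if "GT" in keys and keys.index("GT") < len(values):
                    gt = values[keys.index("GT")]
            yield chrom, pos, gt


def find_close_records(records, window=50):
    """Return adjacent record pairs on one chromosome whose POS differ by <= window bp.

    The window size is an arbitrary illustrative default, not a straglr setting.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, gt in records:
        by_chrom[chrom].append((pos, gt))

    pairs = []
    for chrom, items in by_chrom.items():
        # Sort by position only, so a missing GT (None) is never compared.
        items.sort(key=lambda item: item[0])
        for (pos_a, gt_a), (pos_b, gt_b) in zip(items, items[1:]):
            if pos_b - pos_a <= window:
                pairs.append((chrom, pos_a, gt_a, pos_b, gt_b))
    return pairs
```

Calling `find_close_records(read_vcf_records("sample.vcf"))` (the file name is a placeholder) returns tuples of the form `(chrom, pos_a, gt_a, pos_b, gt_b)`. A pair such as 70139353 and 70139383 on chr13 would show up here as two distinct records 30 bp apart rather than as one position reported twice.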

bartcharbon commented 3 days ago

Hi @readmanchiu

Sorry, I guess the error is on my side. I'll have a look at the original test with the patient data, but I probably just missed the difference in positions there as well.

I'll close this for now.