brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License
249 stars 23 forks source link

Empty genotype and depth values in slivar tsv #78

Closed amwenger closed 3 years ago

amwenger commented 3 years ago

@williamrowell

The output of slivar tsv differs depending on the order of sample columns in the VCF. In particularly, genotype and depth information is empty for a parent if that parent is the first sample (column 10). slivar_test.zip is a test case with two VCFs that are identical except for the order of the sample columns:

# headers are same
$ diff -s <(grep "^##" dad_first.vcf) <(grep "^##" kid_first.vcf)
Files /dev/fd/63 and /dev/fd/62 are identical

# data columns 1-9 are same
$ diff -s <(grep -v "^##" dad_first.vcf | cut -f1-9) <(grep -v "^##" kid_first.vcf | cut -f1-9)
Files /dev/fd/63 and /dev/fd/62 are identical

# data columns 10-12 are reordered
diff -s <(grep -v "^##" dad_first.vcf | cut -f10-) <(grep -v "^##" kid_first.vcf | cut -f10-) | column -t
1,2c1,2
<        dad                               mom                           kid
<        0/0:10:10,0:30:0,30,299:..:.      0/0:11:11,0:33:0,33,329:..:.  0|1:13:10,3:31:31,0,37:..:866626
---
>        kid                               dad                           mom
>        0|1:13:10,3:31:31,0,37:..:866626  0/0:10:10,0:30:0,30,299:..:.  0/0:11:11,0:33:0,33,329:..:.

$ slivar tsv --sample-field dominant --csq-field BCSQ --ped family.ped dad_first.vcf  | cut -f1-5,8 | column -t
#mode     family_id  sample_id  chr:pos:ref:alt   genotype(sample,dad,mom)  depths(sample,dad,mom)
dominant  Family1    kid        chr1:1077433:G:T  1,.,0                     13,.,11

$ slivar tsv --sample-field dominant --csq-field BCSQ --ped family.ped kid_first.vcf  | cut -f1-5,8 | column -t
#mode     family_id  sample_id  chr:pos:ref:alt   genotype(sample,dad,mom)  depths(sample,dad,mom)
dominant  Family1    kid        chr1:1077433:G:T  1,0,0                     13,10,11

It looks to me like the issue might be the >0 check in lines 42-45 of tsv.nim: https://github.com/brentp/slivar/blob/5b72196e399c90f9c18aefd39cde681316f7849e/src/slivarpkg/tsv.nim#L42-L45.

Thank you for the support.

brentp commented 3 years ago

wow. you are right. that should be >= 0. I'll fix this for next release.

brentp commented 3 years ago

Here is a binary with that fixed if you are able to test or want to proceed with your analysis. Thanks very much for finding and diagnosing this.

brentp commented 3 years ago

Binary here: slivar.gz