brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License
248 stars 23 forks source link

DeepTrio GVCF support/guidance using the standard rare disease approach #102

Closed mvelinder closed 3 years ago

mvelinder commented 3 years ago

Spreekt u Engels? Hope you're well friend!

I'm trying to use slivar for DeepTrio calls. They are called individually as GVCFs. I then merged them using bcftools merge -0 and ran slivar using the standard approach for rare disease with the slivar-functions.js and my expressions --family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' and --family-expr 'x_denovo:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_denovo_x) && INFO.gnomad_popmax_af < 0.001'

The variant in question is a de novo (also called by GATK - which is where we originally found it). Here's the VCF record for the merged trio VCF

chrX    18604572    .   C   T   9.9 PASS    .   GT:GQ:DP:AD:VAF:PL  0/0:.:.:.:.:.   0/0:.:.:.:.:.   0/1:10:56:31,25:0.446429:9,0,34

Even with the 0/0 genotypes in both parents this doesn't meet my slivar criteria. Is this because of the missing/NULL fields for GT:GQ:DP:AD that the .js funtctions are looking for?

Here's the full output, you can ignore the compound het failures since it's not an annotated VCF.

+ vcf=A1167_ASIA.deepvariant.0merge.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz
+ ped=A1167-210621-VAR-Botto-ASIA.ped
+ fasta=/scratch/ucgd/lustre/work/u0691312/reference/ucgd_reference/GRCh38/human_g1k_v38_decoy_phix.fasta
+ gnomad=/scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip
+ js=/uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/slivar/js/slivar-functions.js
+ slivar_static expr --vcf A1167_ASIA.deepvariant.0merge.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz --ped A1167-210621-VAR-Botto-ASIA.ped --pass-only --js /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/slivar/js/slivar-functions.js -g /scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip --info 'INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' --family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' --family-expr 'x_denovo:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_denovo_x) && INFO.gnomad_popmax_af < 0.001' --family-expr 'recessive:fam.every(segregating_recessive)' --family-expr 'dominant:fam.every(segregating_dominant)' -o A1167_ASIA.deepvariant.0merge.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz.slivar.vcf.gz
> slivar version: 0.2.2 186b862063ce50ee1d282bc610196630c4ecac61
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] message for /scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip:
   > created on:2020-12-12
[slivar] javascript error. this can some times happen when a field is missing.
error from duktape: unknown attribute:AB for expression:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001

[slivar] occured with variant:chrX  18604434    .   G   <*> 0   .   END=18604763;gnomad_popmax_af=-1;gnomad_n_hets=-1;gnomad_n_homalt=-1    GT:GQ:MIN_DP:PL 0/0:48:23:0,93,929  0/0:.:.:.   0/0:.:.:.
[slivar] continuing execution.
[slivar] javascript error. this can some times happen when a field is missing.
error from duktape: unknown attribute:AB for expression:fam.every(segregating_recessive)

[slivar] occured with variant:chrX  18604434    .   G   <*> 0   .   END=18604763;gnomad_popmax_af=-1;gnomad_n_hets=-1;gnomad_n_homalt=-1    GT:GQ:MIN_DP:PL 0/0:48:23:0,93,929  0/0:.:.:.   0/0:.:.:.
[slivar] continuing execution.
[slivar] javascript error. this can some times happen when a field is missing.
error from duktape: unknown attribute:AB for expression:fam.every(segregating_dominant)

[slivar] occured with variant:chrX  18604434    .   G   <*> 0   .   END=18604763;gnomad_popmax_af=-1;gnomad_n_hets=-1;gnomad_n_homalt=-1    GT:GQ:MIN_DP:PL 0/0:48:23:0,93,929  0/0:.:.:.   0/0:.:.:.
[slivar] continuing execution.
[slivar] Finished. evaluated 4 total variants and wrote 0 variants that passed your slivar expressions.
sample  denovo  x_denovo    recessive   dominant
UDN492951   0   0   0   0
UDN652881   0   0   0   0
UDN776853   0   0   0   0

+ slivar_static expr --vcf A1167_ASIA.deepvariant.0merge.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz --ped A1167-210621-VAR-Botto-ASIA.ped --pass-only --js /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/slivar/js/slivar-functions.js -g /scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip --family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' --trio 'comphet_side:comphet_side(kid, mom, dad) && INFO.gnomad_popmax_af < 0.005'
+ slivar_static compound-hets -v /dev/stdin -s comphet_side -s denovo -p A1167-210621-VAR-Botto-ASIA.ped -o A1167_ASIA.deepvariant.0merge.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz.slivar.ch.vcf.gz
> slivar version: 0.2.2 186b862063ce50ee1d282bc610196630c4ecac61
> slivar version: 0.2.2 186b862063ce50ee1d282bc610196630c4ecac61
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] message for /scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip:
   > created on:2020-12-12
[slivar] evaluating on 1 trios
fatal.nim(49)            sysFatal
Error: unhandled exception: comphet.nim(231, 12) `gene_fields.len > 0` [slivar] error! no gene-like field found in /dev/stdin [AssertionError]
[slivar] javascript error. this can some times happen when a field is missing.
error from duktape: unknown attribute:AB for expression:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001

[slivar] occured with variant:chrX  18604434    .   G   <*> 0   .   END=18604763;gnomad_popmax_af=-1;gnomad_n_hets=-1;gnomad_n_homalt=-1    GT:GQ:MIN_DP:PL 0/0:48:23:0,93,929  0/0:.:.:.   0/0:.:.:.
[slivar] continuing execution.
[slivar] Finished. evaluated 4 total variants and wrote 0 variants that passed your slivar expressions.
sample  comphet_side    denovo
UDN492951   0   0
UDN652881   0   0
UDN776853   0   0

Any help would be great! Or I'd be happy to work together to make a .js and/or command for best practices for DeepTrio calls.

brentp commented 3 years ago

Hi Matt, ik spreek het een beetje. you'll need to do joint-calling for most tools (including slivar) to work as expected. you can do this with glnexus as per the docs for DeepTrio: https://github.com/google/deepvariant/blob/r1.2/docs/deeptrio-wgs-case-study.md#merge-vcfs-using-glnexus

mvelinder commented 3 years ago

Thanks Brent!

I got this working using a Singularity-based approach on our Utah HPC. I normalized the merged VCF with bcftools norm -m - -w 10000 -f $fasta $mergedvcf -O z -o $normvcf and the variant record in question now looks like this

chrX    18604572    chrX_18604572_C_T   C   T   9   .   AF=0.166667;AQ=9    GT:DP:AD:GQ:PL:RNC  0/0:23:23,0:48:0,93,929:..  0/0:29:29,0:50:0,87,869:..  0/1:56:31,25:10:9,0,34:..

However, slivar is still not prioritizing it as a de novo (using this single variant record test VCF). Here's what I'm running and my output:

+ vcf=trio_merged.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz
+ ped=A1167-210621-VAR-Botto-ASIA.ped
+ fasta=/scratch/ucgd/lustre/work/u0691312/reference/ucgd_reference/GRCh38/human_g1k_v38_decoy_phix.fasta
+ gnomad=/scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip
+ js=/uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/slivar/js/slivar-functions.js
+ slivar_static expr --vcf trio_merged.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz --ped A1167-210621-VAR-Botto-ASIA.ped --pass-only --js /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/slivar/js/slivar-functions.js -g /scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip --info 'INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' --family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' --family-expr 'x_denovo:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_denovo_x) && INFO.gnomad_popmax_af < 0.001' --family-expr 'recessive:fam.every(segregating_recessive)' --family-expr 'dominant:fam.every(segregating_dominant)' -o trio_merged.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz.slivar.vcf.gz
> slivar version: 0.2.2 186b862063ce50ee1d282bc610196630c4ecac61
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] message for /scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip:
   > created on:2020-12-12
[slivar] Finished. evaluated 1 total variants and wrote 0 variants that passed your slivar expressions.
sample  denovo  x_denovo    recessive   dominant
UDN492951   0   0   0   0
UDN652881   0   0   0   0
UDN776853   0   0   0   0

Any ideas?

brentp commented 3 years ago

what's in your pedigree file?

brentp commented 3 years ago

also, just to clarify, that's a different variant that has been joint-called. The first one is just a merged GVCF. So make sure you're applying the slivar expressions to the correct VCFs

mvelinder commented 3 years ago

Here's the DeepTrio test VCF and the ped file contents. Without the VCF header.

$ bcftools view -H trio_merged.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz
chrX    18604572    chrX_18604572_C_T   C   T   9   .   AF=0.166667;AQ=9    GT:DP:AD:GQ:PL:RNC  0/0:23:23,0:48:0,93,929:..  0/0:29:29,0:50:0,87,869:..  0/1:56:31,25:10:9,0,34:..
$ cat A1167-210621-VAR-Botto-ASIA.ped
#Kindred_ID Sample_ID   Paternal_ID Maternal_ID Sex Affection_Status    Project
UDN_X1  UDN492951   0   0   1   1   A1167-210621-VAR-Botto-ASIA
UDN_X1  UDN652881   0   0   2   1   A1167-210621-VAR-Botto-ASIA
UDN_X1  UDN776853   UDN492951   UDN652881   2   2   A1167-210621-VAR-Botto-ASIA

Here's a GATK test VCF and the same ped file as above, where I do get the variant prioritized

$ bcftools view -H A1167-210621-VAR-Botto-ASIA_Final_1624376012.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz
chrX    18604572    .   C   T   564.83  PASS    AC=1;AF=0.00172;AN=580;AS_BaseQRankSum=3.104;AS_FS=1.024;AS_InbreedingCoeff=-0.0023;AS_MQ=60;AS_MQRankSum=0;AS_QD=10.27;AS_ReadPosRankSum=-0.963;AS_SOR=0.532;BaseQRankSum=3.104;ClippingRankSum=0;DP=7382;ExcessHet=3.0103;FS=1.024;InbreedingCoeff=-0.0023;MLEAC=1;MLEAF=0.00172;MQ=60;MQ0=0;MQRankSum=0;QD=10.27;ReadPosRankSum=-0.963;SOR=0.532;VQSLOD=3.4898;culprit=DP;CSQ=T|stop_gained|HIGH|CDKL5|ENSG00000008086|Transcript|ENST00000379989|protein_coding|13/22||||1933|1648|550|R/*|Cga/Tga|rs267608643&CM081204&COSM1625757||1||SNV|HGNC|HGNC:11411|YES||1||CCDS14186.1|ENSP00000369325|O76039||UPI0000136103|1|||PANTHER:PTHR24056&PANTHER:PTHR24056:SF111&MobiDB_lite:mobidb-lite&MobiDB_lite:mobidb-lite|||||||||||||||||||||pathogenic|0&0&1|1&1&1|18063413&21318334||||||0.128|,T|stop_gained|HIGH|CDKL5|ENSG00000008086|Transcript|ENST00000379996|protein_coding|12/21||||1923|1648|550|R/*|Cga/Tga|rs267608643&CM081204&COSM1625757||1||SNV|HGNC|HGNC:11411|||1||CCDS14186.1|ENSP00000369332|O76039||UPI0000136103|1|||PANTHER:PTHR24056&PANTHER:PTHR24056:SF111&MobiDB_lite:mobidb-lite&MobiDB_lite:mobidb-lite|||||||||||||||||||||pathogenic|0&0&1|1&1&1|18063413&21318334||||||0.128|,T|stop_gained|HIGH|CDKL5|ENSG00000008086|Transcript|ENST00000463994|protein_coding|12/16||||1926|1648|550|R/*|Cga/Tga|rs267608643&CM081204&COSM1625757||1|cds_end_NF|SNV|HGNC|HGNC:11411|||5|||ENSP00000485184||A0A096LNR9|UPI0004F2369C|1|||PANTHER:PTHR24056:SF111&PANTHER:PTHR24056&MobiDB_lite:mobidb-lite&MobiDB_lite:mobidb-lite|||||||||||||||||||||pathogenic|0&0&1|1&1&1|18063413&21318334||||||0.128|,T|stop_gained|HIGH|CDKL5|ENSG00000008086|Transcript|ENST00000623535|protein_coding|11/17||||1648|1648|550|R/*|Cga/Tga|rs267608643&CM081204&COSM1625757||1||SNV|HGNC|HGNC:11411|||1|P1|CCDS83458.1|ENSP00000485244|O76039||UPI0002390615|1|||PANTHER:PTHR24056&PANTHER:PTHR24056:SF111&MobiDB_lite:mobidb-lite&MobiDB_lite:mobidb-lite|||||||||||||||||||||pathogenic|0&0&1|1&1&1|18063413&21318334||||||0.128|,T|stop_gained|HIGH|CDKL5|ENSG00000008086|Transcript|ENST00000635828|protein_coding|12/18||||2014|1648|550|R/*|Cga/Tga|rs267608643&CM081204&COSM1625757||1||SNV|HGNC|HGNC:11411|||5|||ENSP00000490170||A0A1B0GUM4|UPI0007E52D32|1|||PANTHER:PTHR24056:SF111&PANTHER:PTHR24056&MobiDB_lite:mobidb-lite&MobiDB_lite:mobidb-lite|||||||||||||||||||||pathogenic|0&0&1|1&1&1|18063413&21318334||||||0.128|,T|downstream_gene_variant|MODIFIER|CDKL5|ENSG00000008086|Transcript|ENST00000637881|protein_coding||||||||||rs267608643&CM081204&COSM1625757|235|1|cds_end_NF|SNV|HGNC|HGNC:11411|||5|||ENSP00000489879||A0A1B0GTX4|UPI0007E52C7A|1||||||||||||||||||||||||pathogenic|0&0&1|1&1&1|18063413&21318334||||||0.128|  GT:AD:DP:GQ:PL:SAC  0/0:27,0:27:64:0,64,691:.   0/0:38,0:38:99:0,104,1158:. 0/1:30,25:55:99:611,0,658:13,17,12,13

And the output from slivar:

+ vcf=A1167-210621-VAR-Botto-ASIA_Final_1624376012.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz
+ ped=A1167-210621-VAR-Botto-ASIA.ped
+ fasta=/scratch/ucgd/lustre/work/u0691312/reference/ucgd_reference/GRCh38/human_g1k_v38_decoy_phix.fasta
+ gnomad=/scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip
+ js=/uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/slivar/js/slivar-functions.js
+ slivar_static expr --vcf A1167-210621-VAR-Botto-ASIA_Final_1624376012.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz --ped A1167-210621-VAR-Botto-ASIA.ped --pass-only --js /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/slivar/js/slivar-functions.js -g /scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip --info 'INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' --family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' --family-expr 'x_denovo:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_denovo_x) && INFO.gnomad_popmax_af < 0.001' --family-expr 'recessive:fam.every(segregating_recessive)' --family-expr 'dominant:fam.every(segregating_dominant)' -o A1167-210621-VAR-Botto-ASIA_Final_1624376012.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz.slivar.vcf.gz
> slivar version: 0.2.2 186b862063ce50ee1d282bc610196630c4ecac61
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] message for /scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip:
   > created on:2020-12-12
[slivar] Finished. evaluated 1 total variants and wrote 1 variants that passed your slivar expressions.
sample  denovo  x_denovo    recessive   dominant
UDN492951   0   0   0   0
UDN652881   0   0   0   0
UDN776853   1   1   0   0

So it seems DeepTrio VCF specific, or maybe there's something in the slivar-functions.js that is causing it to be filtered out?

brentp commented 3 years ago

In the first vcf line you pasted, the kid has a GQ of 10. slivar default cutoff is 20.

mvelinder commented 3 years ago

Yep, that was it. I manually changed the variant GQ to 25 and it worked.

chrX    18604572    chrX_18604572_C_T   C   T   9   .   AF=0.166667;AQ=9    GT:DP:AD:GQ:PL:RNC  0/0:23:23,0:48:0,93,929:..  0/0:29:29,0:50:0,87,869:..  0/1:56:31,25:25:9,0,34:..
+ vcf=trio_merged.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz
+ ped=A1167-210621-VAR-Botto-ASIA.ped
+ fasta=/scratch/ucgd/lustre/work/u0691312/reference/ucgd_reference/GRCh38/human_g1k_v38_decoy_phix.fasta
+ gnomad=/scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip
+ js=/uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/slivar/js/slivar-functions.js
+ slivar_static expr --vcf trio_merged.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz --ped A1167-210621-VAR-Botto-ASIA.ped --pass-only --js /uufs/chpc.utah.edu/common/HIPAA/u0691312/bin/slivar/js/slivar-functions.js -g /scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip --info 'INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' --family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' --family-expr 'x_denovo:(variant.CHROM == "X" || variant.CHROM == "chrX") && fam.every(segregating_denovo_x) && INFO.gnomad_popmax_af < 0.001' --family-expr 'recessive:fam.every(segregating_recessive)' --family-expr 'dominant:fam.every(segregating_dominant)' -o trio_merged.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz.slivar.vcf.gz
> slivar version: 0.2.2 186b862063ce50ee1d282bc610196630c4ecac61
[W::hts_idx_load3] The index file is older than the data file: trio_merged.vcf.gz.norm.vcf.gz.denovo.test.vcf.gz.tbi
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] message for /scratch/ucgd/lustre-work/marth/u0691312/reference/gnomad_3.1/gnomad.genomes.v3.1.sites.slivar.zip:
   > created on:2020-12-12
[slivar] Finished. evaluated 1 total variants and wrote 1 variants that passed your slivar expressions.
sample  denovo  x_denovo    recessive   dominant
UDN492951   0   0   0   0
UDN652881   0   0   0   0
UDN776853   1   1   0   0

I'll consider dropping that GQ threshold in my slivar-functions.js - thanks dude!