brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License
249 stars 23 forks source link

Detection of compound het in trans #108

Closed krukanna closed 2 years ago

krukanna commented 2 years ago

Hi, is it possible for the program to detect in trans compound heterozygotes? I have tested on several known examples from my families and unfortunately such variants have not been detected. Maybe I should change some parameters or family-expr function?

brentp commented 2 years ago

yes. that is the purpose. it should detect variants in trans within the same gene. if you see this is not the case, please post the variant including info field and sample fields for parents and child and we can determine what is the problem.

krukanna commented 2 years ago

@brentp Here is my example - I send you lines from vcf file with part of VEP annotations. I can't see this positions as compound het.

chr6    26091179        .       C       G       47978.1 PASS    AC=2;AF=0.167;AN=6;BaseQRankSum=0.475;DP=5065;ExcessHet=1.1658;FS=1.757;InbreedingCoeff=0.1;MLEAC=16;MLEAF=0.167;MQ=60;MQRankSum=0;POSITIVE_TRAIN_SITE;QD=19.35;ReadPosRankSum=0.238;SOR=0.798;VQSLOD=25.5;culprit=MQ;CSQ=G|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000309234|protein_coding|2/6||||189|187|63|H/D|Cat/Gat|rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886||||||ENSP00000311698|,G|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000317896|protein_coding|2/5||||347|187|63|H/D|Cat/Gat|rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886|||||CCDS4579.1|ENSP00000313776|,G|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000336625|protein_coding|2/5||||231|187|63|H/D|Cat/Gat|rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886|||||CCDS47386.1|ENSP00000337819|,G|intron_variant|MODIFIER|HFE|ENSG00000010704|Transcript|ENST00000349999|protein_coding||1/4||||||||rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886|||||CCDS4580.1|ENSP00000259699|,G|intron_variant|MODIFIER|HFE|ENSG00000010704|Transcript|ENST00000352392|protein_coding||1/2||||||||rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886|||||CCDS4582.1|ENSP00000315936|,G|intron_variant|MODIFIER|HFE|ENSG00000010704|Transcript|ENST00000353147|protein_coding||1/3||||||||rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886|||||CCDS4581.1|ENSP00000312342|,G|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000357618|protein_coding|2/6||||309|187|63|H/D|Cat/Gat|rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886|YES||||CCDS4578.1|ENSP00000417404|,G|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000397022|protein_coding|2/6||||278|118|40|H/D|Cat/Gat|rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886|||||CCDS47387.1|ENSP00000380217|,G|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000461397|protein_coding|2/6||||223|187|63|H/D|Cat/Gat|rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886|||||CCDS54974.1|ENSP00000420802|,G|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000470149|protein_coding|2/7||||236|187|63|H/D|Cat/Gat|rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886||||||ENSP00000419725||,G|upstream_gene_variant|MODIFIER|HFE|ENSG00000010704|Transcript|ENST00000485729|protein_coding||||||||||rs1799945&CM960827&COSV58513169|1|1997|1|cds_start_NF|SNV|HGNC|4886||||||ENSP00000417534|,G|non_coding_transcript_exon_variant|MODIFIER|HFE|ENSG00000010704|Transcript|ENST00000486147|retained_intron|2/5||||230|||||rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886|||||||,G|intron_variant|MODIFIER|HFE|ENSG00000010704|Transcript|ENST00000488199|protein_coding||1/4||||||||rs1799945&CM960827&COSV58513169|1||1||SNV|HGNC|4886|||||CCDS54975.1|ENSP00000420559|     GT:AD:DP:GQ:PL  0/1:76,78:154:99:2566,0,2708   0/0:64,0:64:99:0,120,1800        0/1:93,67:160:99:1965,0,3139

chr6    26093141        .       G       A       8107.96 PASS    AC=2;AF=0.031;AN=6;BaseQRankSum=-7.801;DP=5296;ExcessHet=3.1497;FS=4.99;InbreedingCoeff=-0.0323;MLEAC=3;MLEAF=0.031;MQ=60;MQRankSum=0;POSITIVE_TRAIN_SITE;QD=17.7;ReadPosRankSum=-0.858;SOR=0.44;VQSLOD=19.49;culprit=MQ;CSQ=A|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000309234|protein_coding|4/6||||847|845|282|C/Y|tGc/tAc|rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886||||||ENSP00000311698|,A|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000317896|protein_coding|3/5||||729|569|190|C/Y|tGc/tAc|rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|||||CCDS4579.1|ENSP00000313776|,A|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000336625|protein_coding|3/5||||571|527|176|C/Y|tGc/tAc|rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|||||CCDS47386.1|ENSP00000337819|,A|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000349999|protein_coding|3/5||||741|581|194|C/Y|tGc/tAc|rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|||||CCDS4580.1|ENSP00000259699|,A|intron_variant|MODIFIER|HFE|ENSG00000010704|Transcript|ENST00000352392|protein_coding||1/2||||||||rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|||||CCDS4582.1|ENSP00000315936|,A|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000353147|protein_coding|2/4||||465|305|102|C/Y|tGc/tAc|rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|||||CCDS4581.1|ENSP00000312342|,A|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000357618|protein_coding|4/6||||967|845|282|C/Y|tGc/tAc|rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|YES||||CCDS4578.1|ENSP00000417404|,A|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000397022|protein_coding|4/6||||936|776|259|C/Y|tGc/tAc|rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|||||CCDS47387.1|ENSP00000380217|,A|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000461397|protein_coding|4/6||||839|803|268|C/Y|tGc/tAc|rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|||||CCDS54974.1|ENSP00000420802|,A|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000470149|protein_coding|5/7||||885|836|279|C/Y|tGc/tAc|rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886||||||ENSP00000419725|,A|non_coding_transcript_exon_variant|MODIFIER|HFE|ENSG00000010704|Transcript|ENST00000483782|retained_intron|3/3||||1176|||||rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|||||||,A|upstream_gene_variant|MODIFIER|HFE|ENSG00000010704|Transcript|ENST00000485729|protein_coding||||||||||rs1800562&CM004391&CM960828|1|35|1|cds_start_NF|SNV|HGNC|4886||||||ENSP00000417534|,A|non_coding_transcript_exon_variant|MODIFIER|HFE|ENSG00000010704|Transcript|ENST00000486147|retained_intron|3/5||||688|||||rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|||||||,A|missense_variant|MODERATE|HFE|ENSG00000010704|Transcript|ENST00000488199|protein_coding|3/5||||575|539|180|C/Y|tGc/tAc|rs1800562&CM004391&CM960828|1||1||SNV|HGNC|4886|||||CCDS54975.1|ENSP00000420559|       GT:AD:DP:GQ:PL  0/1:74,62:136:99:2128,0,2944    0/1:66,73:139:99:2435,0,2692    0/0:102,0:102:99:0,120,1800
brentp commented 2 years ago

Hi, https://www.ncbi.nlm.nih.gov/snp/rs1799945 has a VAF of close to 0.1. So that might account for it. What slivar command are you using to filter these? I'd need to see both the slivar expr and the slivar compound-hets command.

brentp commented 2 years ago

The other variant is at 3% AF in the population: https://www.ncbi.nlm.nih.gov/snp/rs1800562

krukanna commented 2 years ago
slivar expr \
   --pass-only \
   --vcf $VCF_ann \
   --ped $SAMPLES_RELATION \
   --gnotate gnomad.hg37.zip \
   --js slivar-functions.js \
   --out-vcf $DIR_OUTPUT/de_novo_output.vcf \
   --info 'INFO.impactful && INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' \
   --family-expr 'denovo:fam.every(segregating_denovo) && INFO.gnomad_popmax_af < 0.001' \
   --family-expr 'recessive:fam.every(segregating_recessive)' \
   --family-expr 'x_denovo:(variant.CHROM == "chrX") && fam.every(segregating_denovo_x) && INFO.gnomad_popmax_af < 0.001' \
   --family-expr 'x_recessive:(variant.CHROM == "chrX") && fam.every(segregating_recessive_x)' \
   --family-expr 'comphet_side:fam.every(function(s) {return (s.het || s.hom_ref) && s.GQ >= 20 }) && fam.some(function(s) { return s.het && s.affected })'

slivar compound-hets -v $DIR_OUTPUT/de_novo_output.vcf \
    --sample-field comphet_side --sample-field denovo --skip NONE -p $SAMPLES_RELATION > $DIR_OUTPUT/compound_het_output.vcf
brentp commented 2 years ago

Yes, so this:

   --info 'INFO.impactful && INFO.gnomad_popmax_af < 0.01 && variant.FILTER == "PASS" && variant.ALT[0] != "*"' \

removes any variants that have a gnomad_af > 0.01 . You can increase that to INFO.gnomad_popmax_af < 0.2 and likely recover those variants.

krukanna commented 2 years ago

@brentp Of course, it changed the result, now I see these variants as compound het. Do you recommend to stay by filter gnomad_popmax_af < 0.01 or increase?

brentp commented 2 years ago

That's a good question. The defaults make sense from a population genetics perspective, but they can exclude sites where the allele frequency is higher. Presumably that could happen with less severe or late-onset phenotypes. As you probably know, one of your variants is even listed as pathogenic in clinvar with a 0.03 AF. I didn't look at the phenotype enough to discern if it seems likely, but that's something to consider. It's always a trade-off between sensitivity and specificity and the defaults are a good balance.

krukanna commented 2 years ago

@brentp Thanks for your help, that's very kind. Is there also possibility to find compound het in trans in single sample (in slivar compound-hets )? I've got this variants as compounds het in result from slivar expr (below), but in slivar compound-hets occurs an error: "no trios found"

#mode   family_id   sample_id   chr:pos:ref:alt genotype(sample,dad,mom)    gnomad_popmax_af    gnomad_popmax_af_filter gnomad_nhomalt  gene    highest_impact  depths(sample,dad,mom)  allele_balance(sample,dad,mom)
comphet_side    R155    CP00055822  chr11:68908186:C:T  1,.,.   -1      -1  IGHMBP2;MRPL21  03_stop_gained  108,.,. 0.398148,.,.
comphet_side    R155    CP00055822  chr11:68908327:C:T  1,.,.   0.000146864     0   IGHMBP2;MRPL21  03_stop_gained  139,.,. 0.510791,.,.
brentp commented 2 years ago

You can use the --allow-non-trios flag to slivar compound-hets. You'll get a lot of candidates, but if you have some genes of interest, it might work out.

krukanna commented 2 years ago

Thanks for help, result is correct ;)