brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License

compound heterozygote tooling #143

Open brentp opened 2 years ago

brentp commented 2 years ago

This is to track a new tool for more exhaustive support for compound heterozygotes. The current tool supports probably 90% of use-cases.

Other uses include:

  1. detecting non-standard compound variants (not heterozygotes) #138
  2. annotating with information from co-occurring variants from gnomAD pairs
  3. accepting a SNP and SV (or STR) VCF to find compound heterozygotes in SNP and SV or SNP and STR

Other use-cases and feedback welcome.

Plan

The plan is to build a new tool, slivar ch.

slivar expr \
   --sample-expr 'parent_ch:sample.GQ > 20 && (sample.hom_ref || (sample.het && sample.AB > 0.2 && sample.AB < 0.8)) && INFO.impactful && sample.kids.length > 0' \
   --sample-expr 'kid_ch:sample.GQ > 20 && sample.het && INFO.impactful'

# annotate worthy SVs
slivar expr -v $other_SV_vcf -o $other_ch_vcf \
   --sample-expr 'parent_ch:...' \
   --sample-expr 'kid_ch:...' 

slivar ch \
    --ped $ped \
    --parent parent_ch \
    --kid kid_ch \
    --groupby 'CSQ/gene' \
    -v $input \
    --other_vcf $other_ch_vcf \
    -o $output

slivar ch extracts only variants annotated with kid_ch and, within each --groupby group, checks that parent_ch is present in exactly one parent. This requires that the SV VCF has a CSQ field that matches the one produced by bcftools, snpEff, or VEP.

This can cover use-cases in #138 as the user can specify a --sample-expr 'parent_ch:...' that allows for homozygous variants in the parent.

For the parents, slivar ch only needs to check that one variant is hom-ref and the other is not; it should not require het.
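
A rough Python sketch of the pairing logic described above may help make the intent concrete. It is only an illustration under stated assumptions: the record layout (`id`, `gene`, `hom_ref`) and the function names `transmitting_parent` and `candidate_pairs` are hypothetical and are not slivar's internals or API. It assumes each variant already passed kid_ch in the kid, and that `hom_ref` records whether each parent is hom-ref at that variant.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, simplified records; not slivar's internal representation.
# "gene" stands in for the --groupby key, and "hom_ref" says whether each
# parent is hom-ref at the variant.


def transmitting_parent(variant):
    """Return 'mom' or 'dad' if exactly one parent is not hom-ref, else None.

    Only hom-ref vs. not-hom-ref is checked (het is not required), so a
    parent who is homozygous for the alternate allele still qualifies.
    """
    carriers = [p for p in ("mom", "dad") if not variant["hom_ref"][p]]
    return carriers[0] if len(carriers) == 1 else None


def candidate_pairs(kid_variants):
    """Group the kid's variants by gene and pair those from opposite parents."""
    by_gene = defaultdict(list)
    for v in kid_variants:
        origin = transmitting_parent(v)
        if origin is not None:
            by_gene[v["gene"]].append((origin, v))

    pairs = []
    for gene, items in by_gene.items():
        for (origin_a, a), (origin_b, b) in combinations(items, 2):
            if origin_a != origin_b:  # one variant from mom, the other from dad
                pairs.append((gene, a["id"], b["id"]))
    return pairs


# Illustrative data only: two SNPs and one SV in GENE1, one SNP in GENE2.
variants = [
    {"id": "snp1", "gene": "GENE1", "hom_ref": {"mom": False, "dad": True}},
    {"id": "sv1", "gene": "GENE1", "hom_ref": {"mom": True, "dad": False}},
    {"id": "snp2", "gene": "GENE1", "hom_ref": {"mom": False, "dad": True}},
    {"id": "snp3", "gene": "GENE2", "hom_ref": {"mom": False, "dad": False}},
]
pairs = candidate_pairs(variants)
```

Because SNPs and SVs share the same grouping key in this sketch, a SNP/SV pair falls out of the same loop as a SNP/SNP pair, which is why a matching CSQ on the SV VCF matters.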

hdashnow commented 2 years ago

For 3. consider two SVs/STRs in the same gene as well.

fakedrtom commented 2 years ago

I believe current slivar comphet strategies rely on both of the variants being "damaging" for inclusion, but when dealing with SVs/STRs this might need to be relaxed since many VCFs of these variants do not include such impact predictions. Plus, as an example, while likely not categorized as "damaging", an intronic SV combined with a "damaging" SNV might be a combination worth considering.

brentp commented 2 years ago

Current slivar compound-hets is actually agnostic to lenient about impact: you can set --skip to empty to include all possible types, and then you could include pairs of intronic variants if you so choose. The logic for SVs will be such that any included SV present in the sample will be considered.
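
As a small illustration of that filtering behaviour, here is a Python sketch under assumptions: `keep_variant` is a hypothetical helper, not slivar code, and the impact names are placeholders. It treats the skip list as a set of impacts to exclude, so an empty set keeps everything, and it always keeps an SV that is present in the sample.

```python
def keep_variant(impact, skip, is_sv=False):
    """Decide whether a variant takes part in compound-het pairing.

    impact: impact label of the variant (placeholder strings here).
    skip:   set of impact labels to exclude; an empty set excludes nothing.
    is_sv:  SVs included in the input are considered regardless of impact.
    """
    if is_sv:
        return True
    return impact not in skip


# With an empty skip set, an intronic variant is kept.
kept_with_empty_skip = keep_variant("intron", set())
# With "intron" in the skip set, the same variant is dropped...
kept_with_intron_skipped = keep_variant("intron", {"intron"})
# ...unless it is an SV, which is always considered in this sketch.
kept_sv = keep_variant("intron", {"intron"}, is_sv=True)
```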