arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

Filtering overlapping CNVs and SNVs/indels #470

Open jxchong opened 9 years ago

jxchong commented 9 years ago

Need a way to do compound het/homozygous filtering that identifies genes in which the two variants in question are a CNV and an SNV/indel affecting the same gene. Perhaps by inputting CNVs from a BED+ file?

A number of considerations (brainstorming here, might be missing some possibilities):

compound het:
- duplication CNV + het for an SNV/indel
- deletion CNV + het for an SNV/indel (non-overlapping with the CNV -- maybe when CNV covers only part of the gene?)

homozygous recessive: het for a deletion CNV + "homozygous" (would be called this way although really hemizygous) for an SNV/indel
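The combinations listed above could be written down as a small rule table. This is only a minimal sketch, not GEMINI functionality: the function name `classify_pair`, the labels, and the input encoding are hypothetical, and the requirement that the "homozygous" SNV/indel lies inside the deletion is an assumption. Phase is not checked, so the results are candidates only.

```python
from typing import Optional


def classify_pair(cnv_type: str, snv_genotype: str, snv_inside_cnv: bool) -> Optional[str]:
    """Label a CNV + SNV/indel pair within one gene (hypothetical rules).

    cnv_type: "DEL" or "DUP"; the CNV itself is assumed to be heterozygous.
    snv_genotype: "het" or "hom_alt", as called from the VCF.
    snv_inside_cnv: True if the SNV/indel position falls within the CNV interval.
    """
    if cnv_type == "DUP" and snv_genotype == "het":
        return "comp_het_candidate"
    if cnv_type == "DEL" and snv_genotype == "het" and not snv_inside_cnv:
        # The deletion covers only part of the gene; the SNV/indel lies outside it.
        return "comp_het_candidate"
    if cnv_type == "DEL" and snv_genotype == "hom_alt" and snv_inside_cnv:
        # Called homozygous, but really hemizygous over the deleted region (assumed rule).
        return "hemizygous_candidate"
    return None


if __name__ == "__main__":
    print(classify_pair("DEL", "hom_alt", True))
```

Returning `None` for every other combination keeps the sketch conservative: anything not in the brainstormed list is simply not flagged.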

brentp commented 9 years ago

Is this just checking if adjacent lines in the output from the comp_het tool overlap?

jxchong commented 9 years ago

No. This would be doing comp_het filtering combining CNVs from an alternate data source (presumably a BED file?) and variants from a vcf.

brentp commented 9 years ago

I notice that currently the comp_het_id is not unique to a comp_het pair, since it includes the variant id. If it were unique to a pair, you could do something roughly like:

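# groupby prints the group column first, so reorder to chrom, start, end, id before intersecting
bedtools groupby -i comp_hets.bed -g $comp_het_id_col -c 1,2,3 -ops first,min,max \
    | awk 'BEGIN{OFS="\t"} {print $2, $3, $4, $1}' \
    | bedtools intersect -a - -b $CNV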

if I understand correctly.
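For readers without bedtools at hand, the same idea can be sketched in plain Python: collapse each pair id to the span from its smallest start to its largest end, then keep the pairs whose span overlaps any CNV. The function names and the tuple layout `(chrom, start, end, pair_id)` are assumptions for illustration, and unlike `bedtools intersect` this returns only the pair ids rather than the overlapping intervals.

```python
def pair_spans(rows):
    """Collapse rows of (chrom, start, end, pair_id) to {pair_id: (chrom, min_start, max_end)}.

    Like the groupby call, the chromosome is taken from the first row of each pair.
    """
    spans = {}
    for chrom, start, end, pair_id in rows:
        if pair_id in spans:
            first_chrom, lo, hi = spans[pair_id]
            spans[pair_id] = (first_chrom, min(lo, start), max(hi, end))
        else:
            spans[pair_id] = (chrom, start, end)
    return spans


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def pairs_overlapping_cnvs(rows, cnvs):
    """Return the sorted pair ids whose collapsed span overlaps any CNV interval."""
    spans = pair_spans(rows)
    return sorted(
        pair_id
        for pair_id, span in spans.items()
        if any(overlaps(span, cnv) for cnv in cnvs)
    )


if __name__ == "__main__":
    rows = [
        ("chr1", 100, 101, "p1"),
        ("chr1", 500, 501, "p1"),
        ("chr2", 50, 51, "p2"),
        ("chr2", 80, 81, "p2"),
    ]
    cnvs = [("chr1", 300, 400), ("chr2", 200, 300)]
    print(pairs_overlapping_cnvs(rows, cnvs))
```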

jxchong commented 9 years ago

No, I meant to look for compound het pairs where one allele IS the CNV and the other allele is a SNP/small indel from the vcf.
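A rough sketch of that pairing, under stated assumptions: CNVs come from a tab-separated BED+ file whose extra columns are type, gene and sample (a hypothetical layout), and small variants have already been pulled out of the VCF into dicts with `chrom`, `pos`, `gene`, `sample` and `genotype` keys. None of these names are GEMINI API. The only non-obvious step is the coordinate shift, since BED intervals are 0-based half-open while VCF positions are 1-based.

```python
from collections import defaultdict, namedtuple

Cnv = namedtuple("Cnv", "chrom start end cnv_type gene sample")


def parse_cnv_bed(lines):
    """Parse tab-separated BED+ lines; columns 4-6 (type, gene, sample) are an assumed layout."""
    cnvs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        chrom, start, end, cnv_type, gene, sample = line.rstrip("\n").split("\t")[:6]
        cnvs.append(Cnv(chrom, int(start), int(end), cnv_type, gene, sample))
    return cnvs


def snv_in_cnv(pos_1based, cnv):
    """BED is 0-based half-open and VCF POS is 1-based, so shift POS before comparing."""
    return cnv.start <= pos_1based - 1 < cnv.end


def cnv_snv_pairs(cnvs, variants):
    """Pair each CNV with non-reference small variants in the same sample and gene.

    Returns (cnv, variant, variant_is_inside_cnv) tuples; phase is not checked.
    """
    by_key = defaultdict(list)
    for v in variants:
        by_key[(v["sample"], v["gene"])].append(v)
    pairs = []
    for cnv in cnvs:
        for v in by_key.get((cnv.sample, cnv.gene), []):
            if v["chrom"] == cnv.chrom and v["genotype"] in ("het", "hom_alt"):
                pairs.append((cnv, v, snv_in_cnv(v["pos"], cnv)))
    return pairs
```

Matching on an annotated gene rather than on raw overlap is a deliberate choice here, because a compound het pair may consist of a partial-gene CNV and a small variant elsewhere in the same gene.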

jxchong commented 9 years ago

Some examples:
1) http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4104741/
2) http://www.nature.com/nature/journal/vaop/ncurrent/fig_tab/nature13394_SF7.html
3) Imagine a person who has both one of the described heterozygous carrier CNVs and then a normal small mutation from the other parent: http://genome.cshlp.org/content/early/2013/05/16/gr.156075.113.abstract