Open jxchong opened 9 years ago
Is this just checking if adjacent lines in the output from the comp_het tool overlap?
No. This would be doing comp_het filtering combining CNVs from an alternate data source (presumably a BED file?) and variants from a vcf.
I notice that currently the comp_het_id is not unique to a comp_het pair, since it includes the variant id. If it were unique to a pair, you could do something roughly like:
bedtools groupby -i comp_hets.bed -c 1,2,3 -g $comp_het_id_col -ops first,min,max \
| bedtools intersect -a - -b $CNV
if I understand correctly.
No, I meant to look for compound het pairs where one allele IS the CNV and the other allele is a SNP/small indel from the vcf.
Some examples:
1) http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4104741/
2) http://www.nature.com/nature/journal/vaop/ncurrent/fig_tab/nature13394_SF7.html
3) Imagine a person who has both one of the described heterozygous carrier CNVs and then a normal small mutation from the other parent: http://genome.cshlp.org/content/early/2013/05/16/gr.156075.113.abstract
Need a way to do compound het/homozygous filtering that would identify genes for which the two variants in question are a CNV overlapping the gene and an SNV/indel. Perhaps inputting CNVs using a BED+ file?
A number of considerations (brainstorming here, might be missing some possibilities):
compound het:
- duplication CNV + het for an SNV/indel
- deletion CNV + het for an SNV/indel (non-overlapping with the CNV; maybe when the CNV covers only part of the gene?)
homozygous recessive:
- het for a deletion CNV + "homozygous" (would be called this way although really hemizygous) for an SNV/indel
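To make the cases above concrete, here is a minimal Python sketch of the pairing logic. Everything in it is hypothetical and only for illustration: the `CNV`, `SmallVariant` and `Gene` records, the `classify` function, and the label strings are not part of any existing tool. It assumes BED-style 0-based half-open coordinates, that CNVs are already typed as `"DEL"` or `"DUP"`, and that each small variant already carries a gene annotation and a genotype call. It also assumes that the "homozygous" case only applies when the SNV/indel lies inside the deletion, and it does no phasing, so the compound het labels are candidates only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # 0-based, half-open, as in BED
    end: int
    kind: str   # "DEL" or "DUP" (assumed to be provided by the CNV caller)


@dataclass(frozen=True)
class SmallVariant:
    chrom: str
    start: int
    end: int
    gene: str
    genotype: str  # "het" or "hom_alt", as called from the VCF


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a, b) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify(gene: Gene, cnvs: List[CNV],
             variants: List[SmallVariant]) -> List[Tuple[str, CNV, SmallVariant]]:
    """Pair each CNV overlapping the gene with each SNV/indel in the gene."""
    hits = []
    gene_cnvs = [c for c in cnvs if overlaps(c, gene)]
    gene_vars = [v for v in variants if v.gene == gene.name]
    for c in gene_cnvs:
        for v in gene_vars:
            inside = overlaps(c, v)  # does the small variant sit within the CNV?
            if v.genotype == "het":
                if c.kind == "DUP":
                    # duplication CNV + het SNV/indel
                    hits.append(("comp_het_dup", c, v))
                elif c.kind == "DEL" and not inside:
                    # deletion CNV + het SNV/indel outside the deleted region
                    hits.append(("comp_het_del", c, v))
            elif v.genotype == "hom_alt" and c.kind == "DEL" and inside:
                # called homozygous, but really hemizygous over the deletion
                hits.append(("hemizygous_del", c, v))
    return hits
```

Calling `classify(gene, cnvs, variants)` once per gene returns a list of `(label, cnv, variant)` tuples; an empty list means the gene has no CNV + SNV/indel candidate under these assumptions.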