Closed damiansm closed 3 years ago
Potentially useful resources to get started with
A copy number variation map of the human genome: https://www.nature.com/articles/nrg3871
CODEX: a normalization and copy number variation detection method for whole exome sequencing: https://www.ncbi.nlm.nih.gov/pubmed/25618849
Identification of copy number variants in whole-genome data using Reference Coverage Profiles. https://www.ncbi.nlm.nih.gov/pubmed/25741365
dbVar: https://www.ncbi.nlm.nih.gov/dbvar/
DGV: http://dgv.tcag.ca/dgv/app/home
DGVa: https://www.ebi.ac.uk/dgva
Varsome has an SV browser too: https://varsome.com/variant/hg19/9-101594229-G-A
New GeL CNV tiering pipeline
All CANVAS CNVs in the proband that are larger than 10 kb are tiered (frequency and inheritance are ignored for now) into tier A (overlaps a gene or region defined in the applied panels) or tier null (all others). It is left to the user to check whether a CNV makes sense as a diagnosis on its own or in combination with one of the SNVs.
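The tiering rule described above can be sketched roughly as follows. This is only an illustration of the stated rule, not the GeL pipeline's code: the `Region` type, the `tier_cnv` function and the choice to return `None` for CNVs at or below the size cut-off are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MIN_CNV_LENGTH = 10_000  # the ">10 kb" cut-off quoted above


@dataclass(frozen=True)
class Region:
    """Hypothetical 1-based, closed interval on a single contig."""
    contig: str
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start + 1

    def overlaps(self, other: "Region") -> bool:
        # Two closed intervals overlap when each starts before the other ends.
        return (
            self.contig == other.contig
            and self.start <= other.end
            and other.start <= self.end
        )


def tier_cnv(cnv: Region, panel_regions: Iterable[Region]) -> Optional[str]:
    """Return "A", "null", or None when the CNV is too small to be tiered.

    Frequency and inheritance are deliberately ignored, as in the description.
    Returning None for small CNVs is an assumption of this sketch.
    """
    if cnv.length() <= MIN_CNV_LENGTH:
        return None
    if any(cnv.overlaps(region) for region in panel_regions):
        return "A"
    return "null"
```

A CNV that only touches the edge of a panel region still counts as overlapping here; whether a minimum overlap should be required is left open.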
See also the gnomAD-SV publication and the blog post.
This will at least require an end position to be added to VariantEvaluation. Complex SVs require more than that, so this needs consideration.
The ID is no longer necessarily an rsID, so this will need to change: either the RsId class is altered to allow any ID, or VariantEvaluation is only used for small variants and a new StructuralVariantEvaluation is added.
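To make the second option concrete, here is a minimal sketch of what a separate structural variant type might carry. It is written in Python purely for illustration and is not the project's code: the field names, the `SvType` enum (taken from the ALT types in the gnomAD-SV header) and the `span` helper are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SvType(Enum):
    DEL = "DEL"
    DUP = "DUP"
    INS = "INS"
    INV = "INV"
    BND = "BND"
    CPX = "CPX"
    CTX = "CTX"


@dataclass(frozen=True)
class StructuralVariantEvaluation:
    """Hypothetical SV counterpart to a small-variant evaluation."""
    contig: str
    start: int
    end: int                          # the extra coordinate small variants lack
    sv_type: SvType
    variant_id: str = "."             # free-form ID, not restricted to rsIDs
    end_contig: Optional[str] = None  # set when the end lies on another contig

    def __post_init__(self) -> None:
        same_contig = self.end_contig in (None, self.contig)
        if same_contig and self.end < self.start:
            raise ValueError("end must not be before start on the same contig")

    def span(self) -> Optional[int]:
        """END minus POS, or None when the ends are on different contigs."""
        if self.end_contig not in (None, self.contig):
            return None
        return self.end - self.start
```

Complex SVs (for example those with several constituent intervals) would need more than this, which is the open question raised above.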
$ tabix gnomad_v2_sv.sites.vcf.gz -h 1:10000-20000
##fileformat=VCFv4.2
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
##ALT=<ID=BND,Description="Translocation">
##ALT=<ID=CPX,Description="Complex SV">
##ALT=<ID=CTX,Description="Reciprocal chromosomal translocation">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INS,Description="Insertion">
##ALT=<ID=INS:ME,Description="Mobile element insertion of unspecified ME class">
##ALT=<ID=INS:ME:ALU,Description="Alu element insertion">
##ALT=<ID=INS:ME:LINE1,Description="LINE1 element insertion">
##ALT=<ID=INS:ME:SVA,Description="SVA element insertion">
##ALT=<ID=INS:UNK,Description="Sequence insertion of unspecified origin">
##ALT=<ID=INV,Description="Inversion">
##FILTER=<ID=MULTIALLELIC,Description="Multiallelic site">
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=PCRPLUS_ENRICHED,Description="Site enriched for non-reference genotypes among PCR+ samples. Likely reflects technical batch effects. All PCR- samples have been assigned null GTs for these sites.>">
##FILTER=<ID=PREDICTED_GENOTYPING_ARTIFACT,Description="Site is predicted to be a genotyping false-positive based on analysis of minimum GQs prior to GQ filtering.">
##FILTER=<ID=UNRESOLVED,Description="Variant is unresolved">
##FILTER=<ID=VARIABLE_ACROSS_BATCHES,Description="Site appears at variable frequencies across batches. Likely reflects technical batch effects.>">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Number of non-reference alleles observed (for biallelic sites) or individuals at each copy state (for multiallelic sites).">
##INFO=<ID=AFR_AC,Number=A,Type=Integer,Description="Number of non-reference AFR alleles observed (for biallelic sites) or AFR individuals at each copy state (for multiallelic sites).">
##INFO=<ID=AFR_AF,Number=A,Type=Float,Description="AFR allele frequency (for biallelic sites) or AFR copy-state frequency (for multiallelic sites).">
##INFO=<ID=AFR_AN,Number=1,Type=Integer,Description="Total number of AFR alleles genotyped (for biallelic sites) or AFR individuals with copy-state estimates (for multiallelic sites).">
##INFO=<ID=AFR_FREQ_HET,Number=1,Type=Float,Description="AFR heterozygous genotype frequency (biallelic sites only).">
##INFO=<ID=AFR_FREQ_HOMALT,Number=1,Type=Float,Description="AFR homozygous alternate genotype frequency (biallelic sites only).">
##INFO=<ID=AFR_FREQ_HOMREF,Number=1,Type=Float,Description="AFR homozygous reference genotype frequency (biallelic sites only).">
##INFO=<ID=AFR_N_BI_GENOS,Number=1,Type=Integer,Description="Total number of AFR individuals with complete genotypes (biallelic sites only).">
##INFO=<ID=AFR_N_HET,Number=1,Type=Integer,Description="Number of AFR individuals with heterozygous genotypes (biallelic sites only).">
##INFO=<ID=AFR_N_HOMALT,Number=1,Type=Integer,Description="Number of AFR individuals with homozygous alternate genotypes (biallelic sites only).">
##INFO=<ID=AFR_N_HOMREF,Number=1,Type=Integer,Description="Number of AFR individuals with homozygous reference genotypes (biallelic sites only).">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency (for biallelic sites) or copy-state frequency (for multiallelic sites).">
##INFO=<ID=ALGORITHMS,Number=.,Type=String,Description="Source algorithms">
##INFO=<ID=AMR_AC,Number=A,Type=Integer,Description="Number of non-reference AMR alleles observed (for biallelic sites) or AMR individuals at each copy state (for multiallelic sites).">
##INFO=<ID=AMR_AF,Number=A,Type=Float,Description="AMR allele frequency (for biallelic sites) or AMR copy-state frequency (for multiallelic sites).">
##INFO=<ID=AMR_AN,Number=1,Type=Integer,Description="Total number of AMR alleles genotyped (for biallelic sites) or AMR individuals with copy-state estimates (for multiallelic sites).">
##INFO=<ID=AMR_FREQ_HET,Number=1,Type=Float,Description="AMR heterozygous genotype frequency (biallelic sites only).">
##INFO=<ID=AMR_FREQ_HOMALT,Number=1,Type=Float,Description="AMR homozygous alternate genotype frequency (biallelic sites only).">
##INFO=<ID=AMR_FREQ_HOMREF,Number=1,Type=Float,Description="AMR homozygous reference genotype frequency (biallelic sites only).">
##INFO=<ID=AMR_N_BI_GENOS,Number=1,Type=Integer,Description="Total number of AMR individuals with complete genotypes (biallelic sites only).">
##INFO=<ID=AMR_N_HET,Number=1,Type=Integer,Description="Number of AMR individuals with heterozygous genotypes (biallelic sites only).">
##INFO=<ID=AMR_N_HOMALT,Number=1,Type=Integer,Description="Number of AMR individuals with homozygous alternate genotypes (biallelic sites only).">
##INFO=<ID=AMR_N_HOMREF,Number=1,Type=Integer,Description="Number of AMR individuals with homozygous reference genotypes (biallelic sites only).">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles genotyped (for biallelic sites) or individuals with copy-state estimates (for multiallelic sites).">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate">
##INFO=<ID=CPX_INTERVALS,Number=.,Type=String,Description="Genomic intervals constituting complex variant.">
##INFO=<ID=CPX_TYPE,Number=1,Type=String,Description="Class of complex variant.">
##INFO=<ID=EAS_AC,Number=A,Type=Integer,Description="Number of non-reference EAS alleles observed (for biallelic sites) or EAS individuals at each copy state (for multiallelic sites).">
##INFO=<ID=EAS_AF,Number=A,Type=Float,Description="EAS allele frequency (for biallelic sites) or EAS copy-state frequency (for multiallelic sites).">
##INFO=<ID=EAS_AN,Number=1,Type=Integer,Description="Total number of EAS alleles genotyped (for biallelic sites) or EAS individuals with copy-state estimates (for multiallelic sites).">
##INFO=<ID=EAS_FREQ_HET,Number=1,Type=Float,Description="EAS heterozygous genotype frequency (biallelic sites only).">
##INFO=<ID=EAS_FREQ_HOMALT,Number=1,Type=Float,Description="EAS homozygous alternate genotype frequency (biallelic sites only).">
##INFO=<ID=EAS_FREQ_HOMREF,Number=1,Type=Float,Description="EAS homozygous reference genotype frequency (biallelic sites only).">
##INFO=<ID=EAS_N_BI_GENOS,Number=1,Type=Integer,Description="Total number of EAS individuals with complete genotypes (biallelic sites only).">
##INFO=<ID=EAS_N_HET,Number=1,Type=Integer,Description="Number of EAS individuals with heterozygous genotypes (biallelic sites only).">
##INFO=<ID=EAS_N_HOMALT,Number=1,Type=Integer,Description="Number of EAS individuals with homozygous alternate genotypes (biallelic sites only).">
##INFO=<ID=EAS_N_HOMREF,Number=1,Type=Integer,Description="Number of EAS individuals with homozygous reference genotypes (biallelic sites only).">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=EUR_AC,Number=A,Type=Integer,Description="Number of non-reference EUR alleles observed (for biallelic sites) or EUR individuals at each copy state (for multiallelic sites).">
##INFO=<ID=EUR_AF,Number=A,Type=Float,Description="EUR allele frequency (for biallelic sites) or EUR copy-state frequency (for multiallelic sites).">
##INFO=<ID=EUR_AN,Number=1,Type=Integer,Description="Total number of EUR alleles genotyped (for biallelic sites) or EUR individuals with copy-state estimates (for multiallelic sites).">
##INFO=<ID=EUR_FREQ_HET,Number=1,Type=Float,Description="EUR heterozygous genotype frequency (biallelic sites only).">
##INFO=<ID=EUR_FREQ_HOMALT,Number=1,Type=Float,Description="EUR homozygous alternate genotype frequency (biallelic sites only).">
##INFO=<ID=EUR_FREQ_HOMREF,Number=1,Type=Float,Description="EUR homozygous reference genotype frequency (biallelic sites only).">
##INFO=<ID=EUR_N_BI_GENOS,Number=1,Type=Integer,Description="Total number of EUR individuals with complete genotypes (biallelic sites only).">
##INFO=<ID=EUR_N_HET,Number=1,Type=Integer,Description="Number of EUR individuals with heterozygous genotypes (biallelic sites only).">
##INFO=<ID=EUR_N_HOMALT,Number=1,Type=Integer,Description="Number of EUR individuals with homozygous alternate genotypes (biallelic sites only).">
##INFO=<ID=EUR_N_HOMREF,Number=1,Type=Integer,Description="Number of EUR individuals with homozygous reference genotypes (biallelic sites only).">
##INFO=<ID=EVIDENCE,Number=.,Type=String,Description="Classes of random forest support.">
##INFO=<ID=FREQ_HET,Number=1,Type=Float,Description="Heterozygous genotype frequency (biallelic sites only).">
##INFO=<ID=FREQ_HOMALT,Number=1,Type=Float,Description="Homozygous alternate genotype frequency (biallelic sites only).">
##INFO=<ID=FREQ_HOMREF,Number=1,Type=Float,Description="Homozygous reference genotype frequency (biallelic sites only).">
##INFO=<ID=N_BI_GENOS,Number=1,Type=Integer,Description="Total number of individuals with complete genotypes (biallelic sites only).">
##INFO=<ID=N_HET,Number=1,Type=Integer,Description="Number of individuals with heterozygous genotypes (biallelic sites only).">
##INFO=<ID=N_HOMALT,Number=1,Type=Integer,Description="Number of individuals with homozygous alternate genotypes (biallelic sites only).">
##INFO=<ID=N_HOMREF,Number=1,Type=Integer,Description="Number of individuals with homozygous reference genotypes (biallelic sites only).">
##INFO=<ID=OTH_AC,Number=A,Type=Integer,Description="Number of non-reference OTH alleles observed (for biallelic sites) or OTH individuals at each copy state (for multiallelic sites).">
##INFO=<ID=OTH_AF,Number=A,Type=Float,Description="OTH allele frequency (for biallelic sites) or OTH copy-state frequency (for multiallelic sites).">
##INFO=<ID=OTH_AN,Number=1,Type=Integer,Description="Total number of OTH alleles genotyped (for biallelic sites) or OTH individuals with copy-state estimates (for multiallelic sites).">
##INFO=<ID=OTH_FREQ_HET,Number=1,Type=Float,Description="OTH heterozygous genotype frequency (biallelic sites only).">
##INFO=<ID=OTH_FREQ_HOMALT,Number=1,Type=Float,Description="OTH homozygous alternate genotype frequency (biallelic sites only).">
##INFO=<ID=OTH_FREQ_HOMREF,Number=1,Type=Float,Description="OTH homozygous reference genotype frequency (biallelic sites only).">
##INFO=<ID=OTH_N_BI_GENOS,Number=1,Type=Integer,Description="Total number of OTH individuals with complete genotypes (biallelic sites only).">
##INFO=<ID=OTH_N_HET,Number=1,Type=Integer,Description="Number of OTH individuals with heterozygous genotypes (biallelic sites only).">
##INFO=<ID=OTH_N_HOMALT,Number=1,Type=Integer,Description="Number of OTH individuals with homozygous alternate genotypes (biallelic sites only).">
##INFO=<ID=OTH_N_HOMREF,Number=1,Type=Integer,Description="Number of OTH individuals with homozygous reference genotypes (biallelic sites only).">
##INFO=<ID=PCRPLUS_DEPLETED,Number=0,Type=Flag,Description="Site depleted for non-reference genotypes among PCR+ samples. Likely reflects technical batch effects. All PCR+ samples have been assigned null GTs for these sites.">
##INFO=<ID=PESR_GT_OVERDISPERSION,Number=0,Type=Flag,Description="PESR genotyping data is overdispersed. Flags sites where genotypes are likely noisier.">
##INFO=<ID=POPMAX_AF,Number=1,Type=Float,Description="Maximum allele frequency across any population (biallelic sites only).">
##INFO=<ID=PROTEIN_CODING__COPY_GAIN,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to have a copy-gain effect.">
##INFO=<ID=PROTEIN_CODING__DUP_LOF,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to have a loss-of-function effect via intragenic exonic duplication.">
##INFO=<ID=PROTEIN_CODING__DUP_PARTIAL,Number=.,Type=String,Description="Gene(s) which are partially overlapped by an SV's duplication, such that an unaltered copy is preserved.">
##INFO=<ID=PROTEIN_CODING__INTERGENIC,Number=0,Type=Flag,Description="SV does not overlap coding sequence.">
##INFO=<ID=PROTEIN_CODING__INTRONIC,Number=.,Type=String,Description="Gene(s) where the SV was found to lie entirely within an intron.">
##INFO=<ID=PROTEIN_CODING__INV_SPAN,Number=.,Type=String,Description="Gene(s) which are entirely spanned by an SV's inversion.">
##INFO=<ID=PROTEIN_CODING__LOF,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to have a loss-of-function effect.">
##INFO=<ID=PROTEIN_CODING__MSV_EXON_OVR,Number=.,Type=String,Description="Gene(s) on which the multiallelic SV would be predicted to have a LOF, DUP_LOF, COPY_GAIN, or DUP_PARTIAL annotation if the SV were biallelic.">
##INFO=<ID=PROTEIN_CODING__NEAREST_TSS,Number=.,Type=String,Description="Nearest transcription start site to intragenic variants.">
##INFO=<ID=PROTEIN_CODING__PROMOTER,Number=.,Type=String,Description="Genes whose promoter sequence (1 kb) was disrupted by SV.">
##INFO=<ID=PROTEIN_CODING__UTR,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to disrupt a UTR.">
##INFO=<ID=SOURCE,Number=1,Type=String,Description="Source of inserted sequence.">
##INFO=<ID=STRANDS,Number=1,Type=String,Description="Breakpoint strandedness [++,+-,-+,--]">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=UNRESOLVED_TYPE,Number=1,Type=String,Description="Class of unresolved variant.">
##CPX_TYPE_CCR="Complex chromosomal rearrangement, involving two or more chromosomes and multiple SV signatures."
##CPX_TYPE_INS_iDEL="Insertion with deletion at insertion site."
##CPX_TYPE_INVdel="Complex inversion with 3' flanking deletion."
##CPX_TYPE_INVdup="Complex inversion with 3' flanking duplication."
##CPX_TYPE_dDUP="Dispersed duplication."
##CPX_TYPE_dDUP_iDEL="Dispersed duplication with deletion at insertion site."
##CPX_TYPE_delINVdel="Complex inversion with 5' and 3' flanking deletions."
##CPX_TYPE_delINVdup="Complex inversion with 5' flanking deletion and 3' flanking duplication."
##CPX_TYPE_delINV="Complex inversion with 5' flanking deletion."
##CPX_TYPE_dupINVdel="Complex inversion with 5' flanking duplication and 3' flanking deletion."
##CPX_TYPE_dupINVdup="Complex inversion with 5' and 3' flanking duplications."
##CPX_TYPE_dupINV="Complex inversion with 5' flanking duplication."
##CPX_TYPE_piDUP_FR="Palindromic inverted tandem duplication, forward-reverse orientation."
##CPX_TYPE_piDUP_RF="Palindromic inverted tandem duplication, reverse-forward orientation."
#CHROM POS ID REF ALT QUAL FILTER INFO
1 10000 gnomAD_v2_DUP_1_1 N <DUP> 999 PASS END=20000;SVTYPE=DUP;CHR2=1;SVLEN=10000;ALGORITHMS=depth;EVIDENCE=BAF,RD;PROTEIN_CODING__NEAREST_TSS=OR4F5;PROTEIN_CODING__INTERGENIC;AN=21474;AC=20175;AF=0.939508;N_BI_GENOS=10737;N_HOMREF=41;N_HET=1217;N_HOMALT=9479;FREQ_HOMREF=0.00381857;FREQ_HET=0.113346;FREQ_HOMALT=0.882835;AFR_AN=9480;AFR_AC=9060;AFR_AF=0.955696;AFR_N_BI_GENOS=4740;AFR_N_HOMREF=19;AFR_N_HET=382;AFR_N_HOMALT=4339;AFR_FREQ_HOMREF=0.00400844;AFR_FREQ_HET=0.0805907;AFR_FREQ_HOMALT=0.915401;AMR_AN=1784;AMR_AC=1614;AMR_AF=0.904709;AMR_N_BI_GENOS=892;AMR_N_HOMREF=7;AMR_N_HET=156;AMR_N_HOMALT=729;AMR_FREQ_HOMREF=0.00784753;AMR_FREQ_HET=0.174888;AMR_FREQ_HOMALT=0.817265;EAS_AN=2224;EAS_AC=2018;EAS_AF=0.907374;EAS_N_BI_GENOS=1112;EAS_N_HOMREF=9;EAS_N_HET=188;EAS_N_HOMALT=915;EAS_FREQ_HOMREF=0.00809352;EAS_FREQ_HET=0.169065;EAS_FREQ_HOMALT=0.822842;EUR_AN=7598;EUR_AC=7126;EUR_AF=0.937878;EUR_N_BI_GENOS=3799;EUR_N_HOMREF=6;EUR_N_HET=460;EUR_N_HOMALT=3333;EUR_FREQ_HOMREF=0.00157936;EUR_FREQ_HET=0.121084;EUR_FREQ_HOMALT=0.877336;OTH_AN=388;OTH_AC=357;OTH_AF=0.920103;OTH_N_BI_GENOS=194;OTH_N_HOMREF=0;OTH_N_HET=31;OTH_N_HOMALT=163;OTH_FREQ_HOMREF=0;OTH_FREQ_HET=0.159794;OTH_FREQ_HOMALT=0.840206;POPMAX_AF=0.955696
1 10642 gnomAD_v2_BND_1_1 N <BND> 928 UNRESOLVED END=10642;SVTYPE=BND;CHR2=15;SVLEN=-1;ALGORITHMS=manta;EVIDENCE=PE,SR;UNRESOLVED_TYPE=SINGLE_ENDER_--;PESR_GT_OVERDISPERSION;AN=20178;AC=17;AF=0.000843;N_BI_GENOS=10089;N_HOMREF=10077;N_HET=7;N_HOMALT=5;FREQ_HOMREF=0.998811;FREQ_HET=0.000693825;FREQ_HOMALT=0.000495589;AFR_AN=8920;AFR_AC=7;AFR_AF=0.000785;AFR_N_BI_GENOS=4460;AFR_N_HOMREF=4455;AFR_N_HET=3;AFR_N_HOMALT=2;AFR_FREQ_HOMREF=0.998879;AFR_FREQ_HET=0.000672646;AFR_FREQ_HOMALT=0.00044843;AMR_AN=1710;AMR_AC=2;AMR_AF=0.00117;AMR_N_BI_GENOS=855;AMR_N_HOMREF=854;AMR_N_HET=0;AMR_N_HOMALT=1;AMR_FREQ_HOMREF=0.99883;AMR_FREQ_HET=0;AMR_FREQ_HOMALT=0.00116959;EAS_AN=2006;EAS_AC=0;EAS_AF=0;EAS_N_BI_GENOS=1003;EAS_N_HOMREF=1003;EAS_N_HET=0;EAS_N_HOMALT=0;EAS_FREQ_HOMREF=1;EAS_FREQ_HET=0;EAS_FREQ_HOMALT=0;EUR_AN=7180;EUR_AC=8;EUR_AF=0.001114;EUR_N_BI_GENOS=3590;EUR_N_HOMREF=3584;EUR_N_HET=4;EUR_N_HOMALT=2;EUR_FREQ_HOMREF=0.998329;EUR_FREQ_HET=0.00111421;EUR_FREQ_HOMALT=0.000557103;OTH_AN=362;OTH_AC=0;OTH_AF=0;OTH_N_BI_GENOS=181;OTH_N_HOMREF=181;OTH_N_HET=0;OTH_N_HOMALT=0;OTH_FREQ_HOMREF=1;OTH_FREQ_HET=0;OTH_FREQ_HOMALT=0;POPMAX_AF=0.00117
1 14500 gnomAD_v2_DUP_1_2 N <DUP> 49 PCRPLUS_ENRICHED END=43500;SVTYPE=DUP;CHR2=1;SVLEN=29000;ALGORITHMS=depth;EVIDENCE=BAF,RD;PROTEIN_CODING__NEAREST_TSS=OR4F5;PROTEIN_CODING__INTERGENIC;AN=1036;AC=259;AF=0.25;N_BI_GENOS=518;N_HOMREF=308;N_HET=161;N_HOMALT=49;FREQ_HOMREF=0.594595;FREQ_HET=0.310811;FREQ_HOMALT=0.0945946;AFR_AN=424;AFR_AC=177;AFR_AF=0.417453;AFR_N_BI_GENOS=212;AFR_N_HOMREF=77;AFR_N_HET=93;AFR_N_HOMALT=42;AFR_FREQ_HOMREF=0.363208;AFR_FREQ_HET=0.438679;AFR_FREQ_HOMALT=0.198113;AMR_AN=448;AMR_AC=47;AMR_AF=0.104911;AMR_N_BI_GENOS=224;AMR_N_HOMREF=179;AMR_N_HET=43;AMR_N_HOMALT=2;AMR_FREQ_HOMREF=0.799107;AMR_FREQ_HET=0.191964;AMR_FREQ_HOMALT=0.00892857;EAS_AN=0;EAS_AC=0;EAS_AF=0;EAS_N_BI_GENOS=0;EAS_N_HOMREF=0;EAS_N_HET=0;EAS_N_HOMALT=0;EAS_FREQ_HOMREF=0;EAS_FREQ_HET=0;EAS_FREQ_HOMALT=0;EUR_AN=148;EUR_AC=29;EUR_AF=0.195946;EUR_N_BI_GENOS=74;EUR_N_HOMREF=49;EUR_N_HET=21;EUR_N_HOMALT=4;EUR_FREQ_HOMREF=0.662162;EUR_FREQ_HET=0.283784;EUR_FREQ_HOMALT=0.0540541;OTH_AN=16;OTH_AC=6;OTH_AF=0.375;OTH_N_BI_GENOS=8;OTH_N_HOMREF=3;OTH_N_HET=4;OTH_N_HOMALT=1;OTH_FREQ_HOMREF=0.375;OTH_FREQ_HET=0.5;OTH_FREQ_HOMALT=0.125;POPMAX_AF=0.417453
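As a rough illustration of how the sites above could be read, the sketch below pulls the end position, SV type, length and allele frequency out of one record. The function names and the returned dictionary layout are assumptions for the example, and the example line is an abbreviated copy of the first record; real code would more likely use an existing VCF library.

```python
from typing import List


def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a dict; flag fields (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_sv_site(line: str) -> dict:
    """Extract position, type and frequency from a sites-only SV record.

    Assumes the eight standard columns and an INFO field without whitespace.
    """
    chrom, pos, var_id, _ref, alt, _qual, filt, info = line.split(None, 7)
    fields = parse_info(info.strip())
    # AF is declared Number=A, so it may hold several comma-separated values.
    af: List[float] = [float(x) for x in str(fields.get("AF", "")).split(",") if x]
    return {
        "contig": chrom,
        "start": int(pos),
        "end": int(str(fields["END"])),
        "end_contig": fields.get("CHR2", chrom),
        "id": var_id,
        "alt": alt,
        "filter": filt,
        "sv_type": fields["SVTYPE"],
        "sv_len": int(str(fields["SVLEN"])),
        "af": af,
        "intergenic": fields.get("PROTEIN_CODING__INTERGENIC", False),
    }


# Abbreviated copy of the first record shown above.
EXAMPLE = (
    "1\t10000\tgnomAD_v2_DUP_1_1\tN\t<DUP>\t999\tPASS\t"
    "END=20000;SVTYPE=DUP;CHR2=1;SVLEN=10000;ALGORITHMS=depth;"
    "PROTEIN_CODING__INTERGENIC;AN=21474;AC=20175;AF=0.939508;POPMAX_AF=0.955696"
)
site = parse_sv_site(EXAMPLE)
```

Note that flag fields such as PROTEIN_CODING__INTERGENIC carry no value, and that for BND records CHR2 can differ from CHROM, so the end coordinate alone does not define an interval.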
See the example_sv vcf and yml for a test example.
@julesjacobsen Are we ready to release our v1 of (optional) SV prioritisation now?
Think we're ready now!
This would be useful in the near future, for GEL at least, as we will start to get decent CNV calls soon.
In principle it could be pretty simple, i.e. any gene deleted by a CNV gets a variant score of 1 and whatever the usual phenotype score is.
However, we would need to decide how to annotate the CNVs in terms of genes, whether they are deleted or duplicated, and what to do about partial overlaps. There are existing pipelines, including the GEL one, that we can learn from.
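A minimal sketch of the simple scheme suggested above, assuming a gene counts as "deleted" only when a DEL spans it completely. Partial overlaps and duplications are returned as `None` because their scoring is exactly what still has to be decided. The `Gene` type, `score_cnv_genes` and the coordinate convention (1-based, closed) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record with 1-based, closed coordinates."""
    symbol: str
    contig: str
    start: int
    end: int


def score_cnv_genes(
    sv_type: str, contig: str, start: int, end: int, genes: Iterable[Gene]
) -> Dict[str, Optional[float]]:
    """Map each overlapped gene symbol to a variant score.

    Fully deleted genes get 1.0, as proposed above. Any other overlap maps to
    None, meaning "no rule agreed yet" rather than "no effect".
    """
    scores: Dict[str, Optional[float]] = {}
    for gene in genes:
        if gene.contig != contig or gene.end < start or end < gene.start:
            continue  # no overlap with this gene
        fully_spanned = start <= gene.start and gene.end <= end
        if sv_type == "DEL" and fully_spanned:
            scores[gene.symbol] = 1.0
        else:
            scores[gene.symbol] = None
    return scores
```

The usual phenotype score would then be combined with these variant scores per gene in the same way as for small variants.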