Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Fix SVTYPE when using IUPAC nucleotide codes #1636

Closed nuno-agostinho closed 2 months ago

nuno-agostinho commented 3 months ago

Fixes #1631

Motivation

When the SVTYPE tag is defined and the variant REF/ALT alleles contain IUPAC nucleotide codes other than A, T, C and G (such as N and R in the user's example), VEP 111 tries to parse the ALT allele as an SV type and fails:

WARNING: line 1 skipped (3 90699772 MantaINS:2:34291:34291:1:0:0 ANNNNN...): ATCACAAATAGGTTCTGAGAATTATTCTGTCTAGTTTTTCTAGCGCCGTTTGAGGCCTATGGTAGAAAAGGGAATATCTTCATAGAAAAACGAGACAGAATAATTCTCAGAACCTATTTGTGATTTGTGCTT type is not supported

To avoid this issue, if SVTYPE is defined and ALT does not resemble one of the VCF-supported SV types (i.e., it does not start with INS, DEL, INV, DUP or CN), the SV type is now based on SVTYPE instead of ALT.
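
The fallback rule can be sketched as follows. This is an illustration in Python, not the actual VEP code (VEP is written in Perl). The function name resolve_sv_type, the regular expression and the optional leading angle bracket are assumptions made for this example.

```python
import re
from typing import Optional

# Prefixes taken from the description above. The pattern itself, including
# the optional "<" of a symbolic allele, is an assumption for illustration.
SV_ALT_PATTERN = re.compile(r"^<?(INS|DEL|INV|DUP|CN)")


def resolve_sv_type(alt: str, svtype: Optional[str]) -> str:
    """Return the string the SV type should be derived from (hypothetical helper).

    If SVTYPE is defined and ALT does not look like a supported SV type,
    SVTYPE wins; otherwise ALT is used as before.
    """
    if svtype and not SV_ALT_PATTERN.match(alt):
        return svtype
    return alt


# ALT is a plain sequence, so the type comes from SVTYPE.
print(resolve_sv_type("ATCACAAATAGGTTCTGAG", "INS"))
# ALT is a symbolic SV allele, so ALT is kept.
print(resolve_sv_type("<DEL>", "INS"))
```

A prefix test this loose would also match a sequence allele that happens to begin with CN; how VEP treats that case is not covered here.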

The warning message was also changed to be clearer:

WARNING: line 1 skipped (3 90699772 MantaINS:2:34291:34291:1:0:0 ANNNNN...): ANN is not a supported structural variant type

Testing

VEP should run with the following variants without returning any warnings:

3       90699772        MantaINS:2:34291:34291:1:0:0    ANNNNNNNNNNNNNNNNNNNN   ATCACAAATAGGTTCTGAGAATTATTCTGTCTAGTTTTTCTAGCGCCGTTTGAGGCCTATGGTAGAAAAGGGAATATCTTCATAGAAAAACGAGACAGAATAATTCTCAGAACCTATTTGTGATTTGTGCTT    957     PASS    END=90699792;SVTYPE=INS;SVLEN=131;CIGAR=1M131I20D;set=manta     GT:FT:GQ:PL:PR:SR       1/1:PASS:62:999,65,0:0,1:0,25                                                                                                                                                                                                                   
4       31835775        MantaINS:70994:0:0:0:1:0        TNNNNNNNNNNNNNNNNNNNN   TTGCAGTGAAGAGAGATCACGACACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATTTAAATTTAAAAAACCCCACATGAACAAGCTAATAAAGCATACTGAGTTTGATGAAATACATTTCTTTTCT       999     PASS   END=31835795;SVTYPE=INS;SVLEN=176;CIGAR=1M176I20D;set=manta      GT:FT:GQ:PL:PR:SR       1/1:PASS:161:999,164,0:0,18:0,61                                                                                                                                                                
10      39254773        MantaINS:109681:2:2:0:0:0       GNNNNNNNNNNNNNNNNNNNN   GCAGTTTCTCTGAAATCTTCTTTCTAGTTTTTATCTGTAGATGTTTCCTATTTCACCATAGGCCTGAAGGCTCACCAAAGTATCCCTATGCAGATTCTACAAAAACAGTGTTACCAAACTGTTGAATGAAAAGAGAGGTTGAACTCTGTAAGATGAATGGAGACATCATGAAATGGTTTCTCAGATAGCTTCCTTCGAGTTTTTATCCTGAAATATTCCCTTTTGCACCATGACCTCAATGAGCTCGCAAATGTCCAC      999     PASS    END=39254793;SVTYPE=INS;SVLEN=257;CIGAR=1M257I20D;set=manta     GT:FT:GQ:PL:PR:SR       1/1:PASS:150:999,153,0:0,21:0,38                                                                                
17      21860937        MantaINS:2:27285:27285:2:1:0    GNNNNNNNNNNNNNNNNNNNN   GGAATGGAATCGAATGGAATGTAATCAAATGGAATGGACCAGAATGGAATGGAATGGAAAAGAACGGACATGAATGTAATGGACTGCAATCTAACTGATTCGAAAGAATGGAATCGAAAG        999     PASS    END=21860957;SVTYPE=INS;SVLEN=119;CIGAR=1M119I20D;set=manta     GT:FT:GQ:PL:PR:SR       1/1:PASS:119:999,122,0:0,6:0,40                                                                                                                                                                                                                         
17      26820065        MantaDEL:2:27595:27595:2:1:0    CNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN      CGA     999    MaxMQ0Frac       END=26820266;SVTYPE=DEL;SVLEN=-201;CIGAR=1M2I201D;set=manta     GT:FT:GQ:PL:PR:SR       1/1:PASS:85:999,88,0:1,3:0,30                                                                                                                                                   

VEP should report all of these as intergenic variants (instead of no consequence at all).
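
Before running VEP, it can help to confirm that each test record carries SVTYPE in its INFO column. The snippet below is a minimal sketch for that check; parse_info is a hypothetical helper written for this example and is not part of VEP.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a dict; keys without a value map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# INFO column of the first test record.
info = "END=90699792;SVTYPE=INS;SVLEN=131;CIGAR=1M131I20D;set=manta"
print(parse_info(info)["SVTYPE"])  # prints INS
```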

olaaustine commented 2 months ago

Merged into release and main