Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0

ProteinSeqs plugin truncating protein sequences #155

Closed silvia-s closed 5 years ago

silvia-s commented 5 years ago

Hi there,

I have a VCF file containing the following variant:

chr1 44130707 . CCAA C

...resulting in the inframe_deletion p.Asn378del. Using the ProteinSeqs plugin, this is the sequence I got for the wildtype:

ENSP00000361373 MYGRPQAEMEQEAGELSRWQAAHQAAQDNENSAPILNMSSSSGSSGVHTSWNQGLPSIQHFPHSAEMLGSPLVSVEAPGQNVNEGGPQFSMPLPERGMSYCPQATLTPSRMIYCQRMSPPQQEMTIFSGPQLMPVGEPNIPRVARPFGGNLRMPPNGLPVSASTGIPIMSHTGNPPVPYPGLSTVPSDETLLGPTVPSTEAQAVLPSMAQMLPPQDAHDLGMPPAESQSLLVLGSQDSLVSQPDSQEGPFLPEQPGPAPQTVEKNSRPQEGTGRRGSSEARPYCCNYENCGKAYTKRSHLVSHQRKHTGERPYSCNWESCSWSFFRSDELRRHMRVHTRYRPYKCDQCSREFMRSDHLKQHQKTHRPGPSDPQANNNNGEQDSPPAAGP

...and this is the protein sequence for the mutant:

ENSP00000361373.3:p.Asn378del MYGRPQAEMEQEAGELSRWQAAHQAAQDNENSAPILNMSSSSGSSGVHTSWNQGLPSIQHFPHSAEMLGSPLVSVEAPGQ

As it is not a stop_gain mutation, I expected the protein sequence to lack just that one "N" amino acid, rather than being truncated. Am I wrong, or is this a bug?

Thank you for your help with this! Silvia

ima23 commented 5 years ago

Hi Silvia,

Your expectation is correct: the ProteinSeqs plugin should report a sequence missing one 'N' amino acid. I could not replicate your output using code and cache version 95; the result I get is as expected:

>ENSP00000361373.3:p.Asn378del
MYGRPQAEMEQEAGELSRWQAAHQAAQDNENSAPILNMSSSSGSSGVHTSWNQGLPSIQHFPHSAEMLGSPLVSVEAPGQ
NVNEGGPQFSMPLPERGMSYCPQATLTPSRMIYCQRMSPPQQEMTIFSGPQLMPVGEPNIPRVARPFGGNLRMPPNGLPV
SASTGIPIMSHTGNPPVPYPGLSTVPSDETLLGPTVPSTEAQAVLPSMAQMLPPQDAHDLGMPPAESQSLLVLGSQDSLV
SQPDSQEGPFLPEQPGPAPQTVEKNSRPQEGTGRRGSSEARPYCCNYENCGKAYTKRSHLVSHQRKHTGERPYSCNWESC
SWSFFRSDELRRHMRVHTRYRPYKCDQCSREFMRSDHLKQHQKTHRPGPSDPQANNNGEQDSPPAAGP
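
For reference, the expected effect of p.Asn378del can be checked independently of VEP. The following is a minimal Python sketch, not the plugin's own code. The function name `delete_residue` and its parameters are hypothetical, and it assumes the C-terminal fragment below starts at residue 372, which makes Asn378 the last N of the run.

```python
def delete_residue(seq: str, pos: int, expected: str, start: int = 1) -> str:
    """Remove the residue at 1-based protein position `pos`.

    `start` is the protein position of seq[0], so a fragment can be used
    instead of the full-length sequence. `expected` guards against
    off-by-one mistakes by checking the residue before deleting it.
    """
    idx = pos - start
    if not 0 <= idx < len(seq):
        raise ValueError(f"position {pos} is outside the given sequence")
    if seq[idx] != expected:
        raise ValueError(f"expected {expected} at {pos}, found {seq[idx]}")
    return seq[:idx] + seq[idx + 1:]


# C-terminal fragment of the wild-type sequence quoted in this thread.
# ASSUMPTION: the fragment starts at residue 372.
WT_TAIL = "PQANNNNGEQDSPPAAGP"

mutant_tail = delete_residue(WT_TAIL, pos=378, expected="N", start=372)
print(mutant_tail)
```

An in-frame deletion of one residue should shorten the protein by exactly one amino acid and leave everything downstream intact, so a mutant sequence that stops well before the deleted position points to a problem in the tooling rather than in the variant.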

Could you please let me know which version of the code you are using, and whether you are using the database or the cache, so I can look into it further?

Thanks, Irina

silvia-s commented 5 years ago

Thanks, Irina, for the very quick reply. My bad: I just realised that, although I was using the up-to-date VEP cache (version 95), the version of the plugins (located in a different path) was pretty old. After replacing them with the latest version, I re-ran my command and the output protein sequence was as expected.

Thank you again! Silvia