Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0

Plugin Conservation cannot annotate GERP score for INDEL correctly #678

Closed: jxcao98 closed this issue 5 months ago

jxcao98 commented 6 months ago

Dear VEP developer,

I am trying to annotate GERP scores for VCF files using the Conservation.pm plugin. It works fine for SNVs but seems wrong for indels.

If I understand correctly, for a small deletion VEP should return the average of the scores at each deleted position. In practice, however, I get the score of the first position rather than the average. For example, for the indel chr10:73816182 GCTT>G (GRCh38, rs1448086996), Conservation.pm returned a GERP score of -7.34, which equals the score at position 73816183. (The scores for C, T, and T are -7.340000152587891, 0.0997999981045723, and -0.2709999978542328, respectively.)

I also found that Conservation.pm does not seem to handle insertions: every variant of this type in my input files was given an n/a value.

I used VEP v109 with the following plugin option: --plugin Conservation,$PluginDataPath/conservation/gerp_conservation_scores.homo_sapiens.GRCh38.bw
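For reference, the expected average for this deletion can be checked by hand. The following is a minimal Python sketch, not plugin code: it uses only the three per-base scores quoted above, and the variable names are made up for illustration.

```python
from statistics import mean

# Per-base GERP scores quoted in this report for the deleted bases C, T, T
# (the first value is the score at position 73816183).
deleted_base_scores = [-7.340000152587891, 0.0997999981045723, -0.2709999978542328]

# What an "average over the deleted positions" would give.
expected_average = mean(deleted_base_scores)

# What the plugin actually reported: the score of the first deleted base only.
first_position_score = deleted_base_scores[0]

print(f"average over deleted bases: {expected_average:.3f}")
print(f"first position only:        {first_position_score:.3f}")
```

The average comes out at roughly -2.50, well away from the -7.34 that was reported, so the two behaviours are easy to tell apart for this variant.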

Thanks in advance, Jixin

jamie-m-a commented 6 months ago

Hi @jxcao98

Yes, you have understood the intended functionality of the plugin correctly. I can reproduce the issue you encountered and have traced it back to the parsing of scores from the bigWig file. I'll raise a ticket to have this fixed in an upcoming Ensembl release.

Meanwhile, you can use the following plugin option, which takes data from the Compara database rather than a bigWig file:

./vep -i variations.vcf --plugin Conservation,database,GERP_CONSERVATION_SCORE,mammals

I believe this method averages the scores correctly, though it will be noticeably slower than the file option.

jxcao98 commented 6 months ago

Thanks for your quick reply!

I have another small suggestion. In a variant prioritization pipeline, wouldn't the maximum value across all positions matter more than the average? For example, a long deletion deserves to be prioritized if it spans any conserved base.

I'd like to have an option that returns the maximum score for the region spanned by an indel. Do you think this is worthwhile?
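To illustrate the difference between the two summaries, here is a minimal Python sketch. It is not how Conservation.pm is implemented: `summarise_scores` and its `mode` argument are hypothetical names chosen for this example, and the scores are the ones from the deletion reported earlier in this thread.

```python
from statistics import mean
from typing import Optional, Sequence


def summarise_scores(scores: Sequence[float], mode: str = "mean") -> Optional[float]:
    """Collapse per-base conservation scores into one value.

    Hypothetical helper for illustration only; the name and the `mode`
    argument are assumptions, not an existing plugin option.
    """
    if not scores:
        # No covered positions: nothing to report.
        return None
    if mode == "mean":
        return mean(scores)
    if mode == "max":
        # Higher GERP scores indicate stronger conservation, so the maximum
        # picks out the most conserved base in the spanned region.
        return max(scores)
    raise ValueError(f"unknown mode: {mode!r}")


# Scores for the three deleted bases of chr10:73816182 GCTT>G quoted above.
scores = [-7.340000152587891, 0.0997999981045723, -0.2709999978542328]

print("mean:", summarise_scores(scores, "mean"))
print("max: ", summarise_scores(scores, "max"))
```

With these numbers the mean is strongly negative while the maximum is slightly positive, which shows how a single conserved base can be hidden by averaging over a longer region.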

jamie-m-a commented 6 months ago

I agree, Jixin. I think it makes perfect sense to offer reporting of the most impactful score as an option. I'll add that to the bug fix.

jxcao98 commented 5 months ago

Many thanks! I look forward to the new release!