Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0

VEP NMD plugin #568

Closed jon4thin closed 1 year ago

jon4thin commented 1 year ago

Hello! I have been investigating stop-gain variants, using the VEP NMD plugin with VEP v105 on GRCh38 to analyze NMD-escaping variants. When analyzing "stop_gained" variants from the ClinVar database, I observed a large number of variant-transcript pairs on negative-strand transcripts that are annotated as "NMD_escaping_variant" but do not appear to fulfill any of the criteria listed on the plugin's webpage: https://www.ensembl.info/2022/01/28/cool-stuff-ensembl-vep-can-do-flagging-variants-predicted-to-allow-nmd-escape/ Am I missing something here?

Here are a few SNP examples (but the issue also occurs with indels and substitutions):

variant-ID         transcript
1_100210750_G_A    ENST00000370132
4_169406633_C_A    ENST00000439128
4_169406633_C_A    ENST00000507142
4_169406633_C_A    ENST00000510533
4_169406633_C_A    ENST00000511633
4_169406633_C_A    ENST00000512193
7_5999158_C_A      ENST00000265849
7_5999158_C_A      ENST00000382321
8_18059639_C_A     ENST00000637790
8_18059639_C_A     ENST00000381733
8_18059639_C_A     ENST00000314146
9_97689592_C_A     ENST00000375128
17_7676215_G_A     ENST00000413465
17_7676215_G_A     ENST00000359597
17_7676215_G_A     ENST00000269305
17_7676215_G_A     ENST00000445888
22_38145565_G_A    ENST00000332509
22_38145565_G_A    ENST00000402064

Thank you!

nakib103 commented 1 year ago

Hello @jon4thin ,

Thanks for your query!

Unfortunately, I could not replicate your issue. Can you provide the command you used, along with the input and output files?

Also, do you face the same issue with the latest version of VEP?

Best regards, Nakib

jon4thin commented 1 year ago

Here is my code for running VEP:

#- Running VEP to predict variant impact;
#-  to be used to annotate NMD- variants

set -e
set -u

VEP_CACHE=/condaVEP/  #- /condaVEP/ is where I store my conda, bioconda VEP installation
VEP_FASTA=/condaVEP/homo_sapiens/105_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

INPUT=$1
OUTPUT=./vep_results/$INPUT.vep_out.txt

vep     -i $INPUT               \
        -o $OUTPUT              \
        --offline               \
        --dir_cache $VEP_CACHE  \
        --fasta $VEP_FASTA      \
        --no_stats              \
        --coding_only           \
        --shift_hgvs 0          \
        --hgvs                  \
        --tsl                   \
        --symbol                \
        --gencode_basic         \
        --plugin Downstream     \
        --plugin NMD            && gzip $OUTPUT

I have not tried newer versions of VEP. Was it broken in v105 and fixed since?

Also:

#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

Versions:
  ensembl              : 105.f357e33
  ensembl-funcgen      : 105.660df8f
  ensembl-io           : 105.2a0a40c
  ensembl-variation    : 105.ac8178e
  ensembl-vep          : 105.0

jon4thin commented 1 year ago

I also wanted to clarify: did none of the variants I provided have the "NMD_escaping_variant" label attached in the "nmd" column of the VEP annotation output when you annotated them with Ensembl VEP on GRCh38?
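To compare results between two runs, the flagged variant-transcript pairs can be pulled out of the VEP output. The following is a minimal sketch, not part of the original thread. It assumes VEP's default tab-delimited output (which is what the command above produces, since it sets neither `--tab` nor `--vcf`), with the transcript ID in the fifth column and the plugin result stored as `NMD=NMD_escaping_variant` in the final `Extra` column. The function name `nmd_flagged_pairs` is hypothetical.

```shell
# Hypothetical helper: read default-format VEP output on stdin and print
# "variant<TAB>transcript" for every row flagged by the NMD plugin.
# Assumes column 1 is the uploaded variant ID, column 5 is the feature
# (transcript) ID, and the last column is the key=value "Extra" field.
nmd_flagged_pairs() {
    awk -F '\t' '
        /^#/ { next }   # skip header and metadata lines
        $NF ~ /(^|;)NMD=NMD_escaping_variant(;|$)/ { print $1 "\t" $5 }
    '
}
```

With the gzipped file written by the script above, this could be run as `gzip -dc ./vep_results/<input>.vep_out.txt.gz | nmd_flagged_pairs`, where `<input>` stands for whatever file name was passed to the script.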

nakib103 commented 1 year ago

Hi @jon4thin ,

Yeah, I tried all the variant examples you provided and none of them were annotated as "NMD_escaping_variant". Can you try downloading the latest VEP version and running with that?

Best regards, Nakib

nakib103 commented 1 year ago

I am closing this issue. If you face further problems, please comment here or open a new issue.

jon4thin commented 1 year ago

Sorry for the late response. With the new version, the issue is diminished but still present (affecting roughly 2% of the variants I analyzed).

Here are a few random examples: 9_21971038_GC_AA 21_46137006_A_T 9_6534735_A_AT 4_5565268_TC_AA 13_51935666_G_A 16_23529614_G_A

olaaustine commented 1 year ago

Hi @jon4thin, Thank you for your query. The plugin uses four criteria, and a variant needs to meet only one of them to be annotated as NMD_escaping_variant. For some of the examples shared above, the variant meets at least one of these criteria. Are you using the latest version? Thank you, Ola.
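For reference, the four criteria listed in the NMD plugin's documentation are: the variant falls in the last exon of the transcript; it falls within the last 50 bases of the penultimate exon; it falls within the first 100 coding bases of the transcript; or the transcript is intronless. A minimal shell sketch of that check follows. It is not the plugin's code: the function name, its arguments, and the exact boundary comparisons (`-lt 50`, `-le 100`) are assumptions made for illustration. All positions are assumed to be expressed in transcript orientation, which matters when checking negative-strand transcripts by hand.

```shell
# Hypothetical helper, not the plugin's implementation: print the first
# escape rule a stop-gained variant satisfies, or "none".
#
# Arguments (all in transcript orientation, all hypothetical names):
#   $1 exon_count         number of exons in the transcript
#   $2 exon_index         1-based index of the exon containing the variant
#   $3 bases_to_exon_end  bases between the variant and the 3' end of its exon
#   $4 cds_pos            1-based position of the variant in the coding sequence
nmd_escape_rule() {
    exon_count=$1
    exon_index=$2
    bases_to_exon_end=$3
    cds_pos=$4

    if [ "$exon_count" -eq 1 ]; then
        echo "intronless_transcript"
    elif [ "$exon_index" -eq "$exon_count" ]; then
        echo "last_exon"
    elif [ "$exon_index" -eq $((exon_count - 1)) ] && [ "$bases_to_exon_end" -lt 50 ]; then
        # Boundary is an assumption; the plugin may count this differently.
        echo "last_50_bases_of_penultimate_exon"
    elif [ "$cds_pos" -le 100 ]; then
        echo "first_100_coding_bases"
    else
        echo "none"
    fi
}
```

For example, `nmd_escape_rule 10 4 120 900` describes a variant in exon 4 of 10, far from the exon end and deep in the coding sequence, so no rule applies. A result of "none" for a pair that VEP flags is the kind of case worth reporting with the exact transcript ID.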

jon4thin commented 1 year ago

I am using:

Versions:
  ensembl              : 108.d8a9c80
  ensembl-funcgen      : 108.56bb136
  ensembl-io           : 108.58d13c1
  ensembl-variation    : 108.a885ada
  ensembl-vep          : 108.2

I am using cache_version 105 for the transcript build so that my results remain comparable with the other sources I am using for analysis.

VEP does not annotate the first four examples as NMD escaping. It does annotate the last two as NMD escaping, but I cannot identify why (13_51935666_G_A in ENST00000448424 and 16_23529614_G_A in ENST00000563232).

olaaustine commented 1 year ago

Hi @jon4thin, Thank you for bringing this to our notice. We suggest that you use the latest version of ensembl-vep and of the VEP plugins, as the latest version does not annotate 16_23529614_G_A in ENST00000563232 as an NMD-escaping variant. In the meantime, we have added a fix, which will be in our new release in the summer, for edge cases such as 13_51935666_G_A in ENST00000448424, as it should be annotated as NMD_escaping_variant. Thank you, Ola.