Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

How to prioritize MANE PLUS CLINICAL isoforms? #1443

Closed Heredia-Maria closed 1 year ago

Heredia-Maria commented 1 year ago

Dear Ensembl team: I came across an issue while analysing my output. I would like to prioritize variants in MANE PLUS CLINICAL transcripts, when available, ahead of MANE isoforms. I have seen the pick_order flag in your VEP documentation. However, I cannot see any specific way to prioritize this PLUS CLINICAL field. The documentation states: "Valid criteria are: [ canonical appris tsl biotype ccds rank length mane ]."

I would like to know whether there is any way to do this, please.

Thanks in advance. Kind regards,

María.

olaaustine commented 1 year ago

Hi @Heredia-Maria,

Thank you for your query.

Please can you confirm what release version of VEP you are using and the assembly?

Thank you very much, Ola.

Heredia-Maria commented 1 year ago

Hi @olaaustine.

I'm using release 109 and GRCh38. Thanks for your quick response.

María.

olaaustine commented 1 year ago

Hi @Heredia-Maria,

Thank you for your response.

From release 109 onwards, you can prioritize mane_plus_clinical by changing the pick_order.

According to our documentation, --pick_order can be used to customize the order of the criteria VEP applies when picking:

--pick --pick_order mane_plus_clinical,mane_select,canonical,appris,tsl,biotype,ccds,rank,length 
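To make the idea of an ordered criteria list concrete, here is a minimal Python sketch of left-to-right tie-breaking. It is a toy model only, not VEP's implementation: the transcript records, their field names and the "lower score wins" convention are all hypothetical.

```python
# Illustrative only: a toy model of ordered, tie-breaking selection.
# This is NOT VEP's code; record fields and scores are hypothetical.


def pick(transcripts, order):
    """Return the transcript that wins under the given criteria order.

    Each transcript is a dict mapping a criterion name to a score where
    lower is better (0 = has the flag, 1 = does not). Criteria are
    compared left to right; later ones only break ties in earlier ones.
    """
    return min(transcripts, key=lambda t: tuple(t.get(c, 1) for c in order))


# Made-up transcripts for a single made-up gene.
transcripts = [
    {"id": "tx_A", "mane_select": 0, "mane_plus_clinical": 1, "canonical": 0},
    {"id": "tx_B", "mane_select": 1, "mane_plus_clinical": 0, "canonical": 1},
    {"id": "tx_C", "mane_select": 1, "mane_plus_clinical": 1, "canonical": 1},
]

plus_clinical_first = ["mane_plus_clinical", "mane_select", "canonical"]
select_first = ["mane_select", "mane_plus_clinical", "canonical"]

print(pick(transcripts, plus_clinical_first)["id"])  # tx_B
print(pick(transcripts, select_first)["id"])         # tx_A
```

In this toy model, swapping the first two criteria is enough to change which transcript is selected, which is the effect the reordered --pick_order is meant to have.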

Let us know if this solves the problem. Thank you Ola.

Heredia-Maria commented 1 year ago

Hi Ola. I will try as soon as possible and give you feedback.

Thank you María.

Heredia-Maria commented 1 year ago

Hi Ola.

The prioritization is now working for most genes! However, I still have some issues. For example, in the case of NOX1 "ENSG00000007952", which has two gold isoforms in Ensembl, the one prioritized is not the gold MANE isoform, but the other one. Why is this happening? Ultimately, my main objective is to annotate the variants on the same sequences that are recorded in UniProt, but I am not succeeding.

Thank you very much María.

olaaustine commented 1 year ago

Hi @Heredia-Maria,

Thank you very much for letting us know.

To investigate the issue further, which transcripts are the two gold isoforms in Ensembl?

The gene NOX1 in the example has the transcript ENST00000372966.

Thank you very much Ola.

Heredia-Maria commented 1 year ago

Hi @olaaustine,

The two gold isoforms in Ensembl are: ENST00000372966.8 (MANE) and ENST00000217885.5. The isoforms on which my variants have been annotated are: ENST00000217885 (16 variants) and ENST00000372960 (2 variants). Neither of these is the MANE Select transcript.
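A per-transcript tally like the one above can be produced directly from the tab-delimited output. The following is a minimal Python sketch, assuming the usual --tab layout in which metadata lines start with "##", the header line starts with a single "#", and the transcript ID sits in the "Feature" column. The sample rows and the names TX1/TX2 are made up for illustration.

```python
from collections import Counter


def count_features(tab_text, column="Feature"):
    """Count how many output rows were reported on each transcript."""
    # Skip blank lines and '##' metadata lines; what remains starts with
    # the header line, which carries a single leading '#'.
    lines = [ln for ln in tab_text.splitlines() if ln and not ln.startswith("##")]
    if not lines:
        return Counter()
    header = lines[0].lstrip("#").split("\t")
    idx = header.index(column)
    return Counter(ln.split("\t")[idx] for ln in lines[1:])


# Made-up sample in the assumed layout (not real VEP output).
sample = (
    "## made-up metadata line\n"
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\n"
    "var1\tX:1\tA\tGENE1\tTX1\n"
    "var2\tX:2\tC\tGENE1\tTX1\n"
    "var3\tX:3\tG\tGENE1\tTX2\n"
)

counts = count_features(sample)
print(counts.most_common())  # [('TX1', 2), ('TX2', 1)]
```

On a real run, the text would come from the output file, for example `count_features(open("NMD_20230213_output.txt").read())`.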

Thank you very much. María.

olaaustine commented 1 year ago

Hi @Heredia-Maria,

Thank you very much for your response.

To try to recreate the issue, could you please share the command used and an example variant for this case, if that's possible?

Thank you very much Ola.

Heredia-Maria commented 1 year ago

Hi @olaaustine

The command used is the following:

./vep --offline --cache --fa Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz --format vcf --tab -i ./../Escritorio/DataBasesData/clinvar_20230213.vcf --show_ref_allele --total_length --mane --verbose --variant_class --force_overwrite --hgvs --symbol --uniprot --gencode_basic --canonical --biotype --exclude_predicted --no_intergenic --protein --shift_3prime 1 --pick_order biotype,ccds,rank --pick --custom clinvar.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNDN -o NMD_20230213_output.txt -plugin Downstream -plugin ProteinSeqs,references.fa,mutated.fa -plugin NMD

The input file is the full set of variants downloaded from ClinVar. I attach here a VCF with just the NOX1 variants: NOX1.txt

Thank you very much, María.

olaaustine commented 1 year ago

Hi @Heredia-Maria,

Looking at your command, the --pick_order should be --pick_order mane_plus_clinical,mane_select,canonical,appris,tsl,biotype,ccds,rank,length, which prioritizes the MANE transcripts.
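A quick way to spot this kind of mismatch is to pull the --pick_order value out of a command string and compare it with the order recommended in this thread. The Python sketch below does only that string comparison; it does not run VEP, and the helper names pick_order_of and missing_criteria are hypothetical.

```python
import shlex

# The order recommended earlier in this thread.
RECOMMENDED = (
    "mane_plus_clinical,mane_select,canonical,appris,tsl,"
    "biotype,ccds,rank,length"
).split(",")


def pick_order_of(command):
    """Return the criteria passed to --pick_order, or None if absent."""
    tokens = shlex.split(command)
    for i, tok in enumerate(tokens):
        if tok == "--pick_order" and i + 1 < len(tokens):
            return tokens[i + 1].split(",")
        if tok.startswith("--pick_order="):
            return tok.split("=", 1)[1].split(",")
    return None


def missing_criteria(command, expected=RECOMMENDED):
    """List the expected criteria that the command does not mention."""
    order = pick_order_of(command) or []
    return [c for c in expected if c not in order]


# Shortened version of the command posted above.
old_cmd = "./vep --offline --cache --pick_order biotype,ccds,rank --pick"
print(pick_order_of(old_cmd))
print(missing_criteria(old_cmd))
```

For the shortened command, the second call reports that both MANE criteria are absent, which matches the behaviour described in this thread.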

More about the gold isoforms in Ensembl can be seen in our documentation. Please let us know if this helps. Thank you, Ola.

Heredia-Maria commented 1 year ago

Hi @olaaustine,

I copied the older version of my command here, but I did it correctly on the terminal:

./vep --offline --cache --fa Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz --format vcf --tab -i ./../Escritorio/DataBasesData/clinvar_20230213.vcf --show_ref_allele --total_length --mane --verbose --variant_class --force_overwrite --hgvs --symbol --uniprot --gencode_basic --canonical --biotype --exclude_predicted --no_intergenic --protein --shift_3prime 1 --pick --pick_order mane_plus_clinical,mane_select,canonical,appris,tsl,ccds,rank,length --custom clinvar.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNDN -o NMD_20230213_output_prioritization.txt -plugin Downstream -plugin ProteinSeqs,references.fa,mutated.fa -plugin NMD

I noticed that I left out the "biotype" criterion. I'm going to repeat the process with it included and see what happens.

Sorry for the inconvenience. Thank you, María.

Heredia-Maria commented 1 year ago

Hi @olaaustine

I was also looking at the old output! Everything is working properly, even without the biotype criterion. In any case, I will try adding it and compare both outputs.

Thank you so much for your help and your patience. Kind regards, María.

olaaustine commented 1 year ago

Hi @Heredia-Maria,

Thank you very much for letting us know.

I will close this ticket now. Please feel free to open another ticket or reopen this if you have another query.

Thank you. Ola.