Xi-Cao opened 4 days ago
Sorry for presenting the results unclearly. Here are the revised results:
case1:
Uploaded_variation Location Allele Gene Feature
18 rs56222534 1:21863905 A ENSG00000162551 ENST00000374840
19 rs56222534 1:21863905 C ENSG00000162551 ENST00000374840
case2:
Uploaded_variation Location Allele Gene Feature
13 rs1256328 1:21896767 T CCDS217.1 CCDS217.1
14 rs1256328 1:21896767 T CCDS53274.1 CCDS53274.1
15 rs1256328 1:21896767 T CCDS53275.1 CCDS53275.1
16 rs1256328 1:21896767 T ENSG00000162551 ENST00000374840
17 rs1256328 1:21896767 T 249 NM_001369803.2
case3:
Uploaded_variation Location Allele Gene Feature Feature_type SYMBOL
8 rs1256332 1:21893344 A CCDS217.1 CCDS217.1 Transcript -
9 rs1256332 1:21893344 A CCDS53274.1 CCDS53274.1 Transcript -
10 rs1256332 1:21893344 A CCDS53275.1 CCDS53275.1 Transcript -
11 rs1256332 1:21893344 A ENSG00000162551 ENST00000374840 Transcript ALPL
12 rs1256332 1:21893344 A 249 NM_001369803.2 Transcript ALPL
Can you please send a link to the output file? CCDS IDs are not supposed to be in the gene and feature columns.
Which cache file did you download? From the results, it looks like you ran VEP with the merged cache (--merged), because there are RefSeq transcripts in the output (for example, NM_001369803.2). However, your VEP command uses the Ensembl cache (the default).
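One quick way to see which transcript sets ended up in an output file is to tally the Feature column by ID prefix. This is a minimal Python sketch that assumes ENST marks Ensembl transcripts, NM_/NR_/XM_/XR_ mark RefSeq transcripts, and CCDS marks CCDS IDs. The function name feature_source is hypothetical.

```python
from collections import Counter


def feature_source(feature_id: str) -> str:
    """Guess the annotation source of a VEP Feature ID from its prefix."""
    if feature_id.startswith("ENST"):
        return "Ensembl"
    # RefSeq curated (NM_/NR_) and predicted (XM_/XR_) transcripts
    if feature_id.startswith(("NM_", "NR_", "XM_", "XR_")):
        return "RefSeq"
    if feature_id.startswith("CCDS"):
        return "CCDS"
    return "other"


# Feature IDs copied from the case2 rows above
features = ["CCDS217.1", "CCDS53274.1", "CCDS53275.1",
            "ENST00000374840", "NM_001369803.2"]
counts = Counter(feature_source(f) for f in features)
print(counts)
```

RefSeq IDs in the tally suggest that the merged or RefSeq cache was used. CCDS IDs in the Feature column are unexpected, as noted above.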
For case1:
rs56222534 (check variant page) has two alternative alleles, A and C.
VEP returns annotation for each of the alternative alleles in different rows.
Case 2 and case 3 should not have multiple rows if you use the Ensembl cache, homo_sapiens_vep_113_GRCh37.tar.gz.
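VEP reports each alternative allele on its own row. Grouping rows by (variant, allele) therefore gives one group per allele. This is a minimal Python sketch over the two case1 rows, assuming the rows have already been parsed into tuples.

```python
from collections import defaultdict

# Rows from case1: (Uploaded_variation, Location, Allele, Gene, Feature)
rows = [
    ("rs56222534", "1:21863905", "A", "ENSG00000162551", "ENST00000374840"),
    ("rs56222534", "1:21863905", "C", "ENSG00000162551", "ENST00000374840"),
]

# Each (variant, allele) key corresponds to one alternative allele
by_allele = defaultdict(list)
for variant, location, allele, gene, feature in rows:
    by_allele[(variant, allele)].append(feature)

for (variant, allele), feats in sorted(by_allele.items()):
    print(variant, allele, feats)
```

Both rows land in separate groups because their Allele values differ. They are not duplicates.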
Thanks for your reply!
I did download homo_sapiens_merged_vep_113_GRCh37.tar.gz for the cache. So is homo_sapiens_vep_113_GRCh37.tar.gz a more suitable option for the cache? Will using the default --cache option with a merged cache file affect the results? I didn't seem to receive any warnings or errors.
I'll try again with the Ensembl cache, homo_sapiens_vep_113_GRCh37.tar.gz. And the variant in case 1 does indeed have two alternative alleles. Attached are my annotation results. The filename is slightly different from the one in the command because I renamed it.
Thanks again, xicao
I didn't mean to imply that the merged cache is incorrect. I was simply trying to understand which cache was being used, as it’s not immediately clear from the VEP command.
If you want to run with homo_sapiens_merged_vep_113_GRCh37.tar.gz, your output will include both Ensembl and RefSeq transcripts.
This explains why in case 2 you have the following:
16 rs1256328 1:21896767 T ENSG00000162551 ENST00000374840
17 rs1256328 1:21896767 T 249 NM_001369803.2
and case 3:
11 rs1256332 1:21893344 A ENSG00000162551 ENST00000374840 Transcript ALPL
12 rs1256332 1:21893344 A 249 NM_001369803.2 Transcript ALPL
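If you keep the merged cache but only want one transcript set downstream, you can filter the rows after the run. This is a minimal Python sketch with several assumptions. It assumes the --tab layout is '##' metadata lines followed by one '#Uploaded_variation' header line. It also assumes that keeping ENST features is the selection you want. read_vep_tab and keep_ensembl are hypothetical helper names, not part of VEP.

```python
import io


def read_vep_tab(handle):
    """Yield one dict per data row of VEP --tab output.

    Assumes '##' lines are metadata and the line starting with
    '#Uploaded_variation' holds the column names.
    """
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("data row found before the column header line")
        yield dict(zip(header, line.split("\t")))


def keep_ensembl(records):
    """Keep only rows annotated against Ensembl transcripts (ENST IDs)."""
    return [r for r in records if r["Feature"].startswith("ENST")]


# Two rows from case2, wrapped in a made-up minimal tab file
demo = io.StringIO(
    "## example metadata line\n"
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\n"
    "rs1256328\t1:21896767\tT\tENSG00000162551\tENST00000374840\n"
    "rs1256328\t1:21896767\tT\t249\tNM_001369803.2\n"
)
for r in keep_ensembl(read_vep_tab(demo)):
    print(r["Uploaded_variation"], r["Gene"], r["Feature"])
```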
Thank you for sending the output file. From this file, I can see that you ran the following command:
vep \
--assembly GRCh37 \
--cache \
--cache_version 113 \
--everything \
--fasta [PATH]/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz \
--force_overwrite \
--fork 4 \
--input_file [PATH]/vep_snplist \
--no_stats \
--output_file STDOUT \
--plugin CADD,snv=[PATH]/whole_genome_SNVs.tsv.gz,indels=[PATH]/gnomad.genomes-exomes.r4.0.indel.tsv.gz \
--tab
Can you please re-run vep with the following command:
vep \
--assembly GRCh37 \
--cache \
--cache_version 113 \
--everything \
--merged \
--fasta [PATH]/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz \
--force_overwrite \
--input_file [PATH]/vep_snplist \
--output_file output.txt \
--tab
If vep_snplist is too big, please run a subset of the file.
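You can make the subset by copying the first few lines of the rsID list into a new file. This is a minimal Python sketch that assumes vep_snplist has one variant per line. The helper write_subset and the output name vep_snplist_subset are hypothetical. The demo uses a temporary directory so it runs anywhere.

```python
import tempfile
from itertools import islice
from pathlib import Path


def write_subset(src: Path, dst: Path, n: int = 10) -> int:
    """Copy the first n lines of src into dst and return how many were written."""
    with src.open() as fin, dst.open("w") as fout:
        lines = list(islice(fin, n))
        fout.writelines(lines)
    return len(lines)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "vep_snplist"
    src.write_text("rs56222534\nrs1256328\nrs1256332\n")
    dst = Path(tmp) / "vep_snplist_subset"
    written = write_subset(src, dst, n=2)
    print(written, dst.read_text().split())
```

You would then pass the subset file to --input_file in place of the full list.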
After re-running vep, do you still have CCDS transcripts in the gene column?
Thanks a lot for your suggestion.
I ran VEP with your command on the first 10 variants, and the CCDS transcripts disappeared from my results (output_filter.txt). Only Ensembl and RefSeq transcripts remained, as you said. Comparing the two similar commands, did the --plugin option or the omission of --merged cause the additional transcripts?
output_filter.txt
rs1256332 1:21893344 A ENSG00000162551 ENST00000374840 Transcript intron_variant
rs1256332 1:21893344 A 249 NM_001369803.2 Transcript intron_variant
Then I ran it again with the homo_sapiens_vep_113_GRCh37.tar.gz cache file, removing the --merged option. It worked well and included only the Ensembl transcripts.
output1_filter.txt
Thanks, xicao
I'm glad it worked!
We don't have any reports indicating that the --plugin or --merged options interfere with the output in that way.
Can you please try with --plugin?
Thanks~
Following your suggestion, I re-ran VEP with the --merged and --plugin options added, respectively. The CCDS transcripts did not appear in either run.
Additionally, it seems that the intergenic and regulatory-region variants were not annotated to any gene in any of the results. Is there an option I can use to map these variants to the closest gene through VEP?
Best, xicao
Do you mean the output of the vep command or of filter_vep?
Thanks for your reply.
I would like the intergenic and regulatory-region variants to be mapped to the nearby gene in the annotation results, instead of having a "-" in the Gene column. For example, when I used ANNOVAR, it displayed the nearby gene and the distance for an intergenic variant. However, in the VEP results, the Gene column shows "-". So I would like to know whether VEP has an option that can annotate the closest gene for these variants.
Best, xicao
VEP has the option --distance to change the maximum distance up- and downstream of a transcript within which VEP assigns the upstream_gene_variant or downstream_gene_variant consequences to a variant. By default, this distance is 5000 bp.
To include regulatory information, you can use the option --regulatory.
You can read more about these two options here: https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_distance https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_regulatory
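The --distance rule can be illustrated with a simplified Python sketch. This is not VEP's code, which handles more cases. The sketch assumes 1-based inclusive transcript coordinates, a strand of 1 or -1, and a gap measured from the nearest transcript end. The function flank_consequence and the transcript coordinates are hypothetical.

```python
def flank_consequence(pos, tx_start, tx_end, strand, distance=5000):
    """Simplified up/downstream labelling in the spirit of VEP's --distance.

    Coordinates are 1-based and inclusive; strand is 1 or -1.
    Returns None if the variant overlaps the transcript or lies more
    than `distance` bp away from it.
    """
    if tx_start <= pos <= tx_end:
        return None  # overlapping variants get other consequence types
    if pos < tx_start:
        gap, left_of_tx = tx_start - pos, True
    else:
        gap, left_of_tx = pos - tx_end, False
    if gap > distance:
        return None
    # Forward strand: left of the transcript is upstream.
    # Reverse strand: left of the transcript is downstream.
    upstream = left_of_tx if strand == 1 else not left_of_tx
    return "upstream_gene_variant" if upstream else "downstream_gene_variant"


# Hypothetical forward-strand transcript spanning 10,000-20,000
print(flank_consequence(6000, 10000, 20000, 1))                   # upstream_gene_variant
print(flank_consequence(26000, 10000, 20000, 1))                  # None (6 kb away)
print(flank_consequence(26000, 10000, 20000, 1, distance=10000))  # downstream_gene_variant
```

The last call shows the point of --distance. Raising the window lets a variant that was previously unannotated pick up a flanking consequence for that transcript.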
There is also the VEP plugin NearestGene that finds the nearest gene(s). More than one gene may be reported if the genes overlap the variant or if genes are equidistant.
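The tie rule described for NearestGene can be sketched as follows. An overlapping gene counts as distance 0, and every gene sharing the minimum distance is reported. This is a simplified stand-in, not the plugin's implementation. The gene names and coordinates are hypothetical, and at least one gene is assumed.

```python
def nearest_genes(pos, genes):
    """Return (names, distance) for the gene(s) closest to pos.

    genes: non-empty list of (name, start, end), 1-based inclusive.
    A gene overlapping pos has distance 0; ties are all reported.
    """
    def dist(start, end):
        if start <= pos <= end:
            return 0
        return start - pos if pos < start else pos - end

    scored = [(dist(s, e), name) for name, s, e in genes]
    best = min(d for d, _ in scored)
    return sorted(name for d, name in scored if d == best), best


genes = [("GENE_A", 100, 200), ("GENE_B", 400, 500), ("GENE_C", 1000, 1200)]
print(nearest_genes(300, genes))  # (['GENE_A', 'GENE_B'], 100): equidistant
print(nearest_genes(450, genes))  # (['GENE_B'], 0): overlapping
```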
I see. Sincere thanks for your kind and helpful suggestions!
Best, xicao
From: "Diana Lemos" @.> Sent: 2024-11-05 01:10:03 To: "Ensembl/ensembl-vep" @.> CC: Subject: Re: [Ensembl/ensembl-vep] How to filter the annotated result for each variant? (Issue #1782)
Describe the issue
Hi there, thanks for your work on gene annotation. I recently annotated my fine-mapping variants using VEP (with an rsID file as input), and I have some questions about the results.
In the results, I found that most variants have multiple rows, for example:
Uploaded_variation Location Allele Gene Feature
18 rs56222534 1:21863905 A ENSG00000162551 ENST00000374840
19 rs56222534 1:21863905 C ENSG00000162551 ENST00000374840

Uploaded_variation Location Allele Gene Feature
13 rs1256328 1:21896767 T CCDS217.1 CCDS217.1
14 rs1256328 1:21896767 T CCDS53274.1 CCDS53274.1
15 rs1256328 1:21896767 T CCDS53275.1 CCDS53275.1
16 rs1256328 1:21896767 T ENSG00000162551 ENST00000374840
17 rs1256328 1:21896767 T 249 NM_001369803.2

Uploaded_variation Location Allele Gene Feature Feature_type SYMBOL
8 rs1256332 1:21893344 A CCDS217.1 CCDS217.1 Transcript -
9 rs1256332 1:21893344 A CCDS53274.1 CCDS53274.1 Transcript -
10 rs1256332 1:21893344 A CCDS53275.1 CCDS53275.1 Transcript -
11 rs1256332 1:21893344 A ENSG00000162551 ENST00000374840 Transcript ALPL
12 rs1256332 1:21893344 A 249 NM_001369803.2 Transcript ALPL
In these cases, how can I select a single final annotation for each input variant?
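VEP has built-in options for reducing output to one line per variant or allele, such as --pick and --pick_allele. See the VEP options documentation for how they choose a line. If you want to filter afterwards instead, here is a minimal Python sketch. It keeps one row per (Uploaded_variation, Allele) under a hypothetical preference order: Ensembl ENST first, then RefSeq NM_, then anything else. This rule is illustrative only and is not VEP's selection logic.

```python
def pick_one(records):
    """Keep one row per (Uploaded_variation, Allele).

    Hypothetical rule: prefer Ensembl (ENST) features, then RefSeq (NM_),
    then anything else; among equal ranks keep the first row seen.
    """
    def rank(feature):
        if feature.startswith("ENST"):
            return 0
        if feature.startswith("NM_"):
            return 1
        return 2

    chosen = {}
    for r in records:
        key = (r["Uploaded_variation"], r["Allele"])
        if key not in chosen or rank(r["Feature"]) < rank(chosen[key]["Feature"]):
            chosen[key] = r
    return list(chosen.values())


# Three of the rs1256332 rows from the example above
rows = [
    {"Uploaded_variation": "rs1256332", "Allele": "A", "Feature": "CCDS217.1"},
    {"Uploaded_variation": "rs1256332", "Allele": "A", "Feature": "ENST00000374840"},
    {"Uploaded_variation": "rs1256332", "Allele": "A", "Feature": "NM_001369803.2"},
]
for r in pick_one(rows):
    print(r["Uploaded_variation"], r["Allele"], r["Feature"])
```

Because the key includes the allele, a multi-allelic variant such as rs56222534 still keeps one row per alternative allele.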
Thanks, xicao
Additional information
System
Full VEP command line
Full error message
Data files (if applicable)
They include: