Closed AndreaG5 closed 3 years ago
Hi @AndreaG5,
dbSNP identifiers are included in the cache. Can you try running vep with the option --check_existing?
Could you please share your output header fields?
Hi @dglemos, thank you so much! Using --check_existing solved my issue with dbSNP identifiers!
Here are the header fields (for the second issue, i.e. annotating with the --fields option):
Uploaded_variation | Location | Allele | Gene | feature | Feature_type | Consequence | cDNA_position | CDS_position | Protein_position | Amino_acids | Codons | Existing_variation | Extra (IMPACT,STRAND,SYMBOL,SYMBOL_SOURCE,HGNC_ID,HGVSC,HGVSP,CLIN_SIG,EXAC_FREQ,clinvar,clinvar_CLNREVSTAT)
The option --check_existing adds extra fields to the output. I can't see the fields SOMATIC and PHENO in your header. Is this the header after you ran vep with --check_existing?
Regarding the missing fields in the output, you could try using --fields with just a few headers and check whether the output includes those. For example, start with:
--fields "Uploaded_variation,Location,Allele"
My bad, SOMATIC and PHENO were present!
When I use the --fields option, it performs the "reorder" but does not fill in the annotations. Every column from INFO (or Extra) is split out and ordered according to my list, but each one is filled with "-". So, for example, no frequencies are available, no gene symbol, etc.
Example:
Here is the result omitting the --fields option:
Here is the result with the --fields option:
(I am sorry for the snapshots, but it wasn't easy to fit all the columns in a single picture.)
Thanks
Thanks for the images, it's easier to understand what's going on.
--fields only works with tab or VCF format output. In your vep command line you are using --tab, but your output is not in tab format; it seems to be the VEP default output. Can you run vep and make sure you are using --tab?
Yes, I am sorry, I posted different outputs without specifying which was which. I tried both --vcf and --tab. I used --fields along with --tab (the second snapshot refers to that). The output in both cases is the same, and it is the second picture.
You don't have Exac_AF (...) in your output header. Your Extra columns are only IMPACT,STRAND,SYMBOL,SYMBOL_SOURCE,HGNC_ID,HGVSC,HGVSP,CLIN_SIG,EXAC_FREQ,clinvar,clinvar_CLNREVSTAT,SOMATIC,PHENO. Only these columns are going to have an annotation in your output file.
YES I KNOW. It's just because the "Extra" field is very long (the ExAC information is present). To get an idea, just look at SYMBOL: it is present and correctly annotated in the first snapshot, but missing in the second.
I didn't notice the symbol was missing, sorry about that.
In --fields you have to use exactly the same header names as in the VEP output. In the VEP output it is SYMBOL (capital letters), but in your fields it's Symbol. The same applies to the other headers.
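One way to catch this kind of mismatch before running vep is to compare the requested names against the header names with an exact, case-sensitive match. The snippet below is only a sketch in Python and is not part of VEP: check_fields is a hypothetical helper, and the list of available names is an assumption copied by hand from the header posted earlier in this thread.

```python
def check_fields(requested, available):
    """Return {bad_name: suggestion_or_None} for names not found verbatim.

    Matching is case-sensitive, mirroring the exact-name requirement
    described above. A suggestion is offered when the name matches an
    available header only after ignoring case.
    """
    available_set = set(available)
    by_lower = {name.lower(): name for name in available}
    problems = {}
    for name in requested:
        if name not in available_set:
            problems[name] = by_lower.get(name.lower())
    return problems


# Assumption: header names typed in from the output shown in this thread.
available = ["Uploaded_variation", "Location", "Allele", "SYMBOL", "IMPACT"]
requested = "Uploaded_variation,Location,Symbol,Impact".split(",")

for bad, suggestion in check_fields(requested, available).items():
    hint = f"did you mean {suggestion}?" if suggestion else "no similar header"
    print(f"{bad}: not in the header ({hint})")
```

Any name this reports would be a candidate for a column filled with "-" in the output.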
Oh I am sorry, I was pretty sure I used the correct header. Thank you so much for your quick and great response!
P.S. Can I go a little off-topic and ask whether there is a way to split the Location field into Chromosome and Position (two separate columns) directly from the launch command?
Again, thank you so much!
I'm glad the issue is sorted out. Unfortunately, there is no option to split the column, but if you use VCF format output, the chromosome and position are in two different columns.
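If the tab output is preferred, the split can also be done afterwards in a small post-processing step. The following is a minimal Python sketch, not a VEP feature: split_location is a hypothetical helper, and it assumes Location values look like "chrom:pos" or "chrom:start-end".

```python
def split_location(location):
    """Split a Location value into (chromosome, start, end).

    Assumes the form 'chrom:pos' or 'chrom:start-end'. The split is done
    on the last ':' so that contig names containing ':' stay intact.
    """
    chrom, sep, span = location.rpartition(":")
    if not sep or not chrom:
        raise ValueError(f"unexpected Location value: {location!r}")
    start, _, end = span.partition("-")
    start = int(start)
    # A single position is reported with end equal to start.
    return chrom, start, int(end) if end else start


for value in ["21:25587758", "1:100-102"]:
    chrom, start, end = split_location(value)
    print(chrom, start, end, sep="\t")
```

Applied to each data row of the tab file, this gives separate chromosome and position columns without switching to VCF output.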
Ok thank you so much, Have a good day!
Describe the issue
Hello everyone, I am using the VEP Docker image. I am able to annotate my VCF, but it fails to annotate rs identifiers (as I understand it, they are in the cache). I manually downloaded the cache, some plugins, and custom annotations. It seems to me that dbSNP is included in the cache, since I didn't find it in either the plugins or the custom databases. I want to know if I am missing something or if there's an error.
The second issue is related to the --fields option. VEP annotates the VCF correctly when I don't specify --fields. When I try to "reorder" the VCF columns according to my list, it fails to annotate every field within the INFO column.
Additional information
No Errors, No Warnings when running.
System
Full VEP command line
Thank you!