Closed etapanari closed 4 years ago
Hi,
VEP does support the use of GFF files for custom annotations with the --custom flag. You can find more information about this here: https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
If you're still having trouble, please send me a copy of your GFF file and your input and I can take a closer look.
Kind Regards, Andrew
On 31 Mar 2020, at 17:53, Electra notifications@github.com wrote:
Hi there,
I have run VEP for COVID-19 using 1) a GFF file, 2) a genome reference file, and 3) a variation file in Ensembl format, and I get back a very poor variation annotation.
Is it something that I am doing wrong? Is it maybe that VEP doesn't work properly for viral genomes?
A colleague tried another variant annotation tool, snpEff, and it gave her a much richer annotation for the variations.
I have the feeling that VEP ignores the GFF file I provide, because I don't even see transcript or gene annotations in the results.
Thanks Andrew, I run VEP like this:
vep -i covid_19_variation_vep.txt -gff covid_19.gff.gz --fasta NC_045512v2.fa.masked.gz --verbose --species covid-19
Do I need to add the --custom flag?
Hi Electra,
Your command looks fine. The --gff flag uses the --custom functionality internally, so you do not need to add --custom yourself.
I’m happy to take a closer look if you’re able to send me a sample of your input files which I can use to reproduce the issue.
Kind Regards, Andrew
Hi Andrew,
Thanks so much for the prompt reply. Here is the head of the GFF file, which I have sorted, compressed and indexed with tabix:
zcat covid_19.gff.gz | head
NC_045512v2 RefSeq five_prime_UTR 1 265 . + . ID=id-NC_045512v2:1..265;gbkey=5'UTR
NC_045512v2 RefSeq region 1 29903 . + . ID=NC_045512v2:1..29903;Dbxref=taxon:2697049;collection-date=Dec-2019;country=China;gbkey=Src;genome=genomic;isolate=Wuhan-Hu-1;mol_type=genomic RNA;nat-host=Homo sapiens
NC_045512v2 RefSeq CDS 266 13468 . + 0 ID=cds-YP_009724389.1;Parent=gene-GU280_gp01;Dbxref=Genbank:YP_009724389.1,GeneID:43740578;Name=YP_009724389.1;Note=pp1ab%3B translated by -1 ribosomal frameshift;exception=ribosomal slippage;gbkey=CDS;gene=orf1ab;locus_tag=GU280_gp01;product=orf1ab polyprotein;protein_id=YP_009724389.1
NC_045512v2 RefSeq CDS 266 13483 . + 0 ID=cds-YP_009725295.1;Parent=gene-GU280_gp01;Dbxref=Genbank:YP_009725295.1,GeneID:43740578;Name=YP_009725295.1;Note=pp1a;gbkey=CDS;gene=orf1ab;locus_tag=GU280_gp01;product=orf1a polyprotein;protein_id=YP_009725295.1
NC_045512v2 RefSeq gene 266 21555 . + . ID=gene-GU280_gp01;Dbxref=GeneID:43740578;Name=orf1ab;gbkey=Gene;gene=orf1ab;gene_biotype=protein_coding;locus_tag=GU280_gp01
NC_045512v2 RefSeq CDS 13468 21555 . + 0 ID=cds-YP_009724389.1;Parent=gene-GU280_gp01;Dbxref=Genbank:YP_009724389.1,GeneID:43740578;Name=YP_009724389.1;Note=pp1ab%3B translated by -1 ribosomal frameshift;exception=ribosomal slippage;gbkey=CDS;gene=orf1ab;locus_tag=GU280_gp01;product=orf1ab polyprotein;protein_id=YP_009724389.1
NC_045512v2 RefSeq CDS 21563 25384 . + 0 ID=cds-YP_009724390.1;Parent=gene-GU280_gp02;Dbxref=Genbank:YP_009724390.1,GeneID:43740568;Name=YP_009724390.1;Note=structural protein%3B spike protein;gbkey=CDS;gene=S;locus_tag=GU280_gp02;product=surface glycoprotein;protein_id=YP_009724390.1
NC_045512v2 RefSeq gene 21563 25384 . + . ID=gene-GU280_gp02;Dbxref=GeneID:43740568;Name=S;gbkey=Gene;gene=S;gene_biotype=protein_coding;locus_tag=GU280_gp02
NC_045512v2 RefSeq CDS 25393 26220 . + 0 ID=cds-YP_009724391.1;Parent=gene-GU280_gp03;Dbxref=Genbank:YP_009724391.1,GeneID:43740569;Name=YP_009724391.1;gbkey=CDS;gene=ORF3a;locus_tag=GU280_gp03;product=ORF3a protein;protein_id=YP_009724391.1
NC_045512v2 RefSeq gene 25393 26220 . + . ID=gene-GU280_gp03;Dbxref=GeneID:43740569;Name=ORF3a;gbkey=Gene;gene=ORF3a;gene_biotype=protein_coding;locus_tag=GU280_gp03
Hi,
If it would be possible, could you please send the files to helpdesk@ensembl.org and I can pick them up from there?
Thanks, Andrew
sure!
Hi,
Apologies for the delay in getting back to you. I've taken a look at these differences this morning, and the issue appears to be with the GFF file: VEP expects lines of type 'transcript' and 'exon' so that it can construct the transcript models required to annotate your variants. Your file contains only 'gene' and 'CDS' lines for each gene.
You can see an example of the gene, transcript, exon and CDS model format that VEP expects within GFF files here: https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gfftypes
If you have any further questions, please let us know.
Kind Regards, Andrew
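For readers hitting the same problem, a quick way to see whether a GFF file carries the feature types mentioned above is to count the values in column 3. The following is a minimal sketch, not part of VEP: the function names are made up for illustration, and it checks only the four types named in this thread (gene, transcript, exon, CDS). The linked documentation may accept further transcript-level types, which this sketch does not know about.

```python
import gzip
from collections import Counter

# The four feature types named in this thread. The VEP documentation may
# accept additional transcript-level types; that is not modelled here.
EXPECTED_TYPES = ("gene", "transcript", "exon", "CDS")


def count_feature_types(lines):
    """Count column-3 feature types, skipping comments and malformed lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a 9-column GFF record
        counts[fields[2]] += 1
    return counts


def missing_feature_types(counts):
    """Return the expected feature types that never occur in the file."""
    return [t for t in EXPECTED_TYPES if counts[t] == 0]


def check_gff(path):
    """Open a plain or gzip/bgzip-compressed GFF and report missing types."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return missing_feature_types(count_feature_types(handle))
```

Calling `check_gff("covid_19.gff.gz")` on a file like the one posted above would be expected to return `["transcript", "exon"]`, since only gene and CDS records (plus region and UTR lines) are present in the excerpt.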
Hi Andrew,
Thanks a lot for your help! I have edited the GFF to include transcript and exon lines and now it works!
Best regards, Electra
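The edited file was not posted in this thread, so the following is only a sketch of one way such an edit could be automated; it is not Electra's method. It assumes a RefSeq-style GFF where each CDS record has an ID and a Parent pointing at a gene, groups CDS segments by ID, and writes one transcript per CDS ID plus one exon per CDS segment. The `transcript-` and `exon-` ID prefixes are invented here, and the `biotype=protein_coding` attribute reflects my reading of the VEP documentation that transcripts need a biotype, so treat it as an assumption. The sketch copies CDS coordinates to exons as-is and does nothing special for the orf1ab ribosomal slippage, where the two segments share position 13468; that case needs checking by hand.

```python
def parse_attributes(column9):
    """Split 'key=value;key=value' into a dict (no percent-decoding)."""
    attrs = {}
    for part in column9.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def add_transcripts_and_exons(lines):
    """Return GFF lines with a transcript and exons added for each CDS ID.

    Non-CDS lines are copied unchanged. CDS lines are re-parented from the
    gene to the new transcript, keeping only their ID attribute.
    """
    passthrough = []
    cds_groups = {}  # CDS ID -> list of 9-field records, in input order
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if line.startswith("#") or len(fields) != 9:
            if line:
                passthrough.append(line)  # comments and odd lines, unchanged
            continue
        if fields[2] == "CDS":
            cds_id = parse_attributes(fields[8])["ID"]
            cds_groups.setdefault(cds_id, []).append(fields)
        else:
            passthrough.append(line)

    out = list(passthrough)
    for cds_id, segments in cds_groups.items():
        seqid, source, strand = segments[0][0], segments[0][1], segments[0][6]
        gene_id = parse_attributes(segments[0][8])["Parent"]
        tx_id = "transcript-" + cds_id  # hypothetical naming scheme
        start = min(int(seg[3]) for seg in segments)
        end = max(int(seg[4]) for seg in segments)
        out.append("\t".join([
            seqid, source, "transcript", str(start), str(end), ".", strand, ".",
            f"ID={tx_id};Parent={gene_id};biotype=protein_coding",  # assumed attribute
        ]))
        for number, seg in enumerate(segments, start=1):
            out.append("\t".join([
                seqid, source, "exon", seg[3], seg[4], ".", strand, ".",
                f"ID=exon-{cds_id}-{number};Parent={tx_id}",
            ]))
            out.append("\t".join(seg[:8] + [f"ID={cds_id};Parent={tx_id}"]))
    return out
```

The output is not position-sorted, so it would need to be sorted, compressed and tabix-indexed again before being passed to VEP.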
Hello @etapanari,
Can you share this updated annotation with the "transcript" and "exon" lines?
Thanks in advance and best regards, Stefan
Hi @stefanches7,
Just in case you're interested, we now have an Ensembl COVID-19 site where you can find a GFF file that is supported in VEP: https://covid-19.ensembl.org/info/data/ftp/index.html
Kind Regards, Andrew
Thanks @aparton, this is quite useful! The file was hard to find at first: I tried to "Export data via the website" but found no "Export data" button. https://covid-19.ensembl.org/downloads.html also seems to point only to the help page, not to the actual data.
The other point is the subproteins within the pp1ab polyprotein. In my group, for instance, we are interested in variant consequences at the protein level, so it might be useful if the annotation also contained the corresponding entries for the non-structural proteins. For now I have done this manually by fetching the UCSC uniProtCov table and converting the information to GFF3 format.
Best regards, Stefan
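The conversion script itself was not shared, so the sketch below only illustrates the coordinate step that such a conversion has to get right. It assumes the table rows are BED-like (chrom, chromStart, chromEnd, name), with 0-based, half-open coordinates as in UCSC BED; the function name, the `source` label and the default feature type are placeholders chosen for illustration, and which feature type VEP would actually use for sub-protein regions is not established in this thread.

```python
from urllib.parse import quote


def bed_row_to_gff3(chrom, chrom_start, chrom_end, name,
                    strand="+", source="custom", feature_type="region"):
    """Convert one BED-like row to a 9-column GFF3 line.

    BED starts are 0-based and ends are exclusive; GFF3 uses 1-based,
    inclusive coordinates. So start shifts by +1 and end stays the same.
    """
    start = int(chrom_start) + 1
    end = int(chrom_end)
    # Percent-encode characters that are reserved in GFF3 attributes
    # (such as ';', '=', ',' and '%'); spaces are left as they are.
    safe_name = quote(str(name), safe=" ")
    attributes = f"ID={safe_name};Name={safe_name}"
    return "\t".join([
        chrom, source, feature_type, str(start), str(end),
        ".", strand, ".", attributes,
    ])


if __name__ == "__main__":
    # Illustrative row only; check coordinates against the real table.
    print(bed_row_to_gff3("NC_045512v2", 265, 805, "nsp1"))
```

Getting the off-by-one wrong here shifts every feature by one base, which is enough to change the predicted consequence of a variant at a region boundary.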
Hi @stefanches7,
Thank you for your feedback, I've passed it on to the appropriate people.
I'm going to close this ticket now. If you have any further questions, please feel free to reopen it or open a new one.
Kind Regards, Andrew