Open dzc0104 opened 7 months ago
Hey @dzc0104,
Thank you for your question. The problem is related to using the NCBI GTF/GFF annotation for microorganisms: we currently require the GTF/GFF annotation to explicitly describe each transcript and its exons.
For your use case, you could use the following modified annotation:
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build ASM478661v1
#!genome-build-accession NCBI_Assembly:GCF_004786615.1
##sequence-region NC_075404.1 1 15186
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2560319
NC_075404.1 RefSeq region 1 15186 . + . ID=NC_075404.1:1..15186;Dbxref=taxon:2560319;country=United Kingdom: N. Ireland;gbkey=Src;genome=genomic;isolate=chicken/N. Ireland/Ulster/67;mol_type=genomic RNA;old-name=Newcastle disease virus
NC_075404.1 RefSeq gene 56 1801 . + . ID=gene-QKC91_gp1;Dbxref=GeneID:80527638;Name=N;gbkey=Gene;gene=N;gene_biotype=protein_coding;locus_tag=QKC91_gp1
NC_075404.1 RefSeq transcript 122 1591 . + 0 ID=transcript-YP_010790286.1;Parent=gene-QKC91_gp1;Dbxref=GenBank:YP_010790286.1,GeneID:80527638;Name=YP_010790286.1;gbkey=CDS;gene=N;locus_tag=QKC91_gp1;product=nucleoprotein;protein_id=YP_010790286.1
NC_075404.1 RefSeq exon 122 1591 . + 0 ID=exon-YP_010790286.1;Parent=transcript-YP_010790286.1;Dbxref=GenBank:YP_010790286.1,GeneID:80527638;Name=YP_010790286.1;gbkey=CDS;gene=N;locus_tag=QKC91_gp1;product=nucleoprotein;protein_id=YP_010790286.1
NC_075404.1 RefSeq gene 1804 3254 . + . ID=gene-QKC91_gp2;Dbxref=GeneID:80527633;Name=P;gbkey=Gene;gene=P;gene_biotype=protein_coding;locus_tag=QKC91_gp2
NC_075404.1 RefSeq transcript 1887 3074 . + 0 ID=transcript-YP_010790287.1;Parent=gene-QKC91_gp2;Dbxref=GenBank:YP_010790287.1,GeneID:80527633;Name=YP_010790287.1;gbkey=CDS;gene=P;locus_tag=QKC91_gp2;product=phosphoprotein;protein_id=YP_010790287.1
NC_075404.1 RefSeq exon 1887 3074 . + 0 ID=exon-YP_010790287.1;Parent=transcript-YP_010790287.1;Dbxref=GenBank:YP_010790287.1,GeneID:80527633;Name=YP_010790287.1;gbkey=CDS;gene=P;locus_tag=QKC91_gp2;product=phosphoprotein;protein_id=YP_010790287.1
NC_075404.1 RefSeq gene 3256 4496 . + . ID=gene-QKC91_gp3;Dbxref=GeneID:80527634;Name=M;gbkey=Gene;gene=M;gene_biotype=protein_coding;locus_tag=QKC91_gp3
NC_075404.1 RefSeq transcript 3290 4384 . + 0 ID=transcript-YP_010790288.1;Parent=gene-QKC91_gp3;Dbxref=GenBank:YP_010790288.1,GeneID:80527634;Name=YP_010790288.1;gbkey=CDS;gene=M;locus_tag=QKC91_gp3;product=matrix protein;protein_id=YP_010790288.1
NC_075404.1 RefSeq exon 3290 4384 . + 0 ID=exon-YP_010790288.1;Parent=transcript-YP_010790288.1;Dbxref=GenBank:YP_010790288.1,GeneID:80527634;Name=YP_010790288.1;gbkey=CDS;gene=M;locus_tag=QKC91_gp3;product=matrix protein;protein_id=YP_010790288.1
NC_075404.1 RefSeq gene 4498 6289 . + . ID=gene-QKC91_gp4;Dbxref=GeneID:80527635;Name=F;gbkey=Gene;gene=F;gene_biotype=protein_coding;locus_tag=QKC91_gp4
NC_075404.1 RefSeq transcript 4544 6205 . + 0 ID=transcript-YP_010790289.1;Parent=gene-QKC91_gp4;Dbxref=GenBank:YP_010790289.1,GeneID:80527635;Name=YP_010790289.1;gbkey=CDS;gene=F;locus_tag=QKC91_gp4;product=fusion protein;protein_id=YP_010790289.1
NC_075404.1 RefSeq exon 4544 6205 . + 0 ID=exon-YP_010790289.1;Parent=transcript-YP_010790289.1;Dbxref=GenBank:YP_010790289.1,GeneID:80527635;Name=YP_010790289.1;gbkey=CDS;gene=F;locus_tag=QKC91_gp4;product=fusion protein;protein_id=YP_010790289.1
NC_075404.1 RefSeq gene 6321 8322 . + . ID=gene-QKC91_gp5;Dbxref=GeneID:80527636;Name=HN;gbkey=Gene;gene=HN;gene_biotype=protein_coding;locus_tag=QKC91_gp5
NC_075404.1 RefSeq transcript 6412 8262 . + 0 ID=transcript-YP_010790290.1;Parent=gene-QKC91_gp5;Dbxref=GenBank:YP_010790290.1,GeneID:80527636;Name=YP_010790290.1;gbkey=CDS;gene=HN;locus_tag=QKC91_gp5;product=hemagglutinin-neuraminidase;protein_id=YP_010790290.1
NC_075404.1 RefSeq exon 6412 8262 . + 0 ID=exon-YP_010790290.1;Parent=transcript-YP_010790290.1;Dbxref=GenBank:YP_010790290.1,GeneID:80527636;Name=YP_010790290.1;gbkey=CDS;gene=HN;locus_tag=QKC91_gp5;product=hemagglutinin-neuraminidase;protein_id=YP_010790290.1
NC_075404.1 RefSeq gene 8370 15072 . + . ID=gene-QKC91_gp6;Dbxref=GeneID:80527637;Name=L;gbkey=Gene;gene=L;gene_biotype=protein_coding;locus_tag=QKC91_gp6
NC_075404.1 RefSeq transcript 8381 14995 . + 0 ID=transcript-YP_010790291.1;Parent=gene-QKC91_gp6;Dbxref=GenBank:YP_010790291.1,GeneID:80527637;Name=YP_010790291.1;gbkey=CDS;gene=L;locus_tag=QKC91_gp6;product=RNA-dependent RNA polymerase;protein_id=YP_010790291.1
NC_075404.1 RefSeq exon 8381 14995 . + 0 ID=exon-YP_010790291.1;Parent=transcript-YP_010790291.1;Dbxref=GenBank:YP_010790291.1,GeneID:80527637;Name=YP_010790291.1;gbkey=CDS;gene=L;locus_tag=QKC91_gp6;product=RNA-dependent RNA polymerase;protein_id=YP_010790291.1
As this is not the first time we have received this question (see https://github.com/Ensembl/ensembl-vep/issues/1074), I am going to talk with the team about the possibility of supporting these NCBI GTF/GFF annotation files for microorganisms. Maybe we can consider each CDS as a single-exon transcript. I will keep you updated on this.
Best regards, Nuno
Thank you for the response, @nuno-agostinho. It worked for that reference. I have a question: did you edit the GFF file manually? I have two other references: 1) https://www.ncbi.nlm.nih.gov/nuccore/NC_039223.1 2) https://www.ncbi.nlm.nih.gov/nuccore/AF077761 - this one has GFF3 files, and I tried to convert it to GFF and even GTF but could not. The GFF3 could not even be bgzipped and tabixed.
Hi @dzc0104,
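I manually created the file by basically:
- Duplicating the CDS lines
- Changing the feature to transcript and exon
- Changing their IDs to something unique
- Changing their Parent IDs:
- Put the gene ID as the parent ID of the transcript
- Put the transcript ID as the parent ID of the exon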
Tell me if you need further instructions.
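A minimal Python sketch of that manual edit, in case it helps with other references. It assumes a tab-delimited GFF3 in which each CDS line has an ID and names its gene in Parent, as in the NCBI file discussed here; the function names and the `transcript-`/`exon-` ID prefixes are only illustrative choices, not something VEP requires.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column into a dict that keeps the original key order."""
    return dict(item.split("=", 1) for item in column9.split(";") if "=" in item)


def format_attributes(attrs):
    """Serialise the attribute dict back into GFF3 column 9."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


def cds_to_transcript_and_exon(gff_lines):
    """Return GFF3 lines where every CDS record is replaced by a transcript and an exon."""
    out = []
    for line in gff_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Comments, blank lines and all non-CDS features pass through unchanged.
        if line.startswith("#") or len(fields) != 9 or fields[2] != "CDS":
            out.append(line)
            continue
        attrs = parse_attributes(fields[8])
        base = attrs["ID"]
        if base.startswith("cds-"):
            base = base[len("cds-"):]
        transcript_id = "transcript-" + base
        exon_id = "exon-" + base
        # The transcript keeps the CDS's Parent (the gene); the exon points at the transcript.
        transcript_attrs = dict(attrs, ID=transcript_id)
        exon_attrs = dict(attrs, ID=exon_id, Parent=transcript_id)
        for feature, new_attrs in (("transcript", transcript_attrs), ("exon", exon_attrs)):
            out.append("\t".join(fields[:2] + [feature] + fields[3:8] + [format_attributes(new_attrs)]))
    return out
```

The output still needs to be sorted, compressed with bgzip and indexed with tabix before passing it to VEP.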
this one has GFF3 files, and I tried to convert it to GFF and even GTF but could not. The GFF3 could not even be bgzipped and tabixed.
If you downloaded the GFF3 annotation via the "Send to" form in the top right corner of the record, you need to remove the last empty lines of the file before running bgzip and tabix. Tell me if this worked.
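If you prefer not to edit the file by hand, the trailing empty lines can be removed with a few lines of Python. This is only a sketch; the function name is made up for illustration, and it deliberately leaves blank lines in the middle of the file alone.

```python
def strip_trailing_blank_lines(text):
    """Drop empty or whitespace-only lines at the end of a GFF3 file's text."""
    lines = text.splitlines()
    while lines and not lines[-1].strip():
        lines.pop()
    # Keep exactly one newline at the end of the last record.
    return "\n".join(lines) + "\n" if lines else ""
```

You would read the downloaded file, pass its contents through this function, write the result back, and then run bgzip and tabix as before.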
Cheers, Nuno
@nuno-agostinho Yay! It worked. Thank you very much, Nuno.
Regards, Deepa
@nuno-agostinho I still have a question. How can position 77 be associated with multiple types of genes, namely F, M, NP, and P? During my analysis, I observed that genomic position 77 is annotated with gene symbols F, M, NP, and P across various transcripts like this Iso7- Vep.xlsx
I got this information from a dataset https://www.ncbi.nlm.nih.gov/nuccore/AF077761 that includes details about gene symbols and transcript types. But I'm not sure what it means biologically to have different gene types at the same position.
Hi @dzc0104,
The only results associated with genes F and M are upstream_gene_variant or downstream_gene_variant. Marking variants as upstream or downstream of a gene is useful for identifying variants that may affect that gene (for example, by falling in a regulatory region).
However, the default distance between a variant and a transcript used by VEP to annotate upstream/downstream variants is 5,000 bp (optimised for vertebrates), and the genome you mentioned is small (15,186 bp). Please try decreasing the --distance parameter so that it makes more sense for your use case.
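To illustrate why one position can be reported against several genes, here is a rough Python sketch of the distance check. It is not VEP's implementation: the exact boundary handling is an assumption, and the coordinates are the transcript ranges from the modified NC_075404.1 annotation earlier in this thread, not those of AF077761.

```python
# Transcript coordinates (start, end) taken from the NC_075404.1 annotation above.
TRANSCRIPTS = {
    "N": (122, 1591),
    "P": (1887, 3074),
    "M": (3290, 4384),
    "F": (4544, 6205),
    "HN": (6412, 8262),
    "L": (8381, 14995),
}


def flanking_genes(pos, max_distance, transcripts=TRANSCRIPTS):
    """Return {gene: distance} for transcripts that do not overlap pos but lie within max_distance."""
    hits = {}
    for gene, (start, end) in transcripts.items():
        if pos < start:
            distance = start - pos
        elif pos > end:
            distance = pos - end
        else:
            continue  # the position is inside the transcript, so it is not a flanking hit
        if distance <= max_distance:
            hits[gene] = distance
    return hits


print(flanking_genes(77, 5000))  # {'N': 45, 'P': 1810, 'M': 3213, 'F': 4467}
print(flanking_genes(77, 100))   # {'N': 45}
```

With a 5,000 bp window, position 77 is within reach of four of the six transcripts on this genome; with a much smaller window only the nearest one remains.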
Hope this makes it clear.
Cheers, Nuno
Hi @nuno-agostinho,
Thank you for your assistance.
As part of my data analysis, I've identified synonymous variants and now I'm exploring their potential impacts at the amino acid level. While synonymous variants traditionally aren't thought to have functional impacts on protein structure, they can affect RNA stability, protein folding, evolutionary conservation, splicing regulation, and regulatory elements.
I've utilized Variant Effect Predictor (VEP) with the SIFT option (--sift b), but unfortunately, I didn't receive any relevant data in the output. Does this lack of prediction indicate that there are no available predictions for my variants?
Here's the command I used:
vep -i iso1p1_filtered.snp.vcf.gz \
  --gff /home/shared/hauck_research/Deepa_NDV_updated/troubleshooting/ref/AF077761/sequence.gff3.gz \
  --fasta /home/shared/hauck_research/Deepa_NDV_updated/troubleshooting/ref/AF077761/AF077761.fasta.gz \
  --species avian_orthoavulavirus \
  --sift b
Additionally, I'm seeking recommendations for other tools to analyze the functional impacts of synonymous variants, particularly those focusing on RNA-level effects, splicing regulation, and non-protein-coding impacts.
Thank you for your guidance! 😊
I have attached hereby the link to the VCF file.
Best regards, Deepa
Hi @dzc0104,
VEP only returns pre-computed SIFT results stored in Ensembl databases, in --database or --cache modes. However, we don't have SIFT results for avian orthoavulavirus. You may want to consider installing and running SIFT on your data, as per https://sift.bii.a-star.edu.sg.
Regarding additional tools to help predict variant consequences, some articles list such tools:
Hope this information was useful.
Cheers, Nuno
Hi @nuno-agostinho,
I have a similar issue as the one originally reported by @dzc0104 regarding intergenic variant calling.
I've built .gff3 files using both prokka and bakta for reference genomes against which I'm looking to find variants. Here's an excerpt of a bakta .gff3 below:
contig00001 Prodigal CDS 265 723 . + 0 ID=KAHBKG_00010;Name=Transcriptional regulator CtsR;locus_tag=KAHBKG_00010;product=Transcriptional regulator CtsR;Dbxref=COG:COG4463,COG:K,RefSeq:WP_003760062.1,SO:0001217,UniParc:UPI00000CC18E,UniRef:UniRef100_H1GA27,UniRef:UniRef50_A0A143YMT3,UniRef:UniRef90_G2ZA06;gene=ctsR
contig00001 Prodigal CDS 736 1254 . + 0 ID=KAHBKG_00015;Name=Protein-arginine kinase activator protein McsA;locus_tag=KAHBKG_00015;product=Protein-arginine kinase activator protein McsA;Dbxref=COG:COG3880,COG:O,RefSeq:WP_003760064.1,SO:0001217,UniParc:UPI0001EB894E,UniRef:UniRef100_A0A823H5C3,UniRef:UniRef50_H1GA28,UniRef:UniRef90_H1GA28;gene=mcsA
contig00001 Prodigal CDS 1251 2273 . + 0 ID=KAHBKG_00020;Name=protein arginine kinase;locus_tag=KAHBKG_00020;product=protein arginine kinase;Dbxref=COG:COG3869,COG:O,EC:2.7.14.1,GO:0004111,GO:0004672,GO:0005524,GO:0016310,GO:0046314,RefSeq:WP_010990301.1,SO:0001217,UniParc:UPI000013952D,UniRef:UniRef100_Q92F44,UniRef:UniRef50_Q48759,UniRef:UniRef90_Q48759;gene=mcsB
contig00001 Prodigal CDS 2302 4764 . + 0 ID=KAHBKG_00025;Name=endopeptidase Clp ATP-binding chain C;locus_tag=KAHBKG_00025;product=endopeptidase Clp ATP-binding chain C;Dbxref=COG:COG0542,COG:O,RefSeq:WP_003770116.1,SO:0001217,UniParc:UPI00000CC190,UniRef:UniRef100_A0A3H2VSB6,UniRef:UniRef50_A0A0F7N4K2,UniRef:UniRef90_A0A097B1Z0,VFDB:VFC0282,VFDB:VFG000079;gene=clpC
I've tried to make use of your method here:
- Duplicating the CDS lines
- Changing the feature to transcript and exon
- Changing their IDs to something unique
- Changing their Parent IDs:
- Put the gene ID as the parent ID of the transcript
- Put the transcript ID as the parent ID of the exon
and even changing CDS to gene in the .gff3 file and including a biotype to remedy the warning (just on the off chance...):
contig00001 Prodigal gene 265 723 . + . ID=gene-KAHBKG_00010;locus_tag=KAHBKG_00010;gene_biotype=protein_coding
contig00001 Prodigal transcript 265 723 . + . ID=KAHBKG_00010_t1000;Parent=gene-KAHBKG_00010;locus_tag=KAHBKG_00010
contig00001 Prodigal exon 265 723 . + 0 ID=KAHBKG_00010_e1000;Parent=KAHBKG_00010_t1000;locus_tag=KAHBKG_00010
However, I still receive warnings (WARNING: Unable to determine biotype of KAHBKG_01390) for approx. 30 IDs/locus_tags per .gff3, and variants are still called as intergenic even if the locations fall within a CDS.
Any recommendations here, or if you'd like me to provide test data, do let me know.
Cheers, Joshua
Hi @Joshua-Macleod,
Based on that warning, I would say that those lines have no field indicating their biotype, so VEP can't determine whether they are part of a protein_coding transcript or not.
Could you show me the lines in your GFF3 file relative to KAHBKG_01390?
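A quick way to list the affected records is to scan the GFF3 for transcript lines that carry no biotype attribute of their own. The Python sketch below does that; the attribute names in `BIOTYPE_KEYS` are an assumption about which keys are checked and can be changed, and this thread does not confirm that adding one of them resolves the warning.

```python
# Assumed attribute names that could carry a transcript biotype; adjust as needed.
BIOTYPE_KEYS = ("biotype", "transcript_biotype", "transcript_type")


def transcripts_without_biotype(gff_lines, keys=BIOTYPE_KEYS):
    """Return the IDs of transcript records that have none of the given attribute keys."""
    missing = []
    for line in gff_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9 or fields[2] != "transcript":
            continue
        attrs = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
        if not any(key in attrs for key in keys):
            missing.append(attrs.get("ID", "<no ID>"))
    return missing
```

Running it over the file and comparing the returned IDs with the locus tags named in the warnings would show whether the two sets coincide.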
Best, Nuno
Hi @nuno-agostinho,
Thanks for getting back to me.
Here are the lines:
contig00001 Prodigal gene 270089 271192 . + . ID=gene-KAHBKG_01390;locus_tag=KAHBKG_01390;gene_biotype=protein_coding;Name=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;product=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;Dbxref=COG:COG0820,COG:J,EC:2.1.1.192,GO:0000049,GO:0002935,GO:0005737,GO:0008757,GO:0016433,GO:0019843,GO:0031167,GO:0046872,GO:0051539,GO:0070040,GO:0070475,RefSeq:WP_003725208.1,SO:0001217,UniParc:UPI00000CC251,UniRef:UniRef100_Q92EH6,UniRef:UniRef50_Q8Y9P2,UniRef:UniRef90_Q8Y9P2;gene=rlmN
contig00001 Prodigal transcript 270089 271192 . + . ID=KAHBKG_01390_t1272;Parent=gene-KAHBKG_01390;locus_tag=KAHBKG_01390;Name=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;product=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;Dbxref=COG:COG0820,COG:J,EC:2.1.1.192,GO:0000049,GO:0002935,GO:0005737,GO:0008757,GO:0016433,GO:0019843,GO:0031167,GO:0046872,GO:0051539,GO:0070040,GO:0070475,RefSeq:WP_003725208.1,SO:0001217,UniParc:UPI00000CC251,UniRef:UniRef100_Q92EH6,UniRef:UniRef50_Q8Y9P2,UniRef:UniRef90_Q8Y9P2;gene=rlmN
contig00001 Prodigal exon 270089 271192 . + 0 ID=KAHBKG_01390_e1272;Parent=KAHBKG_01390_t1272;locus_tag=KAHBKG_01390;Name=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;product=23S rRNA (adenine(2503)-C(2))-methyltransferase RlmN;Dbxref=COG:COG0820,COG:J,EC:2.1.1.192,GO:0000049,GO:0002935,GO:0005737,GO:0008757,GO:0016433,GO:0019843,GO:0031167,GO:0046872,GO:0051539,GO:0070040,GO:0070475,RefSeq:WP_003725208.1,SO:0001217,UniParc:UPI00000CC251,UniRef:UniRef100_Q92EH6,UniRef:UniRef50_Q8Y9P2,UniRef:UniRef90_Q8Y9P2;gene=rlmN
Worth noting, these aren't loci outputted by vep (edit: presumably wouldn't be for the same reason they're noted in the warnings - I didn't put two and two together).
Cheers, Joshua
Hi, I am attempting to annotate a customized VCF file using NCBI's GFF and FASTA (.fna) files for the Newcastle disease virus (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_004786615.1/). However, all the variants are being classified as intergenic, which does not match what I see when viewing them in IGV.
System
Script
To install the bgzip and tabix (I did it in my local terminal)
Download htslib-1.19.1.tar.gz
tar -zxvf htslib-1.19.1.tar.gz
cd htslib-1.19.1
removing header line of gff as vep does not work with files having header line (local terminal)
grep -v '^#' genomic.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip > genomic.gff.gz
tabix -p gff genomic.gff.gz
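For reference, the filtering and sorting part of that pipeline can be sketched in Python as below. This is an illustration with a made-up function name; unlike `grep -v '^#'` it also drops blank lines, and the result would still have to be compressed with bgzip and indexed with tabix.

```python
def clean_and_sort_gff(lines):
    """Drop comment and blank lines, then sort records by seqid, start and end."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        records.append(line.split("\t"))
    # Columns 1, 4 and 5 of a GFF record are seqid, start and end.
    records.sort(key=lambda fields: (fields[0], int(fields[3]), int(fields[4])))
    return ["\t".join(fields) for fields in records]
```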
for compressing fasta file (local terminal and transfer all the files in super computer later)
bgzip -c GCF_004786615.1_ASM478661v1_genomic.fna > GCF_004786615.1_ASM478661v1_genomic.fna.gz
for indexing fasta file
samtools faidx GCF_004786615.1_ASM478661v1_genomic.fna.gz
creating a synonyms file that maps the chromosome names used in your VCF to those used in your GFF file
zcat iso1_filtered.snp.vcf.gz | grep -v '^#' | sort -k1,1 -o sorted_iso1.vcf
cut -f1 sorted_iso10.vcf > 1snpsynonyms.txt
zcat genomic.gff.gz | grep -v '^#' | sort -k1,1 -o sorted.gff
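A synonyms file needs one tab-separated pair of names per line, and each pair only needs to appear once. The Python sketch below builds such a file's contents from the CHROM column of a VCF; the function name and the choice to put the VCF name in the first column follow the file shown in this report and are assumptions, not a statement of what VEP requires. Note that a synonym only renames a sequence: it does not translate coordinates, so positions in the VCF must already be in the coordinate system of the annotated reference.

```python
def build_synonyms(vcf_lines, reference_name):
    """Return tab-separated 'vcf_name<TAB>reference_name' lines, one per distinct CHROM."""
    chroms = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chroms.append(line.split("\t", 1)[0])
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = dict.fromkeys(chroms)
    return "".join(f"{chrom}\t{reference_name}\n" for chrom in unique)
```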
variants annotation for snp using ASM4786615.1
vep -i iso1_filtered.snp.vcf.gz \
  --gff /home/shared/hauck_research/Deepa_NDV_updated/troubleshooting/ncbiASM478661/ncbi_dataset/data/GCF_004786615.1/genomic.gff.gz \
  --fasta /home/shared/hauck_research/Deepa_NDV_updated/troubleshooting/ncbiASM478661/ncbi_dataset/data/GCF_004786615.1/GCF_004786615.1_ASM478661v1_genomic.fna.gz \
  --synonyms 1snpsynonyms.txt \
  --species avian_orthoavulavirus
Full error message
I have not got any warning message as the script ran but the output file was with all intergenic variants.
Data files
A sample of the GFF after
NC_075404.1 RefSeq region 1 15186 . + . ID=NC_075404.1:1..15186;Dbxref=taxon:2560319;country=United Kingdom: N. Ireland;gbkey=Src;genome=genomic;isolate=chicken/N. Ireland/Ulster/67;mol_type=genomic RNA;old-name=Newcastle disease virus
NC_075404.1 RefSeq gene 56 1801 . + . ID=gene-QKC91_gp1;Dbxref=GeneID:80527638;Name=N;gbkey=Gene;gene=N;gene_biotype=protein_coding;locus_tag=QKC91_gp1
NC_075404.1 RefSeq CDS 122 1591 . + 0 ID=cds-YP_010790286.1;Parent=gene-QKC91_gp1;Dbxref=GenBank:YP_010790286.1,GeneID:80527638;Name=YP_010790286.1;gbkey=CDS;gene=N;locus_tag=QKC91_gp1;product=nucleoprotein;protein_id=YP_010790286.1
NC_075404.1 RefSeq gene 1804 3254 . + . ID=gene-QKC91_gp2;Dbxref=GeneID:80527633;Name=P;gbkey=Gene;gene=P;gene_biotype=protein_coding;locus_tag=QKC91_gp2
.....
A sample of the compressed VCF
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT iso1
NODE_1_length_6008_cov_909.877255 980 . T C 12078.64 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=0.924;DP=624;ExcessHet=0.0000;FS=1.120;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=19.87;ReadPosRankSum=0.149;SOR=0.728 GT:AD:DP:GQ:PL 0/1:236,372:608:99:12086,0,6929
NODE_1_length_6008_cov_909.877255 3666 . C T 15573.64 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-0.079;DP=770;ExcessHet=0.0000;FS=7.765;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=20.88;ReadPosRankSum=0.795;SOR=0.362 GT:AD:DP:GQ:PL 0/1:235,511:746:99:15581,0,5829
NODE_1_length_6008_cov_909.877255 3812 . A G 534.64 ReadPosRankSum-8 AC=1;AF=0.500;AN=2;BaseQRankSum=1.096;DP=826;ExcessHet=0.0000;FS=15.515;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=0.66;ReadPosRankSum=-12.298;SOR=2.487 GT:AD:DP:GQ:PL 0/1:722,85:807:99:542,0,23105
NODE_1_length_6008_cov_909.877255 4631 . T C 1817.64 ReadPosRankSum-8 AC=1;AF=0.500;AN=2;BaseQRankSum=-3.725;DP=846;ExcessHet=0.0000;FS=22.208;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=2.24;ReadPosRankSum=-13.945;SOR=1.685 GT:AD:DP:GQ:PL 0/1:680,133:813:99:1825,0,21905
NODE_2_length_2668_cov_848.858356 289 . G A 924.64 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-1.811;DP=720;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.97;MQRankSum=0.000;QD=1.50;ReadPosRankSum=-5.861;SOR=0.631 GT:AD:DP:GQ:PL 0/1:531,87:618:99:932,0,16256
.....
Synonyms text file format
NODE_1_length_6008_cov_909.877255 NC_075404.1
NODE_1_length_6008_cov_909.877255 NC_075404.1
NODE_1_length_6008_cov_909.877255 NC_075404.1
NODE_1_length_6008_cov_909.877255 NC_075404.1
NODE_2_length_2668_cov_848.858356 NC_075404.1
NODE_2_length_2668_cov_848.858356 NC_075404.1
.....
VEP output
ENSEMBL VARIANT EFFECT PREDICTOR v104.3
Output produced at 2024-02-09 19:23:53
Using API version 104, DB version ?
ensembl-funcgen version 104.f1c7762
ensembl-io version 104.1d3bb6e
ensembl version 104.1af1dce
ensembl-variation version 104.20f5335
Column descriptions:
Uploaded_variation : Identifier of uploaded variant
Location : Location of variant in standard coordinate format (chr:start or chr:start-end)
Allele : The variant allele used to calculate the consequence
Gene : Stable ID of affected gene
Feature : Stable ID of feature
Feature_type : Type of feature - Transcript, RegulatoryFeature or MotifFeature
Consequence : Consequence type
cDNA_position : Relative position of base pair in cDNA sequence
CDS_position : Relative position of base pair in coding sequence
Protein_position : Relative position of amino acid in protein
Amino_acids : Reference and variant amino acids
Codons : Reference and variant codon sequence
Existing_variation : Identifier(s) of co-located known variants
Extra column keys:
IMPACT : Subjective impact classification of consequence type
DISTANCE : Shortest distance from variant to transcript
STRAND : Strand of the feature (1/-1)
FLAGS : Transcript quality flags
SOURCE : Source of transcript
genomic.gff.gz : /home/shared/hauck_research/Deepa_NDV_updated/troubleshooting/ncbiASM478661/ncbi_dataset/data/GCF_004786615.1/genomic.gff.gz (overlap)
Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
NODE_1_length_6008_cov_909.877255_980_T/C NODE_1_length_6008_cov_909.877255:980 C - - - intergenic_variant - - - - - - IMPACT=MODIFIER
NODE_1_length_6008_cov_909.877255_3666_C/T NODE_1_length_6008_cov_909.877255:3666 T - - - intergenic_variant - - - - - - IMPACT=MODIFIER
NODE_1_length_6008_cov_909.877255_3812_A/G NODE_1_length_6008_cov_909.877255:3812 G - - - intergenic_variant - - - - - - IMPACT=MODIFIER
NODE_1_length_6008_cov_909.877255_4631_T/C NODE_1_length_6008_cov_909.877255:4631 C - - - intergenic_variant - - - - - - IMPACT=MODIFIER
....
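To confirm that every variant really is reported as intergenic, the consequences in the output can be tallied with a short Python sketch such as the one below. It assumes the default tab-delimited VEP output, where header lines start with `#` and Consequence is the seventh column as listed above; the function name is illustrative.

```python
from collections import Counter


def count_consequences(vep_lines):
    """Count consequence terms in tab-delimited VEP output, skipping '#' header lines."""
    counts = Counter()
    for line in vep_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue
        # A single row may list several comma-separated consequence terms.
        counts.update(fields[6].split(","))
    return counts
```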