Open eprdz opened 10 months ago
Hi @eprdz ,
Thank you for reaching out. It makes sense for the biotype-fetching logic to be consistent across GFF and GTF files. We are going to work on this and will keep you updated.
Kind regards, Likhitha
Thank you for your answer. Do you know which of the two approaches is correct, or whether there is a third option? Could you point me in the right direction (documentation, a colleague) so that I can find a timely solution? For now, I run AGAT to convert GFF files to GTF v.2.5 files, which avoids the need to have the biotype in the 9th column. Do you think this is feasible? Thanks in advance for your help.
Hi @eprdz , Thank you for your patience.
In GFF files, the biotype is fetched from the "biotype", "transcript_type", or "transcript_biotype" attribute fields; there are also some additional checks for biotype. In a future update, if the biotype is still empty after all of these checks, it will be fetched from the source column (similar to GTF). Converting to a GTF file should be feasible in the meantime.
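To illustrate the lookup order described above, here is a minimal Python sketch. It is not the VEP implementation (which is written in Perl): the function names are hypothetical, the "additional checks" are not reproduced, and the precedence among the three attribute keys is assumed to follow the order in which they are listed.

```python
# Hypothetical illustration only; not code from ensembl-vep.
# Assumed precedence: the order in which the attribute keys are listed above.
BIOTYPE_KEYS = ("biotype", "transcript_type", "transcript_biotype")


def parse_gff_attributes(column9):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field or "=" not in field:
            continue
        key, value = field.split("=", 1)
        attrs[key.strip()] = value.strip()
    return attrs


def resolve_biotype(gff_line):
    """Return the biotype for one GFF line, or None if it cannot be found."""
    columns = gff_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated columns")
    attrs = parse_gff_attributes(columns[8])
    # First try the attribute fields in the 9th column.
    for key in BIOTYPE_KEYS:
        if attrs.get(key):
            return attrs[key]
    # Planned fallback: use the source (2nd column), as is done for GTF.
    source = columns[1]
    return source if source not in ("", ".") else None
```

With this fallback, a line whose 9th column has no biotype attribute would resolve to its source value (for example `AUGUSTUS`), so whether the transcript is then treated as coding would still depend on how that value is interpreted downstream.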
Kind regards, Likhitha
In order to use VEP, I made an AUGUSTUS GFF annotation file. Here you can see the GFF structure for the first gene:
Nevertheless, all variants from my VCF file appeared as non-coding, although I know that some of them are coding variants.
I read in the GTF and GFF format expectations that, for a GFF file, a biotype parameter must appear in the 9th column (indeed, the AUGUSTUS GFF file did not have this parameter in the 9th column). For a GTF file, however, if the biotype parameter does not appear in the 9th column, the biotype is inferred from the source (2nd column).
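A quick way to check whether a GFF file meets this expectation is to scan it for transcript-level lines that lack a biotype attribute. The following Python sketch does that. The function name is hypothetical, and the assumption that the relevant feature types are `transcript` and `mRNA` is mine, not something taken from the VEP documentation.

```python
def missing_biotype(lines, feature_types=("transcript", "mRNA")):
    """Return (line_number, source, feature_type) for lines without a biotype.

    `feature_types` is an assumption about which features need the attribute.
    """
    missing = []
    for number, line in enumerate(lines, start=1):
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] not in feature_types:
            continue
        # Collect the attribute keys present in the 9th column.
        keys = {
            field.split("=", 1)[0].strip()
            for field in columns[8].split(";")
            if "=" in field
        }
        if "biotype" not in keys:
            missing.append((number, columns[1], columns[2]))
    return missing
```

Usage would look like `missing_biotype(open("annotation.gff"))`, where the file name is a placeholder. Any tuples returned point at lines that rely on a fallback rather than an explicit biotype.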
I checked your code and indeed you do this. This code is from ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File/GTF.pm:
And this is from ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File/GFF.pm:
Can the biotype be inferred from the second column in the GFF file too? If not, why? Thank you in advance.