genomeannotation / GAG

Generates an NCBI .tbl file of annotations on a genome.
MIT License

GAG discards UTR features #149

Open marchoeppner opened 9 years ago

marchoeppner commented 9 years ago

I am not sure whether this behaviour is by design, but GAG currently ejects UTR features into the file genome.ignored.gff, so the final annotation genome.gff has no UTR annotations either. This concerns features with the following feature types:

five_prime_UTR
three_prime_UTR

However, these two features are perfectly valid within the GFF3 standard and probably shouldn't be ignored (?).

More information at: http://www.sequenceontology.org/gff3.shtml

bruab commented 9 years ago

@marchoeppner This is by design, or by laziness perhaps. Since UTRs aren't included in NCBI's .tbl file, we chose to ignore them. Including them in the output is non-trivial, since certain filters and fixes within GAG can shift the boundaries between CDS and UTR. It's doable, it's just more complicated than Read-Them-In-And-Write-Them-Out.

As long as this omission doesn't cause anybody trouble with their genome submission, fixing it is low-priority. If anyone gets errors or other flak due to the absence of UTR, we'll move it up the queue.

marchoeppner commented 9 years ago

Understood - maybe something for the future? We are using GAG and Annie for things other than NCBI tbl dumping, so not being able to parse all features makes things a little tricky. That being said, it is already a very useful tool as is!

bruab commented 9 years ago

We have decided to do this. Maybe next week, depending on how horribly some transcriptome submissions go ...

bruab commented 9 years ago

We will create new UTR features from scratch, rather than preserve the original ones. This is simpler and gets around the issue of fixes and filters shifting UTR boundaries.
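For readers wondering what "from scratch" could look like, here is a minimal sketch, not GAG's actual implementation. It assumes each mRNA is available as lists of exon and CDS intervals (1-based, inclusive, as in GFF3) plus a strand, and treats any exon sequence outside the overall CDS span as UTR. The function name `derive_utrs` and the `Interval` alias are hypothetical.

```python
from typing import List, Tuple

# 1-based, inclusive (start, end) coordinates, as used in GFF3.
Interval = Tuple[int, int]


def derive_utrs(exons: List[Interval], cds: List[Interval],
                strand: str) -> Tuple[List[Interval], List[Interval]]:
    """Return (five_prime_UTRs, three_prime_UTRs) for one mRNA.

    Hypothetical helper: UTR is taken to be every exon base that lies
    outside the span from the first CDS base to the last CDS base.
    """
    if not cds:
        # Assumption: a transcript without CDS gets no UTR features.
        return [], []

    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)

    upstream: List[Interval] = []    # exon pieces left of the CDS span
    downstream: List[Interval] = []  # exon pieces right of the CDS span
    for start, end in sorted(exons):
        if start < cds_start:
            upstream.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            downstream.append((max(start, cds_end + 1), end))

    # On the minus strand the 5' end is at the higher coordinates.
    if strand == '-':
        return downstream, upstream
    return upstream, downstream


exons = [(100, 200), (300, 400), (500, 600)]
cds = [(150, 200), (300, 400), (500, 550)]
five, three = derive_utrs(exons, cds, '+')
```

Because the UTRs are recomputed from the current exon and CDS coordinates, they stay consistent even after a filter or fix has moved a CDS boundary, which is the advantage described above.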

PaTapiaBioinfo commented 5 years ago

I solved it by replacing line 246 in src/gff_reader.py with:

        elif ltype == 'start_codon' or ltype == 'stop_codon' or ltype == 'five_prime_UTR' or ltype == 'three_prime_UTR':

This keeps the UTRs in the .gff output, but I think the result is not valid for the NCBI .tbl.
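The same condition can be written more compactly as a set membership test. This is only a sketch of the comparison itself: the names `CODON_TYPES`, `UTR_TYPES` and `matches_patched_branch` are hypothetical, and what GAG does inside that branch is not shown in this thread.

```python
# Hypothetical constants grouping the feature types named in the patch.
CODON_TYPES = {'start_codon', 'stop_codon'}
UTR_TYPES = {'five_prime_UTR', 'three_prime_UTR'}


def matches_patched_branch(ltype: str) -> bool:
    """True for the feature types the patched elif condition accepts."""
    return ltype in CODON_TYPES | UTR_TYPES
```

Keeping the UTR types in their own set makes it easy to switch them on or off, for example when the output is meant for NCBI submission rather than general GFF3 use.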