Closed sivasubramanics closed 1 week ago
Right, your file is badly structured. Indeed, you have the three_prime_UTR and five_prime_UTR features attached directly to the gene instead of being attached to the mRNA.
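To check whether a file has this layout, a short script along these lines can list the UTR features whose Parent is a gene ID. This is only a sketch, not part of AGAT: the function names and the toy records are made up for illustration, and it assumes plain tab-separated GFF3 with ID and Parent attributes in column 9.

```python
UTR_TYPES = {"three_prime_UTR", "five_prime_UTR"}

def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (no URL-decoding)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def utrs_attached_to_genes(lines):
    """Return (line_number, utr_type, parent_id) for UTRs whose Parent is a gene."""
    lines = list(lines)
    # First pass: collect the IDs of all gene features.
    gene_ids = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        if cols[2] == "gene":
            gene_id = parse_attributes(cols[8]).get("ID")
            if gene_id:
                gene_ids.add(gene_id)
    # Second pass: report UTRs that point at one of those gene IDs.
    hits = []
    for number, line in enumerate(lines, start=1):
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] not in UTR_TYPES:
            continue
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent in gene_ids:
                hits.append((number, cols[2], parent))
    return hits

# Toy records (hypothetical, not taken from the NCBI file).
SAMPLE = "\n".join([
    "##gff-version 3",
    "chr1\ttoy\tgene\t1\t1000\t.\t+\t.\tID=gene1",
    "chr1\ttoy\tmRNA\t1\t1000\t.\t+\t.\tID=rna1;Parent=gene1",
    "chr1\ttoy\tfive_prime_UTR\t1\t100\t.\t+\t.\tID=utr5;Parent=gene1",
    "chr1\ttoy\texon\t1\t1000\t.\t+\t.\tID=exon1;Parent=rna1",
    "chr1\ttoy\tCDS\t101\t900\t.\t+\t0\tID=cds1;Parent=rna1",
    "chr1\ttoy\tthree_prime_UTR\t901\t1000\t.\t+\t.\tID=utr3;Parent=gene1",
])

if __name__ == "__main__":
    for number, utr_type, parent in utrs_attached_to_genes(SAMPLE.splitlines()):
        print(f"line {number}: {utr_type} has gene {parent} as Parent")
```

If every UTR in the real file shows up in this list, the layout is consistent and the fixes described in this thread should apply to the whole file.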
AGAT tries to follow the information provided. It then creates a new RNA to attach the UTRs to, because linking them directly to the gene is not allowed. But the file has only one gene feature where AGAT needs two, so it creates a new one. Then, I guess, since it does not know which gene the original information should be attached to, it gets rid of it.
If this layout is consistent throughout the file, you can fix it by removing the Parent attribute from all three_prime_UTR and five_prime_UTR features. Then, when AGAT parses the file, it will attach those features to the previous mRNA encountered (sequentially).
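As a rough sketch of that first fix, the snippet below removes the Parent attribute from UTR lines and leaves every other line untouched. It is not an AGAT feature; the function names and file paths are placeholders, and it assumes tab-separated GFF3 in which each UTR line already follows the mRNA it belongs to.

```python
UTR_TYPES = {"three_prime_UTR", "five_prime_UTR"}

def strip_utr_parent(line):
    """Return the line with Parent=... removed if it is a UTR feature."""
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) != 9 or cols[2] not in UTR_TYPES:
        return line  # comments, directives and non-UTR features pass through
    kept = [
        field for field in cols[8].split(";")
        if field.strip() and not field.strip().startswith("Parent=")
    ]
    # GFF3 uses "." for an empty column.
    cols[8] = ";".join(kept) if kept else "."
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + newline

def rewrite_gff(in_path, out_path):
    """Copy in_path to out_path, stripping Parent from UTR features."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(strip_utr_parent(line))

if __name__ == "__main__":
    example = "chr1\ttoy\tfive_prime_UTR\t1\t100\t.\t+\t.\tID=utr5;Parent=gene1\n"
    print(strip_utr_parent(example), end="")
```

The rewritten file (for example via rewrite_gff("input.gff", "input.noparent.gff"), both names hypothetical) can then be given to AGAT in place of the original.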
You can even get rid of all three_prime_UTR and five_prime_UTR features. AGAT will re-create them based on the CDS and exon coordinates.
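The second option, dropping the UTR lines entirely, could look something like the following. Again this is only an illustrative filter with made-up names and paths, assuming tab-separated GFF3; it relies on the exon and CDS records being present so that AGAT has coordinates to rebuild the UTRs from.

```python
UTR_TYPES = {"three_prime_UTR", "five_prime_UTR"}

def drop_utr_features(lines):
    """Yield every line except three_prime_UTR / five_prime_UTR features."""
    for line in lines:
        cols = line.split("\t")
        is_feature = not line.startswith("#") and len(cols) == 9
        if is_feature and cols[2] in UTR_TYPES:
            continue  # skip the UTR record
        yield line

def filter_gff(in_path, out_path):
    """Write a copy of in_path without UTR features to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_utr_features(src))

if __name__ == "__main__":
    toy = [
        "chr1\ttoy\tmRNA\t1\t1000\t.\t+\t.\tID=rna1;Parent=gene1\n",
        "chr1\ttoy\tfive_prime_UTR\t1\t100\t.\t+\t.\tID=utr5;Parent=gene1\n",
        "chr1\ttoy\tCDS\t101\t900\t.\t+\t0\tID=cds1;Parent=rna1\n",
    ]
    print("".join(drop_utr_features(toy)), end="")
```

Compared with editing the Parent attribute, this variant does not depend on the order of lines in the file, since no UTR record is left for AGAT to place.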
Yeah, now I understand. The Parent field in the UTRs is causing this issue. Thanks. Maybe you could add this to the docs as well.
Describe the bug
I am trying to convert a GFF file to GTF, using one of the reference GFF files downloaded from NCBI. I find that the converted GTF output contains additional features for the transcripts. Also, the ID attribute is changed.
General (please complete the following information):
To Reproduce
Input file description (downloaded gff for this genome: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_949927565.1/)
and the output we received
ISSUE
I feel the input GFF contains all the necessary features, so only those lines should need converting to GTF, but we observe additional IDs and RNA features introduced by AGAT. This makes the output misleading and difficult to process.
In short,
Otherwise, am I missing something in this process?