NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

agat_sp_flag_premature_stop_codons.pl mostly flags genes without premature stop codons #504

Status: Open. JWDebler opened this issue 1 month ago.

JWDebler commented 1 month ago

Describe the bug

I lifted annotations from a reference genome to another isolate with Liftoff and wanted to check the quality of the new annotations: do they have start and stop codons, and do they contain in-frame (premature) stop codons?

I ran:

agat_sp_flag_premature_stop_codons.pl --gff ArII0004_liftoff_polished_genes.gff3 --fasta ArII0004_polished.fasta --out AGAT


To Reproduce

The annotation was lifted over with Liftoff using:

liftoff -g me14.gff -o ArII0004_liftoff.gff3 -exclude_partial -p 16 -copies -polish ArII0004_polished.fasta me14.fasta

Expected behavior

Only annotations with premature stop codons should be flagged. However, most of the flagged annotations look fine when inspected in Geneious.
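To make the expected behavior concrete: a premature (in-frame) stop codon is a stop codon that appears before the last codon when the CDS is translated in the frame set by its phase. The sketch below shows that check under stated assumptions: the CDS is already spliced and on the coding strand, the standard genetic code (NCBI table 1) applies, and `translate` and `count_internal_stops` are hypothetical helpers, not AGAT's implementation.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, offset: int = 0) -> str:
    """Translate a spliced CDS after skipping `offset` bases (the role played by the phase).

    An incomplete trailing codon is ignored; codons with ambiguous bases become 'X'.
    """
    seq = cds.upper()[offset:]
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def count_internal_stops(cds: str, offset: int = 0) -> int:
    """Count stop codons ('*') anywhere except the last translated position."""
    protein = translate(cds, offset)
    return protein[:-1].count("*")


if __name__ == "__main__":
    toy_cds = "ATGCTAAGCTAA"  # toy CDS that translates to M L S *
    for frame in range(3):
        print(frame, translate(toy_cds, frame), count_internal_stops(toy_cds, frame))
```

Shifting the frame of this toy CDS by one base (offset 1) turns a clean ORF into one with an internal stop. On a full-length gene the same off-by-one error can produce many internal stops, so a frame or phase mismatch is one plausible way for a correct-looking gene to be reported with a dozen premature stops.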

Attached files: AGAT_report.txt, genome-and-annotations.zip

Happy to supply more info or data if needed. Cheers

JWDebler commented 1 month ago

Running agat_sp_fix_cds_phases.pl on the input files reduced the number of mRNAs flagged with problems from 10946 to 8902, but when I spot-check some of the flagged ones, I still find mostly correctly annotated genes.

AGAT_phased_report.txt
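Some background on what fixing phases involves: in GFF3, the phase of a CDS segment is the number of bases to skip from the start of that segment, in the direction of transcription, to reach the first base of the next codon. With the segments of one transcript ordered 5' to 3' and the first segment in phase 0, each following phase is determined by the preceding segment lengths. The sketch below (hypothetical function `expected_phases`) only illustrates that arithmetic; it is not the code of agat_sp_fix_cds_phases.pl.

```python
from typing import List, Tuple


def expected_phases(segments: List[Tuple[int, int]], first_phase: int = 0) -> List[int]:
    """Compute GFF3 phases for CDS segments given as (start, end), 1-based inclusive,
    already ordered in the direction of transcription (5' to 3').

    phase[i + 1] = (phase[i] - length[i]) mod 3, i.e. the number of bases the next
    segment must contribute to complete the codon left open at the end of segment i.
    """
    phases = [first_phase]
    for start, end in segments[:-1]:
        length = abs(end - start) + 1  # works for plus- or minus-strand coordinates
        phases.append((phases[-1] - length) % 3)
    return phases[:len(segments)]  # an empty segment list yields no phases


# Toy example: plus-strand segments of length 10, 7 and 5.
print(expected_phases([(1, 10), (21, 27), (41, 45)]))
```

For the toy segments this gives [0, 2, 1]: the first segment leaves one base of an open codon, so the second must skip 2 bases, and so on. Comparing such expected values with the phase column is one way to see whether a lifted-over record's phases are consistent.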

Juke34 commented 1 month ago

To check for the presence of start and stop codons, you can extract the proteins with AGAT and then run gaas_fasta_checkProteins.pl from GAAS.

But I'm not sure I understand your issue. Could you extract and show a record that you think differs between AGAT and Geneious?
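If you want to inspect the extracted proteins directly, a rough equivalent of such a check is sketched below. It does not reproduce gaas_fasta_checkProteins.pl; the helpers `read_fasta` and `check_protein` are hypothetical, and the sketch assumes stops are written as '*' and that a complete protein starts with 'M' (translations from alternative start codons may not).

```python
from typing import Dict, Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Identifier is the first word of the header line.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def check_protein(seq: str) -> Dict[str, object]:
    """Report the start residue, terminal stop and internal stops of one protein."""
    has_stop = seq.endswith("*")
    body = seq[:-1] if has_stop else seq
    return {
        "has_start": seq.startswith("M"),
        "has_stop": has_stop,
        "internal_stops": body.count("*"),
    }


if __name__ == "__main__":
    example = ">prot1 toy\nMKT*\n>prot2\nMK*T\nQ\n"
    for ident, protein in read_fasta(example):
        print(ident, check_protein(protein))
```

Whether a terminal '*' is present at all depends on how the proteins were extracted, so "has_stop" is only meaningful if the extraction step keeps the stop symbol.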

JWDebler commented 1 month ago

Let me try to explain this better with screenshots.

The report AGAT produced (attached above) flags gene rna-gnl|eko05|eko05_0007601-t1 as containing 1 premature stop codon, and looking at that gene in Geneious confirms this.

The report for gene rna-gnl|eko05|eko05_0000025-t1 claims that it contains 13 premature stop codons, but looking at this gene in Geneious shows that its CDS is fine.

Juke34 commented 1 month ago

This might be related to the CDS phase. Is the phase taken into account by Geneious? Try extracting the record from the GFF, set the phase manually to 0, run AGAT on that record, and see whether premature stop codons are still detected.
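One way to run this test is sketched below: a small script (hypothetical name extract_phase0.py, hypothetical function `extract_with_phase_zero`) that keeps the features whose ID or Parent matches a transcript ID and rewrites the phase column (column 8) of its CDS lines to 0. It assumes a tab-separated GFF3 with `key=value` attributes, and it does not copy the parent gene line, so check how AGAT handles a transcript without its gene.

```python
import sys
from typing import Iterable, List


def extract_with_phase_zero(lines: Iterable[str], feature_id: str) -> List[str]:
    """Keep GFF3 features whose ID or Parent equals `feature_id`; set CDS phases to 0.

    Only the '##gff-version' header is kept. Parent attributes with several
    comma-separated values are handled. The parent gene line is NOT included.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##gff-version"):
            out.append(line)
            continue
        if line.startswith("#"):
            continue  # other comments and directives are dropped
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skips blank lines and any embedded FASTA sequence
        attrs = dict(
            field.strip().split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        ids = {attrs.get("ID")} | set(attrs.get("Parent", "").split(","))
        if feature_id in ids:
            if cols[2] == "CDS":
                cols[7] = "0"  # column 8 (phase), forced to 0 for this test
            out.append("\t".join(cols))
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    gff_path, wanted_id = sys.argv[1], sys.argv[2]
    with open(gff_path) as handle:
        for record_line in extract_with_phase_zero(handle, wanted_id):
            print(record_line)
```

For example (quote the ID, since `|` is special in the shell):

    python extract_phase0.py ArII0004_liftoff_polished_genes.gff3 'rna-gnl|eko05|eko05_0000025-t1' > record.gff3
    agat_sp_flag_premature_stop_codons.pl --gff record.gff3 --fasta ArII0004_polished.fasta --out AGAT_record

Setting every CDS segment to 0 is the literal version of the suggested test; for a transcript with several CDS segments, zeroing only the 5'-most segment keeps the other segments' phases unchanged.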