NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

gff3 file lacks CDS type #249

Closed sjfleck closed 2 years ago

sjfleck commented 2 years ago

Hello, I'm comparing another publication's assembly and annotation to my own. They provide 4 files:

  1. assembly.fasta
  2. annotation.gff3
  3. CDS.fasta
  4. Protein.fasta

The strange thing about the gff3 file is that it lacks a CDS type. It only has gene, mRNA, intron, and exon. I'm also attempting to upload this assembly and annotation to CoGe (https://genomevolution.org/coge/) and I believe it requires a CDS for the annotation to be useful. I tried uploading it today, but CoGe is treating the genome as if it's not annotated.

Does AGAT have a tool to add CDS into my annotation? Any help would be appreciated. Thanks

Juke34 commented 2 years ago

You could change the exon type to CDS with a sed command and then run agat_sp_fix_longest_ORF.pl to check the CDS and get the longest ORF. But AGAT is not a prediction tool. You will probably get better results with something like TransDecoder.
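For readers who want to see what the exon-to-CDS substitution amounts to, here is a minimal Python sketch of the same idea as the sed command. It is not part of AGAT; the function name `exon_to_cds` and the example records are made up for illustration. It only rewrites the type column (column 3) and leaves the phase column untouched, so the result would still need checking with something like agat_sp_fix_longest_ORF.pl.

```python
def exon_to_cds(gff3_lines):
    """Yield GFF3 lines with the type column (column 3) changed from exon to CDS.

    Hypothetical helper for illustration only. The phase column (column 8)
    is NOT recomputed here, which a real CDS feature would need.
    """
    for line in gff3_lines:
        # Leave comments, directives and blank lines untouched.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        newline = "\n" if line.endswith("\n") else ""
        fields = line.rstrip("\n").split("\t")
        # A GFF3 feature line has 9 tab-separated columns; column 3 is the type.
        if len(fields) == 9 and fields[2] == "exon":
            fields[2] = "CDS"
        yield "\t".join(fields) + newline


# Made-up records in the shape described in this thread (mRNA, exon, intron).
example = [
    "##gff-version 3\n",
    "chr1\tgFACs\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1\n",
    "chr1\tgFACs\texon\t100\t300\t.\t+\t.\tParent=mRNA1\n",
    "chr1\tgFACs\tintron\t301\t599\t.\t+\t.\tParent=mRNA1\n",
]
converted = list(exon_to_cds(example))
```

Matching on the third column, not on the word "exon" anywhere in the line, avoids accidentally rewriting IDs or attributes that happen to contain the string "exon".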

sjfleck commented 2 years ago

Thanks for the recommendation. I dug in more and this is an annotation that went through one round of gFACs filtering. gFACs outputs an Ensembl v3 gff3 that only contains information on mRNA, exon, and intron types (it doesn't actually have the gene type, contrary to what I said above). I could try assuming that CDS = exon, extract proteins, and see if my protein file matches theirs.
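The final check described above (does the extracted protein set match the published Protein.fasta?) could be sketched roughly as follows. This is an illustration under assumptions: it supposes both files use the same sequence IDs, it ignores a trailing `*` stop symbol, and the names `read_fasta` and `compare_proteins` are hypothetical helpers, not AGAT or gFACs functions.

```python
def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}, using the first word of each header as id."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join wrapped sequence lines; drop a trailing stop symbol and normalise case.
    return {n: "".join(parts).rstrip("*").upper() for n, parts in records.items()}


def compare_proteins(mine, theirs):
    """Compare two {id: sequence} dicts, assuming the ids are shared between them."""
    shared = mine.keys() & theirs.keys()
    identical = {n for n in shared if mine[n] == theirs[n]}
    return {
        "identical": sorted(identical),
        "different": sorted(shared - identical),
        "only_mine": sorted(mine.keys() - theirs.keys()),
        "only_theirs": sorted(theirs.keys() - mine.keys()),
    }


# Made-up sequences; in practice pass open file handles for the two FASTA files.
mine = read_fasta([">t1 extracted", "MKV", "LL*", ">t2", "MAA"])
theirs = read_fasta([">t1", "MKVLL", ">t2", "MAT", ">t3", "MC"])
report = compare_proteins(mine, theirs)
```

If most shared IDs land in "identical", the CDS = exon assumption probably holds for this annotation; many entries in "different" would suggest the exons include untranslated regions.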