Closed: JC-therea closed this 1 month ago
Records classified as RNA are those that do not contain any CDS. AGAT cannot tell whether such a record is ncRNA, tRNA, miscRNA, etc. The fact that it stops early and does not output everything is problematic. I had this bug in an earlier version where I forgot to remove a debug line, but it was supposed to be fixed in version 1.4. Could you check with 1.4.1? Can you also confirm that you are really using version 1.4?
Thanks for your quick answer Jacques,
Thank you for the explanation of the RNA feature.
The version that I used in conda:
$ agat --version
v1.4.0
After updating to version 1.4.1, here is the output of the same command that I posted before:
##gtf-version X
# GFF-like GTF i.e. not checked against any GTF specification. Conversion based on GFF input, standardised by AGAT.
chloroplast AGAT gene 4 76 . - . gene_id "ATCG00010"; ID "ATCG00010";
chloroplast AGAT RNA 4 76 . - . gene_id "ATCG00010"; transcript_id "ATCG00010.1"; ID "ATCG00010.1"; Parent "ATCG00010";
chloroplast Araport11 exon 4 76 73 - . gene_id "ATCG00010"; transcript_id "ATCG00010.1"; ID "agat-exon-322058"; Parent "ATCG00010.1";
chloroplast AGAT gene 383 1444 . - . gene_id "ATCG00020"; ID "ATCG00020";
chloroplast AGAT mRNA 383 1444 . - . gene_id "ATCG00020"; transcript_id "ATCG00020.1"; ID "ATCG00020.1"; Parent "ATCG00020";
chloroplast Araport11 exon 383 1444 . - . gene_id "ATCG00020"; transcript_id "ATCG00020.1"; ID "agat-exon-322059"; Parent "ATCG00020.1";
chloroplast Araport11 CDS 383 1444 . - 0 gene_id "ATCG00020"; transcript_id "ATCG00020.1"; ID "agat-cds-286112"; Parent "ATCG00020.1";
However this time I saw many warnings:
Warning: at3g19820.3 stop codon not adjacent to the CDS
Warning: at3g19830.1 stop codon not adjacent to the CDS
Warning: at3g19830.2 stop codon not adjacent to the CDS
Maybe this is the problem... Do you think that I should remove features that are not CDS or exon to avoid those warnings and maybe fix the file? Here are all the features:
$ cut -f 3 'Araport11_GFF3_genes_transposons.201606.corrected.gtf' | sort | uniq -c
52672 3UTR
60686 5UTR
286355 CDS
322385 exon
48095 start_codon
48106 stop_codon
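For anyone who wants to try the filtering idea raised above, here is a minimal sketch. It assumes a tab-separated GTF with the feature type in column 3. `keep_cds_exon` is a hypothetical helper name, not part of AGAT, and the output file name is a placeholder. Nothing in this thread establishes that such filtering is needed, so treat it as an experiment rather than a recommended fix.

```shell
# Hypothetical helper (not an AGAT command): keep comment lines plus
# CDS and exon records, dropping UTR and start/stop codon records.
keep_cds_exon() {
  awk -F'\t' '/^#/ || $3 == "CDS" || $3 == "exon"' "$1"
}

# Example usage (output name is a placeholder):
# keep_cds_exon 'Araport11_GFF3_genes_transposons.201606.corrected.gtf' > cds_exon_only.gtf
```

The original file is left untouched, so the filtered copy can be compared against it before deciding whether to use it.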
The warning should not stop the process. I will have to investigate the problem. Sorry
Don't worry, I'm not in a hurry! Your tool has helped me immensely on countless occasions.
Thank you so much for your work and for AGAT.
Bests
Dear Jacques,
After reviewing the input and output of AGAT in more detail, I realized the error was mine when visualizing the created file. I apologize for any time I may have taken from you with this issue.
Bests
Great! Thank you for your feedback.
Many lines are missing, and some canonical genes are now classified as "RNA".
I wanted to fix a GTF file that I received from a colleague. This is the file.
The file is missing some important features, such as gene and transcript (or mRNA), so I used agat_convert_sp_gff2gtf.pl to add them while keeping the file in GTF format.
agat_convert_sp_gff2gtf.pl --gtf 'Araport11_GFF3_genes_transposons.201606.corrected.gtf' -o atha_v2.gtf
I expected to receive a very similar file but with the features gene and mRNA.
To compare the input and the output, here are the before and after:
Original file:
After agat:
As you can see, many lines related to CDSs and exons are missing. Given the original GTF, is this output expected? Do you think I should use another tool before agat_convert_sp_gff2gtf.pl?
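One way to check whether lines are really missing, rather than relying on a visual comparison, is to tabulate the feature types in both files and diff the tables. This is a minimal sketch, assuming tab-separated GTF with the feature type in column 3. `feature_counts` is a hypothetical helper, not an AGAT command, and `before.tsv` / `after.tsv` are placeholder names.

```shell
# Hypothetical helper (not an AGAT command): print "feature<TAB>count"
# for every feature type in column 3, skipping comment lines.
feature_counts() {
  awk -F'\t' '!/^#/ && NF >= 3 { n[$3]++ } END { for (f in n) print f "\t" n[f] }' "$1" \
    | LC_ALL=C sort
}

# Example usage with the files from the command above:
# feature_counts 'Araport11_GFF3_genes_transposons.201606.corrected.gtf' > before.tsv
# feature_counts atha_v2.gtf > after.tsv
# diff before.tsv after.tsv
```

New gene and mRNA rows are expected in the output table. A drop in the CDS or exon counts would be the signal worth reporting.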