Closed dongdongdong0203 closed 8 months ago
The awk command does not distinguish between types of "gene", whereas AGAT does. What you show is only the genes that have a transcript and a CDS. I guess you have another table with the results for genes that have a transcript but no CDS (only exons, as for non-coding genes), i.e. gene@transcript@exon. You may also have tables for genes that have a tRNA and exons, i.e. gene@trna@exon, or gene@pseudogene@exon, etc. You have to sum all of these results to get the total.
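To see why the per-category tables have to be summed, here is a minimal awk sketch that splits the genes of a GTF into those with at least one CDS and those without. The function name count_genes_by_cds is hypothetical, and the sketch assumes every feature line carries a gene_id "..." attribute in column 9. It only approximates AGAT's own grouping, which also takes the transcript-level feature type into account.

```shell
# Hypothetical helper: split the genes of a GTF into "has CDS" / "no CDS".
# Assumes tab-separated GTF lines with gene_id "..." in column 9.
count_genes_by_cds() {
  awk -F'\t' '
    $0 !~ /^#/ && match($9, /gene_id "[^"]+"/) {
      # "gene_id " plus the opening quote is 9 characters; drop the closing quote too
      id = substr($9, RSTART + 9, RLENGTH - 10)
      seen[id] = 1
      if ($3 == "CDS") has_cds[id] = 1
    }
    END {
      for (id in seen) {
        if (id in has_cds) n_cds++
        else n_nocds++
      }
      printf "with_CDS\t%d\n", n_cds
      printf "without_CDS\t%d\n", n_nocds
      printf "total\t%d\n", n_cds + n_nocds
    }
  ' "$1"
}
```

Calling count_genes_by_cds "$Refgtf" prints the two partial counts and their sum. Provided every gene line has a gene_id, the total should agree with the plain awk '$3=="gene"' count.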
Thank you for your prompt response and comments.
As you correctly guessed, my results included 'gene@transcript@cds' and 'gene@transcript@exon'. The total number of genes for both results is consistent with the number of genes in the GTF files.
However, the predicted GTF file does not contain any CDS features (as shown below). Could you please explain how 'gene@transcript@cds' can be compared in this case? My aim is to compare the differences at the transcript level. If necessary, please advise on how to modify the config file. As a beginner, I would appreciate your assistance.
Thanks.
Did you check with awk or agat_sq_stat_basic.pl that you do not have any CDS? You necessarily have CDS in one of the files.
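A quick way to make that check with awk alone is to tally the feature types in column 3 of each file. This is only a sketch: the function name feature_type_counts is hypothetical, and it assumes standard tab-separated GTF/GFF lines.

```shell
# Hypothetical helper: count how often each feature type (column 3) occurs.
# Comment lines starting with '#' are skipped.
feature_type_counts() {
  awk -F'\t' '
    $0 !~ /^#/ && NF >= 3 { n[$3]++ }
    END { for (t in n) print t "\t" n[t] }
  ' "$1" | sort
}
```

Running feature_type_counts on the reference and on the predicted GTF shows at a glance whether a CDS row is present in either file.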
Dear @Juke34
The reference GTF used in the analysis contains CDS, whereas the predicted GTF file does not.
Additionally, two GTF files without CDS were tested, resulting in a comparison of only gene@transcript@exon, which is the desired outcome from agat_sp_compare_two_annotations.pl. Thanks for your patience!
Best, RUAN
Describe the bug
I used the agat_sp_compare_two_annotations.pl script to compare the reference GTF with the GTF predicted from the full-length transcriptome, in order to find differences in the predicted transcripts or genes.
General (please complete the following information):
To Reproduce
agat_sp_compare_two_annotations.pl -gff1 $Refgtf -gff2 ${inpath}/OUT.extended_annotation.gtf -o ${inpath}/${sample[$SLURM_ARRAY_TASK_ID]}
However, the number of genes in the results file does not match what I counted with awk:
awk '$3=="gene"' $Refgtf | wc -l
35670
awk '$3=="gene"' OUT.extended_annotation.gtf | wc -l
33226
Does this result make sense, or is there a problem with my command?
Thanks