NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit https://nbisweden.github.io/AGAT/
GNU General Public License v3.0

Extraction of all transcripts from the AUGUSTUS output *.gff data. Confused about the appropriate AGAT command. #498

Closed: Vijithkumar2020 closed this issue 2 months ago

Vijithkumar2020 commented 2 months ago

I have completed AUGUSTUS de novo gene prediction and now want to perform homology-based gene annotation using BLASTX. AUGUSTUS produced a *.gff file as follows:

# ----- prediction on sequence number 1 (length = 1077, name = SVA1_S1_L008_001_contig_1) -----
#
# Predicted genes for sequence number 1 on both strands
# start gene g1
SVA1_S1_L008_001_contig_1   AUGUSTUS    gene    1   1077    0.69    -   .   g1
SVA1_S1_L008_001_contig_1   AUGUSTUS    transcript  1   1077    0.69    -   .   g1.t1
SVA1_S1_L008_001_contig_1   AUGUSTUS    tts 1   1   .   -   .   transcript_id "g1.t1"; gene_id "g1";
SVA1_S1_L008_001_contig_1   AUGUSTUS    exon    1   714 .   -   .   transcript_id "g1.t1"; gene_id "g1";
SVA1_S1_L008_001_contig_1   AUGUSTUS    stop_codon  387 389 .   -   0   transcript_id "g1.t1"; gene_id "g1";
SVA1_S1_L008_001_contig_1   AUGUSTUS    intron  715 1077    0.79    -   .   transcript_id "g1.t1"; gene_id "g1";
SVA1_S1_L008_001_contig_1   AUGUSTUS    CDS 387 714 0.81    -   1   transcript_id "g1.t1"; gene_id "g1"; 

Now I am unsure whether agat_sp_extract_sequences.pl -g infile.gff -f infile.fasta -t gene is the right command to extract the transcripts.
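To make the difference between the feature types concrete, here is a minimal Python sketch that adds up the span of each feature type in the g1 record quoted above. It only illustrates the coordinates; it is not how AGAT works internally. The helper name feature_lengths is made up for this example, and the sample lines are whitespace-separated as pasted here (a real GFF file is tab-separated).

```python
# Feature lines copied from the AUGUSTUS output quoted above (tts and stop_codon omitted).
SAMPLE = """\
SVA1_S1_L008_001_contig_1 AUGUSTUS gene 1 1077 0.69 - . g1
SVA1_S1_L008_001_contig_1 AUGUSTUS transcript 1 1077 0.69 - . g1.t1
SVA1_S1_L008_001_contig_1 AUGUSTUS exon 1 714 . - . transcript_id "g1.t1"; gene_id "g1";
SVA1_S1_L008_001_contig_1 AUGUSTUS intron 715 1077 0.79 - . transcript_id "g1.t1"; gene_id "g1";
SVA1_S1_L008_001_contig_1 AUGUSTUS CDS 387 714 0.81 - 1 transcript_id "g1.t1"; gene_id "g1";
"""


def feature_lengths(gff_text):
    """Return {feature_type: total length in bp}; GFF coordinates are 1-based and inclusive."""
    lengths = {}
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AUGUSTUS comment lines
        fields = line.split(None, 8)  # at most 9 columns; attributes stay in one piece
        ftype, start, end = fields[2], int(fields[3]), int(fields[4])
        lengths[ftype] = lengths.get(ftype, 0) + (end - start + 1)
    return lengths


lengths = feature_lengths(SAMPLE)
for ftype, n in lengths.items():
    print(f"{ftype}\t{n}")
```

For this record the gene feature spans the whole contig, including the 715-1077 intron, while the CDS feature covers only positions 387-714.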

Juke34 commented 2 months ago

Since blastx translates the query sequence in all six reading frames and searches the translations against a protein database, it makes sense to extract only what is supposed to be translated, i.e. the CDS.
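For readers unfamiliar with six-frame translation, the idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses the standard genetic code, marks codons containing ambiguous bases as X, and runs on a made-up toy sequence. It is not BLAST's actual implementation, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only; other symbols pass through)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# Toy query, not taken from the data above.
frames = six_frame("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

Only one of the six frames is the real one for a coding sequence, so any non-coding sequence in the query (introns, UTRs, intergenic regions) only adds translations that cannot produce meaningful protein hits.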

Vijithkumar2020 commented 2 months ago

Thank you so much for the timely response. So I should use:

agat_sp_extract_sequences.pl -g infile.gff -f infile.fasta -t cds

Regards

