NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

extract fasta for the longest possible transcript for each gene model #356

Closed: splaisan closed this issue 1 year ago

splaisan commented 1 year ago

Thanks for the great tool! I would like to extract the FASTA sequence of the longest artificial, concatenated gene model for each gene, obtained by merging all of its exons and resolving overlapping exons by merging them too.
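A minimal Python sketch of the merged gene model described above may help. It assumes the exons of all isoforms of one gene are already collected as 1-based, inclusive (start, end) pairs (GFF/GTF convention) and that the chromosome sequence is available as a plain string. The function names merge_exons and gene_sequence are hypothetical; this is not part of AGAT.

```python
# Reverse-complement table for plain DNA strings.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def merge_exons(exons):
    """Merge overlapping or touching (start, end) intervals, 1-based inclusive."""
    merged = []
    for start, end in sorted(exons):
        # Overlapping or directly adjacent: extend the previous interval.
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def gene_sequence(chrom_seq, exons, strand):
    """Concatenate the merged exon intervals of one gene into a single sequence."""
    merged = merge_exons(exons)
    # GFF coordinates are 1-based inclusive; Python slices are 0-based half-open.
    seq = "".join(chrom_seq[s - 1:e] for s, e in merged)
    if strand == "-":
        # Minus-strand genes are reported as the reverse complement.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    chrom = "AAAACCCCGGGGTTTT"
    # Exons pooled from two hypothetical isoforms of the same gene.
    exons = [(1, 4), (3, 8), (13, 16)]
    print(merge_exons(exons))                # [(1, 8), (13, 16)]
    print(gene_sequence(chrom, exons, "+"))  # AAAACCCCTTTT
```

Merging touching intervals as well as overlapping ones does not change the concatenated sequence; it only keeps the interval list short. The result may contain frameshifts, which matches the goal of one sequence per gene whether or not it codes.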

The best I have managed so far is with agat_sp_extract_sequences.pl // -t exon --merge, which gives full transcripts, but I get several transcripts for the same gene, whereas I want only one per gene (even if it is non-coding because of frameshifts).

Could you please suggest a strategy to achieve this? Thanks.

Juke34 commented 1 year ago

You can produce a GTF file with AGAT so that the exons of the different isoforms share the same gene_id. Then, using a sed and an awk command, remove all features except exons, as well as all ID/Parent attributes. Finally, run the agat_sp_extract_sequences.pl command you mentioned.
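The filtering step in the suggestion above (keep only exon lines, drop the ID and Parent attributes) could look roughly like the Python sketch below. It is a stand-in for the sed/awk commands, not the exact commands meant in the answer. It assumes GTF-style attributes of the form key "value"; and that no attribute value contains a semicolon. The names DROP_KEYS, strip_attributes and filter_gtf are hypothetical. Whether transcript_id must also be removed depends on how the extraction script groups exons, which is not confirmed here, so the set of dropped keys is a parameter.

```python
import sys

# Attributes the answer says to remove. Adding "transcript_id" here is an
# untested assumption that may or may not be needed for per-gene grouping.
DROP_KEYS = ("ID", "Parent")


def strip_attributes(attr_field, drop=DROP_KEYS):
    """Remove GTF `key "value";` pairs whose key is in `drop`."""
    kept = []
    for item in attr_field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key = item.split(" ", 1)[0]
        if key not in drop:
            kept.append(item)
    return "; ".join(kept) + (";" if kept else "")


def filter_gtf(lines, drop=DROP_KEYS):
    """Yield only exon records, with the dropped attributes removed."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments/headers and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # keep exons only
        cols[8] = strip_attributes(cols[8], drop)
        yield "\t".join(cols)


if __name__ == "__main__":
    # Usage: python filter_exons.py < agat_output.gtf > exons_only.gtf
    for record in filter_gtf(sys.stdin):
        print(record)
```

The filtered file would then be passed to agat_sp_extract_sequences.pl as in the original question.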