fuesseler closed this issue 2 months ago
There is no option to collapse isoforms into a chimeric transcript. The best way to achieve this is to use bedtools: https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html
Thanks for confirming this isn't currently possible with AGAT! In the end, rather than constructing a chimeric transcript, I opted to handle my issue further downstream, by using HmmCleaner to clean falsely aligned, non-homologous exonic regions from my alignments.
Is your feature request related to a problem? Please describe.
I have been trying to use agat_sp_extract_sequences.pl to extract all CDS across several transcripts of a single gene, but as far as I can see there is no direct way to achieve this with the current options for this command in AGAT.
Describe the solution you'd like
Be able to specify that I want to extract all CDS at the per-gene level and not only at the per-transcript level. Otherwise, if you have a suggestion for how to "hack" this problem (or if there are reasons why this would in general be a bad or problematic idea), I would be grateful!
Describe alternatives you've considered
I considered extracting the CDS separately (using --split) for each transcript of a gene and concatenating them together, while purging "shared" CDS between transcripts somehow ...
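The "purging shared CDS" step in this alternative amounts to taking the union of all CDS intervals per gene. Below is a minimal Python sketch of that idea, not an AGAT feature: the function names `merge_cds_per_gene` and `chimeric_cds` are hypothetical, the input is assumed to be `(gene_id, start, end)` tuples with 1-based inclusive coordinates already parsed from the GFF, and all CDS of a gene are assumed to lie on the same sequence and strand.

```python
from collections import defaultdict


def merge_cds_per_gene(cds_records):
    """Collapse the CDS intervals of all transcripts into one set per gene.

    cds_records: iterable of (gene_id, start, end), 1-based inclusive
    coordinates (GFF-style). Hypothetical input format for illustration.
    """
    by_gene = defaultdict(list)
    for gene_id, start, end in cds_records:
        by_gene[gene_id].append((start, end))

    merged = {}
    for gene_id, intervals in by_gene.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1] + 1:
                # Overlapping or directly adjacent: extend the current block.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[gene_id] = [tuple(iv) for iv in out]
    return merged


def chimeric_cds(seq, intervals, strand="+"):
    """Concatenate merged intervals from a chromosome/contig sequence."""
    joined = "".join(seq[start - 1:end] for start, end in intervals)
    if strand == "-":
        # Reverse-complement for genes on the minus strand.
        joined = joined.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]
    return joined
```

One caveat with this approach: the merged sequence is not guaranteed to keep a consistent reading frame, since CDS segments from different isoforms can differ in length or phase, so the result may not translate cleanly.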
Additional context
The reason I want to do this is that in the next steps I want to determine orthologues (with OMA) and then generate MSAs. Currently, I am running into the problem that OMA very often groups together transcripts of a gene that have divergent CDS at the beginning or end, which then leads to alignment issues. So the thought was that if all CDS from all transcripts of a gene are in the input FASTA files, these misalignments should be resolved.
Grateful if you have any ideas about this!