emmafg closed this issue 2 months ago
Once you have the GTF file with your favorite isoforms, you can use the script getAnnoFastaFromJoingenes.py (part of Augustus) to generate the protein and codingseq files. Alternatively, you can create a list of your favorite transcript names and use cdbfasta/cdbyank to extract the sequences.
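As a rough illustration of the second route (a list of transcript names), here is a minimal Python sketch that does the filtering without cdbfasta/cdbyank. It assumes that the feature lines in the filtered GTF carry a transcript_id "..." attribute and that the FASTA headers in braker.aa and braker.codingseq start with the same transcript ID (for example >g1.t1). Please check both against your own files. The function names are hypothetical and are not part of BRAKER or Augustus.

```python
import re

# Assumption: feature lines in the GTF carry an attribute of the form
# transcript_id "g1.t1"; lines without that attribute are skipped.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids_from_gtf(gtf_lines):
    """Collect the transcript IDs named in the attribute column of a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or empty lines
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def filter_fasta(fasta_lines, keep_ids):
    """Yield only the FASTA records whose header ID is in keep_ids.

    Assumption: the record ID is the first whitespace-separated word
    after '>' in the header line.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in keep_ids
        if keep:
            yield line


def write_filtered(gtf_path, fasta_in, fasta_out):
    """Write the records of fasta_in that are named in gtf_path to fasta_out."""
    with open(gtf_path) as gtf:
        keep_ids = transcript_ids_from_gtf(gtf)
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        dst.writelines(filter_fasta(src, keep_ids))
```

With the filtered GTF from get_longest_isoform.py, one would call write_filtered once with braker.aa and once with braker.codingseq as the input FASTA, each time with an output file name of your choice.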
Furthermore, do these t2, t3, etc. correspond to false positives? I noticed that they are very often created because BRAKER recognizes a start codon within the gene without a preceding stop codon.
These are possible alternative transcripts, and all of them are supported by evidence. An expert on a particular gene may be able to judge whether a given isoform of that gene is a false positive. We usually don't call them false positives.
Hello Katharina,
Thank you for your valued assistance.
Emma
Hello,
Thanks a lot for your help.
I have a question regarding the isoforms produced by BRAKER3. In the output files I get many t2, t3, t4... transcripts. Is there an option that tells BRAKER to keep only the longest one?
I tried with: ./bin/get_longest_isoform.py --gtf braker.gtf --out longest_insoforms.gtf
But this only affects the braker.gtf output and not braker.codingseq and braker.aa. I would like to have only the t1 transcripts in these two files. Is there a way to do that?
Furthermore, do these t2, t3, etc. correspond to false positives? I noticed that they are very often created because BRAKER recognizes a start codon within the gene without a preceding stop codon.
Thank you very much for your help.
Sincerely, Emma