Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

How to keep only the t1? #800

Closed emmafg closed 2 months ago

emmafg commented 2 months ago

Hello,

Thank you very much for your help.

I have a question about the isoforms produced by BRAKER3. The output files contain many t2, t3, t4, ... transcripts. Is there an option that tells BRAKER to keep only the longest isoform of each gene?

I tried with: ./bin/get_longest_isoform.py --gtf braker.gtf --out longest_insoforms.gtf

But this only filters the annotation in braker.gtf; braker.codingseq and braker.aa are left unchanged. I would like these two files to contain only the t1 transcripts as well. Is there a way to do that?

Furthermore, do these t2, t3 etc correspond to false positives? Because I noticed that very often they are created because Braker recognizes a start codon within the gene without a stop codon beforehand?

Thank you very much for your help.

Sincerely Emma

KatharinaHoff commented 2 months ago

Once you have a GTF file with your favorite isoforms, you can use the script getAnnoFastaFromJoingenes.py (part of AUGUSTUS) to generate the protein and coding sequence files. Alternatively, create a list of your favorite transcript names and use cdbfasta/cdbyank to extract the sequences.
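
For readers who prefer not to install cdbfasta/cdbyank, the second approach (a list of transcript names, then extraction) can be approximated in plain Python. This is only a minimal sketch, not part of BRAKER or AUGUSTUS. It assumes that the filtered GTF carries `transcript_id "..."` attributes and that the first token of each FASTA header in braker.aa and braker.codingseq is exactly that transcript ID; both assumptions should be checked against your own files. The function names are hypothetical.

```python
import re

# Assumption: transcript IDs appear in the GTF attribute column as
# transcript_id "g1.t1"; lines without this attribute are ignored.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids_from_gtf(gtf_lines):
    """Collect the set of transcript IDs mentioned in GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def filter_fasta(fasta_lines, keep_ids):
    """Return only the FASTA records whose header ID is in keep_ids."""
    kept = []
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the first whitespace-delimited token of the
            # header is the transcript ID (for example ">g1.t1").
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in keep_ids
        if keep:
            kept.append(line)
    return kept


def filter_fasta_file(gtf_path, fasta_in, fasta_out):
    """Write the records of fasta_in whose IDs occur in gtf_path to fasta_out."""
    with open(gtf_path) as gtf:
        keep_ids = transcript_ids_from_gtf(gtf)
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        dst.writelines(filter_fasta(src, keep_ids))
```

Calling `filter_fasta_file` once with the filtered GTF and braker.aa, and once with the same GTF and braker.codingseq, would give protein and coding sequence files restricted to the transcripts kept in the GTF.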

KatharinaHoff commented 2 months ago

> Furthermore, do these t2, t3 etc correspond to false positives? Because I noticed that very often they are created because Braker recognizes a start codon within the gene without a stop codon beforehand?

These are possible alternative transcripts. All of them are supported by evidence. An expert on a particular gene may be able to judge whether a given isoform of that gene is a false positive. We usually don't call them false positives.

emmafg commented 2 months ago

Hello Katharina,

Thank you for your valued assistance.

Emma