Gaius-Augustus / TSEBRA

TSEBRA: Transcript Selector for BRAKER

Tsebra gtf2aa.pl predicting proteins with in-frame stop codons #24

Closed: nllg closed this issue 10 months ago

nllg commented 1 year ago

Hello all,

I have successfully run both braker_RNA and braker_proteins as part of my structural annotation pipeline. I would now like to convert the TSEBRA output (tsebra.gtf) into the standard GFF3, protein, and CDS files.

These are the steps that I run:

rename_gtf.py --gtf tsebra.gtf --out tsebra_renamed.gtf
gtf2gff.pl < tsebra_renamed.gtf --out=tsebra_renamed.gff3 --gff3 --printExon
getAnnoFasta.pl tsebra_renamed.gtf --seqfile=genome.fa --chop_cds
gtf2aa.pl genome.fa tsebra_renamed.gtf tsebra_renamed.aa

However, some of the predicted proteins contain in-frame stop codons, which is something we do not want.
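To see which predictions are affected, a minimal Python sketch like the one below could scan the gtf2aa.pl output for stop symbols that occur before the last residue. It assumes stops are written as `*` and that the first token of each FASTA header is the transcript ID (e.g. `>g1.t1`); the function names are hypothetical and are not part of TSEBRA, BRAKER, or AUGUSTUS.

```python
import sys


def read_fasta(lines):
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Assumption: the first header token is the transcript ID (e.g. ">g1.t1").
            header = line[1:].split()
            record_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def has_inframe_stop(protein, stop="*"):
    """Return True if a stop symbol appears anywhere before the final residue."""
    # A single terminal stop is expected; only stops inside the sequence count.
    body = protein[:-1] if protein.endswith(stop) else protein
    return stop in body


def transcripts_with_inframe_stops(lines):
    """List the IDs of all records whose protein contains an in-frame stop."""
    return [rid for rid, seq in read_fasta(lines) if has_inframe_stop(seq)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_stops.py tsebra_renamed.aa
    with open(sys.argv[1]) as handle:
        for rid in transcripts_with_inframe_stops(handle):
            print(rid)
```

A single trailing `*` is treated as the normal terminal stop, so only stops inside the sequence are reported.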

Could you please tell me how I can fix this?

Thank you,

Nathaly

LarsGab commented 1 year ago

Hi,

you can remove proteins with in-frame stop codons using filterInFrameStopCodons.pl from the AUGUSTUS scripts.
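filterInFrameStopCodons.pl is the tool suggested here. Purely to illustrate the idea, and not that script's actual logic, a hypothetical Python sketch could drop every GTF line that belongs to a set of flagged transcript IDs. It assumes AUGUSTUS-style GTF, where feature lines carry `transcript_id "..."` and transcript lines carry the bare ID in column 9. Gene lines are kept, so a gene whose only transcript is removed would leave an orphan gene line.

```python
import re

# Matches the GTF attribute transcript_id "g1.t1"; and captures the ID.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_of(line):
    """Return the transcript ID a GTF line belongs to, or None if unknown."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None  # comments, blank or malformed lines
    match = TRANSCRIPT_ID.search(fields[8])
    if match:
        return match.group(1)
    # Assumption: AUGUSTUS-style transcript lines put the bare ID in column 9.
    return fields[8].strip() or None


def filter_gtf(lines, bad_ids):
    """Yield only the GTF lines that do not belong to a transcript in bad_ids."""
    for line in lines:
        if transcript_of(line) not in bad_ids:
            yield line
```

After filtering, rerunning gtf2gff.pl, getAnnoFasta.pl and gtf2aa.pl on the filtered GTF would keep the GFF3, CDS and protein files consistent with it.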

Best, Lars