Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Inclusion of stop_codon in the CDS interval? #833

Open gbdias opened 1 month ago

gbdias commented 1 month ago

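Hi,

  • I have noticed that, when run in protein mode, BRAKER v3.0.3 will often produce two identical transcripts for many genes: one created by AUGUSTUS and one created by GeneMark.hmm3.

  • The only difference between these transcripts seems to be whether the stop_codon is included in the CDS interval (GeneMark.hmm3) or not (AUGUSTUS).

e.g. g119.t2 extends the CDS to include the stop codon, while g119.t1 does not.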

scaf1  AUGUSTUS        gene    289341  290015  .       -       .       ID=g119;
scaf1  AUGUSTUS        mRNA    289341  290015  1       -       .       ID=g119.t1;Parent=g119;
scaf1  AUGUSTUS        stop_codon      289341  289343  .       -       0       ID=g119.t1.stop1;Parent=g119.t1;
scaf1  AUGUSTUS        CDS     289344  290015  1       -       0       ID=g119.t1.CDS1;Parent=g119.t1;
scaf1  AUGUSTUS        exon    289344  290015  .       -       .       ID=g119.t1.exon1;Parent=g119.t1;
scaf1  AUGUSTUS        start_codon     290013  290015  .       -       0       ID=g119.t1.start1;Parent=g119.t1;
scaf1  GeneMark.hmm3   mRNA    289341  290015  .       -       .       ID=g119.t2;Parent=g119;
scaf1  GeneMark.hmm3   CDS     289341  290015  .       -       0       ID=g119.t2.CDS1;Parent=g119.t2;
scaf1  GeneMark.hmm3   exon    289341  290015  .       -       0       ID=g119.t2.exon1;Parent=g119.t2;
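
For anyone who wants to check their own output for the same pattern, here is a minimal Python sketch. It is not part of BRAKER; the function names `cds_with_stop` and `stop_codon_duplicates` are made up for illustration. It assumes 9-column GFF3-style lines with a `Parent=` attribute like the ones above, merges each transcript's `stop_codon` interval into its `CDS` intervals, and reports transcripts whose coding intervals become identical after that normalization.

```python
from collections import defaultdict

# The records from this issue (whitespace-separated here; real GFF3 is tab-delimited).
GFF = """\
scaf1 AUGUSTUS gene 289341 290015 . - . ID=g119;
scaf1 AUGUSTUS mRNA 289341 290015 1 - . ID=g119.t1;Parent=g119;
scaf1 AUGUSTUS stop_codon 289341 289343 . - 0 ID=g119.t1.stop1;Parent=g119.t1;
scaf1 AUGUSTUS CDS 289344 290015 1 - 0 ID=g119.t1.CDS1;Parent=g119.t1;
scaf1 AUGUSTUS exon 289344 290015 . - . ID=g119.t1.exon1;Parent=g119.t1;
scaf1 AUGUSTUS start_codon 290013 290015 . - 0 ID=g119.t1.start1;Parent=g119.t1;
scaf1 GeneMark.hmm3 mRNA 289341 290015 . - . ID=g119.t2;Parent=g119;
scaf1 GeneMark.hmm3 CDS 289341 290015 . - 0 ID=g119.t2.CDS1;Parent=g119.t2;
scaf1 GeneMark.hmm3 exon 289341 290015 . - 0 ID=g119.t2.exon1;Parent=g119.t2;
"""


def cds_with_stop(gff_text):
    """Map transcript ID -> (seqid, strand, coding intervals including the stop codon)."""
    parts = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        seqid, _src, ftype, start, end, _score, strand, _phase, attrs = line.split(None, 8)
        if ftype not in ("CDS", "stop_codon"):
            continue
        fields = dict(kv.strip().split("=", 1) for kv in attrs.split(";") if kv.strip())
        parts[(seqid, strand, fields["Parent"])].append((int(start), int(end)))

    merged = {}
    for (seqid, strand, tx), intervals in parts.items():
        intervals.sort()
        out = [list(intervals[0])]
        for s, e in intervals[1:]:
            if s <= out[-1][1] + 1:  # overlapping or directly adjacent: fuse
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[tx] = (seqid, strand, tuple(tuple(iv) for iv in out))
    return merged


def stop_codon_duplicates(gff_text):
    """Group transcripts whose coding intervals match once the stop codon is included."""
    groups = defaultdict(list)
    for tx, key in cds_with_stop(gff_text).items():
        groups[key].append(tx)
    return [sorted(txs) for txs in groups.values() if len(txs) > 1]


print(stop_codon_duplicates(GFF))  # [['g119.t1', 'g119.t2']]
```

The sketch compares coding intervals only, so two transcripts with the same CDS but different UTRs would also be grouped together. Which copy to keep is a separate decision.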

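  • Since these have the exact same structure they're not really isoforms. Do you recommend keeping just the AUGUSTUS one?

Thanks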

KatharinaHoff commented 1 month ago

Thank you for pointing that out. It will take some time until we fix this, but it will definitely be fixed.

On Tue, May 28, 2024 at 9:38 AM Guilherme Borges Dias < @.***> wrote:

Hi,

  • I have noticed that, when run in protein mode, BRAKER v3.0.3 will often produce two identical transcripts for many genes: one created by AUGUSTUS and one created by GeneMark.hmm3.

  • The only difference between these transcripts seems to be whether the stop_codon is included in the CDS interval (GeneMark.hmm3) or not (AUGUSTUS).

e.g. g119.t2 extends the CDS to include the stop codon, while g119.t1 does not.

Screenshot.2024-05-28.at.09.29.07.png (view on web) https://github.com/Gaius-Augustus/BRAKER/assets/7614153/ff9ce941-c2f0-4dff-913b-b3c4813737b9

scaf1  AUGUSTUS        gene    289341  290015  .       -       .       ID=g119;
scaf1  AUGUSTUS        mRNA    289341  290015  1       -       .       ID=g119.t1;Parent=g119;
scaf1  AUGUSTUS        stop_codon      289341  289343  .       -       0       ID=g119.t1.stop1;Parent=g119.t1;
scaf1  AUGUSTUS        CDS     289344  290015  1       -       0       ID=g119.t1.CDS1;Parent=g119.t1;
scaf1  AUGUSTUS        exon    289344  290015  .       -       .       ID=g119.t1.exon1;Parent=g119.t1;
scaf1  AUGUSTUS        start_codon     290013  290015  .       -       0       ID=g119.t1.start1;Parent=g119.t1;
scaf1  GeneMark.hmm3   mRNA    289341  290015  .       -       .       ID=g119.t2;Parent=g119;
scaf1  GeneMark.hmm3   CDS     289341  290015  .       -       0       ID=g119.t2.CDS1;Parent=g119.t2;
scaf1  GeneMark.hmm3   exon    289341  290015  .       -       0       ID=g119.t2.exon1;Parent=g119.t2;

  • Since these have the exact same structure they're not really isoforms. Do you recommend keeping just the AUGUSTUS one?

Thanks
