Gaius-Augustus / TSEBRA

TSEBRA: Transcript Selector for BRAKER

Contained Gene bug #26

Closed nhartwic closed 1 year ago

nhartwic commented 1 year ago

I'm currently experimenting with tsebra and have noticed a strange output. Basically, I ran braker with a protein database and braker with rnaseq and then ran tsebra. Output mostly looks good. There are a lot of TE-related genes (or at least genes/CDS with significant overlap with my softmask). But there also seems to be an actual bug.

Basically, I have two gene models on the same strand, with the coordinates of one essentially contained within the other. Image below...

image

  1. top track is my tsebra output after converting to gff3
  2. second track is the raw output from tsebra in native gtf format
  3. third track is braker with rnaseq
  4. last track is braker with proteins

The thin strands for the gff3 represent "gene" features. For gtf, I need to hover to see gene ids but the tsebra gtf and gff3 are consistent. Relevant portions of gtfs below...

# tsebra
chr_8h  AUGUSTUS        gene    64993239        64996898        .       -       .       g_9280
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t6"; gene_id "g_9280";
chr_8h  AUGUSTUS        CDS     64993239        64994892        0.77    -       1       transcript_id "anno1.file_1_file_1_g33709.t6"; gene_id "g_9280";
chr_8h  AUGUSTUS        exon    64993239        64994892        .       -       .       transcript_id "anno1.file_1_file_1_g33709.t6"; gene_id "g_9280";
chr_8h  AUGUSTUS        intron  64994893        64995240        0.77    -       .       transcript_id "anno1.file_1_file_1_g33709.t6"; gene_id "g_9280";
chr_8h  AUGUSTUS        CDS     64995241        64995328        0.95    -       2       transcript_id "anno1.file_1_file_1_g33709.t6"; gene_id "g_9280";
chr_8h  AUGUSTUS        exon    64995241        64995328        .       -       .       transcript_id "anno1.file_1_file_1_g33709.t6"; gene_id "g_9280";
chr_8h  AUGUSTUS        intron  64995329        64996558        0.94    -       .       transcript_id "anno1.file_1_file_1_g33709.t6"; gene_id "g_9280";
chr_8h  AUGUSTUS        CDS     64996559        64996898        1       -       0       transcript_id "anno1.file_1_file_1_g33709.t6"; gene_id "g_9280";
chr_8h  AUGUSTUS        exon    64996559        64996898        .       -       .       transcript_id "anno1.file_1_file_1_g33709.t6"; gene_id "g_9280";
chr_8h  AUGUSTUS        start_codon     64996896        64996898        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t6"; gene_id "g_9280";
chr_8h  AUGUSTUS        gene    64993239        64994933        .       -       .       g_2084
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t4"; gene_id "g_2084";
chr_8h  AUGUSTUS        CDS     64993239        64994333        1       -       0       transcript_id "anno1.file_1_file_1_g33709.t4"; gene_id "g_2084";
chr_8h  AUGUSTUS        exon    64993239        64994333        .       -       .       transcript_id "anno1.file_1_file_1_g33709.t4"; gene_id "g_2084";
chr_8h  AUGUSTUS        start_codon     64994331        64994333        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t4"; gene_id "g_2084";
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t3"; gene_id "g_2084";
chr_8h  AUGUSTUS        CDS     64993239        64993535        1       -       0       transcript_id "anno1.file_1_file_1_g33709.t3"; gene_id "g_2084";
chr_8h  AUGUSTUS        exon    64993239        64993535        .       -       .       transcript_id "anno1.file_1_file_1_g33709.t3"; gene_id "g_2084";
chr_8h  AUGUSTUS        start_codon     64993533        64993535        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t3"; gene_id "g_2084";
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t2"; gene_id "g_2084";
chr_8h  AUGUSTUS        CDS     64993239        64993952        1       -       0       transcript_id "anno1.file_1_file_1_g33709.t2"; gene_id "g_2084";
chr_8h  AUGUSTUS        exon    64993239        64993952        .       -       .       transcript_id "anno1.file_1_file_1_g33709.t2"; gene_id "g_2084";
chr_8h  AUGUSTUS        start_codon     64993950        64993952        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t2"; gene_id "g_2084";
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t1"; gene_id "g_2084";
chr_8h  AUGUSTUS        CDS     64993239        64994132        1       -       0       transcript_id "anno1.file_1_file_1_g33709.t1"; gene_id "g_2084";
chr_8h  AUGUSTUS        exon    64993239        64994132        .       -       .       transcript_id "anno1.file_1_file_1_g33709.t1"; gene_id "g_2084";
chr_8h  AUGUSTUS        start_codon     64994130        64994132        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t1"; gene_id "g_2084";
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t5"; gene_id "g_2084";
chr_8h  AUGUSTUS        CDS     64993239        64994615        1       -       0       transcript_id "anno1.file_1_file_1_g33709.t5"; gene_id "g_2084";
chr_8h  AUGUSTUS        exon    64993239        64994615        .       -       .       transcript_id "anno1.file_1_file_1_g33709.t5"; gene_id "g_2084";
chr_8h  AUGUSTUS        start_codon     64994613        64994615        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t5"; gene_id "g_2084";
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t7"; gene_id "g_2084";
chr_8h  AUGUSTUS        CDS     64993239        64994933        0.84    -       0       transcript_id "anno1.file_1_file_1_g33709.t7"; gene_id "g_2084";
chr_8h  AUGUSTUS        exon    64993239        64994933        .       -       .       transcript_id "anno1.file_1_file_1_g33709.t7"; gene_id "g_2084";
chr_8h  AUGUSTUS        start_codon     64994931        64994933        .       -       0       transcript_id "anno1.file_1_file_1_g33709.t7"; gene_id "g_2084";

# braker with proteins
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "file_1_file_1_g33709.t4"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        CDS     64993239        64994333        1       -       0       transcript_id "file_1_file_1_g33709.t4"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        exon    64993239        64994333        .       -       .       transcript_id "file_1_file_1_g33709.t4"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        start_codon     64994331        64994333        .       -       0       transcript_id "file_1_file_1_g33709.t4"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        transcript      64993239        64994333        1       -       .       g33709.t4
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "file_1_file_1_g33709.t3"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        CDS     64993239        64993535        1       -       0       transcript_id "file_1_file_1_g33709.t3"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        exon    64993239        64993535        .       -       .       transcript_id "file_1_file_1_g33709.t3"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        start_codon     64993533        64993535        .       -       0       transcript_id "file_1_file_1_g33709.t3"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        transcript      64993239        64993535        1       -       .       g33709.t3
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "file_1_file_1_g33709.t6"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        CDS     64993239        64994892        0.77    -       1       transcript_id "file_1_file_1_g33709.t6"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        exon    64993239        64994892        .       -       .       transcript_id "file_1_file_1_g33709.t6"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        intron  64994893        64995240        0.77    -       .       transcript_id "file_1_file_1_g33709.t6"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        CDS     64995241        64995328        0.95    -       2       transcript_id "file_1_file_1_g33709.t6"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        exon    64995241        64995328        .       -       .       transcript_id "file_1_file_1_g33709.t6"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        intron  64995329        64996558        0.94    -       .       transcript_id "file_1_file_1_g33709.t6"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        CDS     64996559        64996898        1       -       0       transcript_id "file_1_file_1_g33709.t6"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        gene    64993239        64996898        6.6     -       .       g33709
chr_8h  AUGUSTUS        transcript      64993239        64996898        0.76    -       .       g33709.t6
chr_8h  AUGUSTUS        exon    64996559        64996898        .       -       .       transcript_id "file_1_file_1_g33709.t6"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        start_codon     64996896        64996898        .       -       0       transcript_id "file_1_file_1_g33709.t6"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "file_1_file_1_g33709.t1"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        CDS     64993239        64994132        1       -       0       transcript_id "file_1_file_1_g33709.t1"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        exon    64993239        64994132        .       -       .       transcript_id "file_1_file_1_g33709.t1"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        start_codon     64994130        64994132        .       -       0       transcript_id "file_1_file_1_g33709.t1"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        transcript      64993239        64994132        1       -       .       g33709.t1
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "file_1_file_1_g33709.t5"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        CDS     64993239        64994615        1       -       0       transcript_id "file_1_file_1_g33709.t5"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        exon    64993239        64994615        .       -       .       transcript_id "file_1_file_1_g33709.t5"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        start_codon     64994613        64994615        .       -       0       transcript_id "file_1_file_1_g33709.t5"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        transcript      64993239        64994615        1       -       .       g33709.t5
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "file_1_file_1_g33709.t7"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        CDS     64993239        64994933        0.84    -       0       transcript_id "file_1_file_1_g33709.t7"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        exon    64993239        64994933        .       -       .       transcript_id "file_1_file_1_g33709.t7"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        start_codon     64994931        64994933        .       -       0       transcript_id "file_1_file_1_g33709.t7"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        transcript      64993239        64994933        0.84    -       .       g33709.t7
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "file_1_file_1_g33709.t2"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        CDS     64993239        64993952        1       -       0       transcript_id "file_1_file_1_g33709.t2"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        exon    64993239        64993952        .       -       .       transcript_id "file_1_file_1_g33709.t2"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        start_codon     64993950        64993952        .       -       0       transcript_id "file_1_file_1_g33709.t2"; gene_id "file_1_file_1_g33709";
chr_8h  AUGUSTUS        transcript      64993239        64993952        1       -       .       g33709.t2

# braker with rnaseq
chr_8h  AUGUSTUS        stop_codon      64993239        64993241        .       -       0       transcript_id "file_1_file_1_g34062.t1"; gene_id "file_1_file_1_g34062";
chr_8h  AUGUSTUS        CDS     64993239        64994933        0.65    -       0       transcript_id "file_1_file_1_g34062.t1"; gene_id "file_1_file_1_g34062";
chr_8h  AUGUSTUS        exon    64993239        64994933        .       -       .       transcript_id "file_1_file_1_g34062.t1"; gene_id "file_1_file_1_g34062";
chr_8h  AUGUSTUS        start_codon     64994931        64994933        .       -       0       transcript_id "file_1_file_1_g34062.t1"; gene_id "file_1_file_1_g34062";
chr_8h  AUGUSTUS        gene    64993239        64994933        0.65    -       .       g34062
chr_8h  AUGUSTUS        transcript      64993239        64994933        0.65    -       .       g34062.t1

In terms of software, I used the current braker package from conda after manually installing genemark. And I'm using the latest (as of a couple days ago anyway) version of tsebra-main.

I'm writing a script to work around this in my own pipeline, but I figured it was worth reporting the bug here too.
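
For anyone who wants to flag the same situation in their own output, here is a minimal sketch of a check for gene features contained in another gene on the same strand. It only detects the pairs and does not merge or fix anything. The function names `parse_genes` and `contained_genes` are made up for illustration, and the sketch assumes the input has `gene` lines with whitespace-separated columns, as in the tsebra excerpt above.

```python
from collections import defaultdict

def parse_genes(gtf_lines):
    """Collect (start, end, gene_id) per (seqname, strand) from 'gene' lines."""
    genes = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace so both tabs and the spaces shown above work.
        cols = line.split(None, 8)
        if len(cols) < 9 or cols[2] != "gene":
            continue
        key = (cols[0], cols[6])
        genes[key].append((int(cols[3]), int(cols[4]), cols[8].strip()))
    return genes

def contained_genes(genes):
    """Return (inner, outer) id pairs where inner lies inside a longer outer gene."""
    pairs = []
    for spans in genes.values():
        for s1, e1, inner in spans:
            for s2, e2, outer in spans:
                # Strictly shorter span, so identical spans are not reported.
                if inner != outer and s2 <= s1 and e1 <= e2 and (e1 - s1) < (e2 - s2):
                    pairs.append((inner, outer))
    return pairs

# The two gene lines from the tsebra output above.
lines = [
    "chr_8h  AUGUSTUS  gene  64993239  64996898  .  -  .  g_9280",
    "chr_8h  AUGUSTUS  gene  64993239  64994933  .  -  .  g_2084",
]
print(contained_genes(parse_genes(lines)))
```

The pairwise comparison is quadratic per contig and strand, which is fine for spot checks; sorting the spans by start coordinate would be the obvious next step for whole-genome files.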

LarsGab commented 1 year ago

Hi,

TSEBRA considers all transcripts to be in one gene if they have overlapping coding regions in the same frame. I think this might be the problem here, as these transcripts are in different frames. I added a new option to TSEBRA (--ignore_tx_phase) to address this. With this option, TSEBRA ignores the frame of transcripts and in your case, it should include all transcript isoforms into one gene model.
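
As a rough illustration of that grouping rule, here is a small sketch. It is not TSEBRA's actual code: the data layout, the `cds_frame` definition, the function names and the toy coordinates are all assumptions made for this example, and the `ignore_phase` argument only mimics the idea behind `--ignore_tx_phase`.

```python
from itertools import combinations

def cds_frame(start, end, phase, strand):
    """Frame (0-2) of the first complete codon of a CDS segment (assumed definition)."""
    return (start + phase) % 3 if strand == "+" else (end - phase) % 3

def share_gene(tx_a, tx_b, ignore_phase=False):
    """True if two transcripts have overlapping CDS (in the same frame unless ignored)."""
    if tx_a["strand"] != tx_b["strand"]:
        return False
    strand = tx_a["strand"]
    for s1, e1, p1 in tx_a["cds"]:
        for s2, e2, p2 in tx_b["cds"]:
            if s1 <= e2 and s2 <= e1:  # closed intervals overlap
                if ignore_phase or cds_frame(s1, e1, p1, strand) == cds_frame(s2, e2, p2, strand):
                    return True
    return False

def cluster(transcripts, ignore_phase=False):
    """Single-linkage grouping of transcript ids into genes (union-find)."""
    parent = {t: t for t in transcripts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(transcripts, 2):
        if share_gene(transcripts[a], transcripts[b], ignore_phase):
            parent[find(a)] = find(b)
    groups = {}
    for t in transcripts:
        groups.setdefault(find(t), set()).add(t)
    return sorted(groups.values(), key=sorted)

# Toy transcripts (hypothetical coordinates, not taken from the data in this issue).
toy = {
    "txA": {"strand": "+", "cds": [(100, 200, 0)]},
    "txB": {"strand": "+", "cds": [(150, 250, 0)]},
    "txC": {"strand": "+", "cds": [(103, 180, 0)]},
}
print(cluster(toy))                     # frame-aware grouping
print(cluster(toy, ignore_phase=True))  # overlap alone decides
```

In the toy data all three transcripts overlap, but txB sits in a different frame, so the frame-aware call keeps it as its own gene while the phase-ignoring call puts everything into one gene.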

Best, Lars

bijendrabio commented 1 year ago

Curious: why are the coordinates of the CDS and exon features the same?

nhartwic commented 1 year ago

I don't believe tsebra predicts UTR (or at least it doesn't do it by default) so exons and CDS should have the same coordinates. What were you expecting?
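
If you want to confirm that for your own file, a quick sketch like the one below compares the exon and CDS intervals per transcript. The function names are made up for this example, and it assumes the attribute column carries `transcript_id "..."` as in the excerpts above. A transcript with annotated UTRs would have exon intervals that differ from its CDS intervals.

```python
import re
from collections import defaultdict

def features_by_transcript(gtf_lines):
    """Map transcript_id -> {'CDS': set of (start, end), 'exon': set of (start, end)}."""
    feats = defaultdict(lambda: {"CDS": set(), "exon": set()})
    for line in gtf_lines:
        cols = line.split(None, 8)
        if len(cols) < 9 or cols[2] not in ("CDS", "exon"):
            continue
        match = re.search(r'transcript_id "([^"]+)"', cols[8])
        if match:
            feats[match.group(1)][cols[2]].add((int(cols[3]), int(cols[4])))
    return feats

def exons_match_cds(feats):
    """Map transcript_id -> True when its exon and CDS intervals are identical."""
    return {tx: f["exon"] == f["CDS"] for tx, f in feats.items()}

# Two lines of transcript t4 from the tsebra output in the original report.
lines = [
    'chr_8h AUGUSTUS CDS 64993239 64994333 1 - 0 transcript_id "anno1.file_1_file_1_g33709.t4"; gene_id "g_2084";',
    'chr_8h AUGUSTUS exon 64993239 64994333 . - . transcript_id "anno1.file_1_file_1_g33709.t4"; gene_id "g_2084";',
]
print(exons_match_cds(features_by_transcript(lines)))
```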