Gaius-Augustus / TSEBRA

TSEBRA: Transcript Selector for BRAKER

Naming is incorrect #3

Closed: sarjopp closed this issue 3 years ago

sarjopp commented 3 years ago

After running BRAKER1 and BRAKER2 and then running fix_gtf_ids.py on each braker.gtf, my output from tsebra.py has inconsistent names:

    CSAcor1 AUGUSTUS transcript 1 495 . + . anno1.CSAcor1+_file_1_file_1_g22683.t1
    CSAcor1 AUGUSTUS start_codon 2 4 . + . transcript_id "anno2.CSAcor1+_file_1_file_1_g3798.t1"; gene_id "g_31366";
    CSAcor1 AUGUSTUS CDS 2 495 0.73 + 2 transcript_id "anno2.CSAcor1+_file_1_file_1_g3798.t1"; gene_id "g_31366";
    CSAcor1 AUGUSTUS exon 2 495 . + . transcript_id "anno2.CSAcor1+_file_1_file_1_g3798.t1"; gene_id "g_31366";
    CSAcor1 AUGUSTUS stop_codon 493 495 . + 0 transcript_id "anno2.CSAcor1+_file_1_file_1_g3798.t1"; gene_id "g_31366";

As you can see, the transcript ID ends in "g3798.t1" while the gene ID is "g_31366".
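For anyone who wants to check their own files for this kind of mismatch, here is a minimal Python sketch. It assumes a standard tab-separated GTF file whose attribute column uses `key "value";` pairs. The helper names `id_pairs` and `mismatched` are hypothetical (not part of TSEBRA or BRAKER), and the suffix check is only a heuristic that assumes AUGUSTUS/BRAKER-style names where transcript "g1.t1" belongs to gene "g1".

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)";')


def parse_attributes(attr_field):
    """Return a dict of the key/value pairs in a GTF attribute column."""
    return dict(ATTR_RE.findall(attr_field))


def id_pairs(gtf_lines):
    """Collect unique (transcript_id, gene_id) pairs from GTF records.

    Records whose attribute column has no key/value pairs (such as a bare
    transcript line that only carries an ID) are skipped.
    """
    pairs = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        attrs = parse_attributes(fields[8])
        if "transcript_id" in attrs and "gene_id" in attrs:
            pairs.add((attrs["transcript_id"], attrs["gene_id"]))
    return sorted(pairs)


def mismatched(pairs):
    """Flag pairs whose transcript ID, minus a trailing ".tN", does not end with the gene ID.

    Hypothetical heuristic: assumes BRAKER-style "<gene>.tN" transcript names.
    """
    flagged = []
    for tx, gene in pairs:
        stem = re.sub(r"\.t\d+$", "", tx)
        if not stem.endswith(gene):
            flagged.append((tx, gene))
    return flagged


if __name__ == "__main__":
    # One CDS record from the output above, rebuilt with tab separators.
    record = "\t".join([
        "CSAcor1", "AUGUSTUS", "CDS", "2", "495", "0.73", "+", "2",
        'transcript_id "anno2.CSAcor1+_file_1_file_1_g3798.t1"; gene_id "g_31366";',
    ])
    print(mismatched(id_pairs([record])))
    # prints [('anno2.CSAcor1+_file_1_file_1_g3798.t1', 'g_31366')]
```

Running the same check on the input braker.gtf or fixed_braker.gtf files would show whether the pairs only diverge after tsebra.py. Note that a flagged pair only means the names differ; it does not by itself show which gene a transcript actually belongs to.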

In both the original braker.gtf files and the fixed_braker.gtf files, the transcript and gene IDs match. Because the IDs in the TSEBRA output are inconsistent, I am hesitant to trust any of it.

LarsGab commented 3 years ago

Hi, I replied to this issue in the BRAKER repository and I'll close this one.