Inconsistencies in transcript_ID and gene_ID

minhasbushra commented 2 years ago

Hi,

I am seeing inconsistencies between the gene ID and the transcript ID in my BRAKER GTF file. For example:

12 AUGUSTUS intron 32870989 32872655 1 + . transcript_id "anno1.12+_file_1_file_1_g20327.t1"; gene_id "g_5119";
12 AUGUSTUS CDS 32872656 32872832 1 + 0 transcript_id "anno1.12+_file_1_file_1_g20327.t1"; gene_id "g_5119";
12 AUGUSTUS exon 32872656 32872832 . + . transcript_id "anno1.12+_file_1_file_1_g20327.t1"; gene_id "g_5119";
12 AUGUSTUS intron 32872833 32873082 1 + . transcript_id "anno1.12+_file_1_file_1_g20327.t1"; gene_id "g_5119";

I used the fix ID step before running TSEBRA. Please let me know how to resolve this.

Thanks, Bushra

LarsGab commented 2 years ago

Hi Bushra,

Do you mean the mismatch between the gene ID part of the transcript ID (g20327.t1) and the actual gene ID (g_5119)? The IDs don't match because the transcript IDs in the TSEBRA output are kept identical to the IDs in the input gene set. This allows you to trace each transcript in the TSEBRA output back to the input. If you want matching transcript and gene IDs, you can use the rename_gtf.py script from this repository, e.g. with:

rename_gtf.py --gtf tsebra_result.gtf --prefix Species1 --translation_tab translation.tab --out tsebra_result_renamed.gtf
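To illustrate the idea of renaming while keeping the link back to the input, here is a minimal Python sketch. It is not the code of rename_gtf.py; the function names (`parse_ids`, `build_translation`, `rename_line`) and the naming scheme `<prefix>_g<N>.t<M>` are assumptions made only for this example. It derives new transcript IDs from the gene IDs and keeps a translation table from old to new transcript IDs.

```python
import re

# Matches the two attributes of interest in the GTF attribute column.
ATTR_RE = re.compile(r'(transcript_id|gene_id) "([^"]+)"')


def parse_ids(gtf_line):
    """Return (transcript_id, gene_id) for one GTF line, or None if either is missing."""
    attrs = dict(ATTR_RE.findall(gtf_line))
    if "transcript_id" in attrs and "gene_id" in attrs:
        return attrs["transcript_id"], attrs["gene_id"]
    return None


def build_translation(gtf_lines, prefix):
    """Map each original transcript ID to (new_gene_id, new_transcript_id).

    Hypothetical naming scheme: genes are numbered in order of first
    appearance, transcripts are numbered per gene.
    """
    gene_names = {}    # original gene_id -> new gene ID
    tx_counts = {}     # original gene_id -> number of transcripts seen so far
    translation = {}   # original transcript_id -> (new gene ID, new transcript ID)
    for line in gtf_lines:
        ids = parse_ids(line)
        if ids is None:
            continue
        tx_id, gene_id = ids
        if tx_id in translation:
            continue  # transcript already has a new name
        if gene_id not in gene_names:
            gene_names[gene_id] = f"{prefix}_g{len(gene_names) + 1}"
        tx_counts[gene_id] = tx_counts.get(gene_id, 0) + 1
        new_gene = gene_names[gene_id]
        translation[tx_id] = (new_gene, f"{new_gene}.t{tx_counts[gene_id]}")
    return translation


def rename_line(line, translation):
    """Rewrite transcript_id and gene_id in one GTF line; leave other lines unchanged."""
    ids = parse_ids(line)
    if ids is None or ids[0] not in translation:
        return line
    tx_id, gene_id = ids
    new_gene, new_tx = translation[tx_id]
    line = line.replace(f'transcript_id "{tx_id}"', f'transcript_id "{new_tx}"')
    return line.replace(f'gene_id "{gene_id}"', f'gene_id "{new_gene}"')


# Usage with one of the lines from the question above.
example = [
    '12 AUGUSTUS CDS 32872656 32872832 1 + 0 '
    'transcript_id "anno1.12+_file_1_file_1_g20327.t1"; gene_id "g_5119";',
]
table = build_translation(example, "Species1")
for old_tx, (new_gene, new_tx) in table.items():
    print(old_tx, new_tx, sep="\t")  # old -> new, so transcripts stay traceable
print(rename_line(example[0], table))
```

The translation table plays the same role as the file passed via --translation_tab in the command above: after renaming, each new transcript ID can still be mapped back to the ID it had in the input gene set.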

Best, Lars

LarsGab commented 2 years ago

I am closing this issue because all questions have been answered or it has been inactive for a long time.

Gaius-Augustus / TSEBRA

Inconsistencies in transcript_ID and gene_ID #14