GwynHN closed this issue 1 year ago.
Hi,
TSEBRA groups transcripts into the same gene if their coding regions overlap in the same open reading frame.
Without more information, I would assume that in your case these transcripts aren't in the same gene (at least by this definition). If you want to ignore the frame, you can use the new --ignore_tx_phase option.
Best, Lars
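To make the grouping rule above concrete, here is a minimal Python sketch. It is not TSEBRA's actual implementation. It assumes each transcript is reduced to a list of CDS intervals with GTF-style 1-based coordinates and phase; the names CDS, frame(), overlap_in_frame() and group_into_genes() are hypothetical. Two transcripts are linked when any pair of their CDS segments overlaps on the same chromosome and strand in the same genomic frame, and genes are the connected components of those links (found here with union-find). The ignore_phase argument only mimics what --ignore_tx_phase is described as doing above: overlap alone is enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CDS:
    """Hypothetical CDS segment using GTF conventions."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    phase: int   # GTF phase: 0, 1 or 2

    def frame(self) -> int:
        # Genomic position (mod 3) of the first base of the first complete
        # codon. On "+" it lies `phase` bases after start; on "-" codons are
        # read towards lower coordinates, so it lies `phase` bases before end.
        pos = self.start + self.phase if self.strand == "+" else self.end - self.phase
        return pos % 3


def overlap_in_frame(a: CDS, b: CDS, ignore_phase: bool = False) -> bool:
    """True if two CDS segments overlap (and, unless ignored, share a frame)."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    if a.start > b.end or b.start > a.end:
        return False
    return ignore_phase or a.frame() == b.frame()


def group_into_genes(transcripts: dict[str, list[CDS]],
                     ignore_phase: bool = False) -> list[set[str]]:
    """Cluster transcript IDs into genes as connected components."""
    ids = list(transcripts)
    parent = {t: t for t in ids}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, t1 in enumerate(ids):
        for t2 in ids[i + 1:]:
            if any(overlap_in_frame(a, b, ignore_phase)
                   for a in transcripts[t1] for b in transcripts[t2]):
                parent[find(t1)] = find(t2)

    genes: dict[str, set[str]] = {}
    for t in ids:
        genes.setdefault(find(t), set()).add(t)
    return list(genes.values())
```

For example, with hypothetical coordinates, a transcript with CDS("chr1", "+", 100, 399, 0) and one with CDS("chr1", "+", 101, 399, 0) overlap but sit in different frames, so group_into_genes returns them as two genes; with ignore_phase=True they merge into one.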
Hi Lars,
OK, I see! In the example I gave, the two original transcripts had identical start and stop positions for their gene and transcript features, but different annotated start codons. This is similar to issue #26.
Thanks! Gwyneth
Hi @LarsGab
Is there a way we can have the --ignore_tx_phase option in the long_reads branch as well?
Hi TSEBRA developers,
Thanks for the great tools; I am getting some really nice results.
After combining the RNA and protein evidence, I noticed that some transcripts that shared a gene ID in the original annotation (as different transcripts of the same gene) are assigned different gene IDs in the combined output. For example, anno1.g23605.t1 and anno1.g23605.t2 become g_22925 and g_22926 in the combined GTF. There are several examples of this, but it doesn't happen every time and I haven't been able to find a pattern.
With the repo I cloned back in November 2022 (v1.0.3), I ran the following:
bin/tsebra.py -g RNA/augustus.hints.gtf,Protein/augustus.hints.gtf -c config/default.cfg -e RNA/hintsfile.gff,Protein/hintsfile.gff -o rna_prot_combined.gtf
Best, Gwyneth
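One way to look for a pattern in these split genes is to list every original gene whose transcripts ended up under more than one gene ID in the combined output. The sketch below assumes, based only on the example above, that output transcript IDs keep the form <prefix>.<original gene>.tN (e.g. anno1.g23605.t1) and that the attribute column uses the standard GTF transcript_id "..."; gene_id "..."; syntax. split_genes is a hypothetical helper, not part of TSEBRA.

```python
import re
from collections import defaultdict

# Standard GTF attribute patterns (assumed format of the combined GTF).
TX_ID = re.compile(r'\btranscript_id "([^"]+)"')
GENE_ID = re.compile(r'\bgene_id "([^"]+)"')


def split_genes(gtf_lines):
    """Return {original gene: [new gene IDs]} for original genes whose
    transcripts were assigned more than one gene ID.

    Hypothetical helper: assumes transcript IDs look like "<prefix>.<gene>.tN",
    so dropping the last dot-separated part recovers the original gene.
    """
    new_genes = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a full GTF feature line
        tx = TX_ID.search(fields[8])
        gene = GENE_ID.search(fields[8])
        if not (tx and gene):
            continue  # e.g. gene lines without a transcript_id
        original_gene = tx.group(1).rsplit(".", 1)[0]
        new_genes[original_gene].add(gene.group(1))
    return {g: sorted(ids) for g, ids in new_genes.items() if len(ids) > 1}
```

Running it on the combined file, e.g. split_genes(open("rna_prot_combined.gtf")), should report anno1.g23605 with g_22925 and g_22926 if the IDs follow the assumed form; comparing the CDS frames of the listed transcripts could then show whether they differ in reading frame.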