Closed conchoecia closed 3 years ago
Hello Darrin,
I looked at the script gtf2gff.pl, and it seems to assume that the gene ID of each transcript is the ending of its transcript ID. This isn't the case in the normal output of TSEBRA. You can work around this issue by using rename_gtf.py from the TSEBRA repository to rename the transcript and gene IDs in the TSEBRA output, and then running gtf2gff.pl on the renamed gene set. For example:
rename_gtf.py --gtf braker1+2_combined.gtf --out braker1+2_combined_renamed.gtf
gtf2gff.pl < braker1+2_combined_renamed.gtf --out braker1+2_combined_renamed.gff
However, you can't trace the renamed IDs back to the IDs of the BRAKER outputs. If this is important for you, rename_gtf.py can create a translation table from the BRAKER IDs to the renamed IDs (use the --translation_tab option).
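If you need to look up the original BRAKER ID for a renamed transcript later, the translation table can be loaded into a dictionary. The following is only a minimal sketch: it assumes the table is a plain tab-separated file with the old ID and the new ID in two columns, which should be checked against the actual file before use. The function name load_id_map and the column parameters are made up for illustration and are not part of TSEBRA.

```python
import csv


def load_id_map(rows, old_col=0, new_col=1):
    """Build a {renamed_id: original_id} dictionary from a translation table.

    Assumption: the table is tab-separated, one ID pair per line. The column
    order (old_col, new_col) is a guess; inspect the file and adjust.
    `rows` can be an open file handle or any iterable of text lines.
    """
    mapping = {}
    for row in csv.reader(rows, delimiter="\t"):
        # Skip blank or short lines instead of failing on them.
        if len(row) <= max(old_col, new_col):
            continue
        mapping[row[new_col]] = row[old_col]
    return mapping
```

With a file handle, this would be called as load_id_map(open(path)), where path is the file name passed to --translation_tab.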
I hope this helps. Best, Lars
Hi Lars,
That is helpful, thank you! The --translation_tab option is a nice bridge between the old names and the new. This should help in getting a GTF and I will reopen if I encounter other issues.
Darrin
Hello,
I have run the TSEBRA pipeline on two inputs: one run from ProtHint + BRAKER, and another run from RNA-seq data + BRAKER. I followed the same routine as in the README.md.
I ran the Augustus gtf2gff.pl script on braker1+2_combined.gtf and got this error:
transcript anno1.sca1+_file_1_file_1_g35260.t1 has conflicting gene parents: _file_1_file_1_g35260 and g_10310. Remember: In GTF txids need to be overall unique.
This seems like the same issue as https://github.com/Gaius-Augustus/Augustus/issues/31. Let me know which files would be the most helpful to upload, if you are interested in this problem.
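To get a list of the transcripts that are likely to trigger this kind of error, a small check over the GTF may help. This is a rough sketch and not what gtf2gff.pl does internally: it assumes that transcript IDs end in ".t<number>" and flags any record whose transcript ID, with that suffix removed, does not end with the value of its gene_id attribute. The function name find_id_mismatches is made up for illustration.

```python
import re
from collections import defaultdict

# Standard GTF attribute syntax: gene_id "x"; transcript_id "y";
GENE_RE = re.compile(r'gene_id "([^"]+)"')
TX_RE = re.compile(r'transcript_id "([^"]+)"')
# Assumption: transcript IDs carry a ".t<number>" suffix, e.g. g35260.t1
TX_SUFFIX_RE = re.compile(r"\.t\d+$")


def find_id_mismatches(gtf_lines):
    """Return {transcript_id: {gene_id, ...}} for records whose transcript ID
    (minus the .t<N> suffix) does not end with the record's gene_id."""
    mismatches = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        gene = GENE_RE.search(fields[8])
        tx = TX_RE.search(fields[8])
        if not gene or not tx:
            # Lines without both attributes cannot be compared; skip them.
            continue
        stem = TX_SUFFIX_RE.sub("", tx.group(1))
        if not stem.endswith(gene.group(1)):
            mismatches[tx.group(1)].add(gene.group(1))
    return dict(mismatches)
```

Passing an open handle on braker1+2_combined.gtf to find_id_mismatches would return the transcript IDs whose gene_id does not match under this assumption; an empty dictionary means no record was flagged.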
Thank you - Darrin