conflicting parents with gtf2gff.pl

conchoecia commented 3 years ago

Hello,

I ran the TSEBRA pipeline as shown below, combining one run from ProtHint + BRAKER with another run from RNA-seq data + BRAKER. This is the same routine given in the README.md.

./bin/fix_gtf_ids.py --gtf braker1_out/braker.gtf --out braker1_fixed.gtf
./bin/fix_gtf_ids.py --gtf braker2_out/braker.gtf --out braker2_fixed.gtf

./bin/tsebra.py -g braker1_fixed.gtf,braker2_fixed.gtf -c default.cfg \
    -e braker1_out/hintsfile.gff,braker2_out/hintsfile.gff \
    -o braker1+2_combined.gtf

I ran the Augustus gtf2gff.pl script on braker1+2_combined.gtf and got this error:

transcript anno1.sca1+_file_1_file_1_g35260.t1 has conflicting gene parents: _file_1_file_1_g35260 and g_10310. Remember: In GTF txids need to be overall unique.

Seems like the same issue as https://github.com/Gaius-Augustus/Augustus/issues/31 ? Let me know what files would be the most helpful to upload, if you are interested in this problem.

Thank you - Darrin

LarsGab commented 3 years ago

Hello Darrin,

I looked at gtf2gff.pl, and it seems to assume that the gene ID of each transcript is the ending of its transcript ID. This isn't the case in the normal output of TSEBRA. You can work around this by using rename_gtf.py from the TSEBRA repository to rename the transcript and gene IDs in the TSEBRA output, and then running gtf2gff.pl on the renamed gene set. For example:

rename_gtf.py --gtf braker1+2_combined.gtf --out braker1+2_combined_renamed.gtf
gtf2gff.pl < braker1+2_combined_renamed.gtf --out braker1+2_combined_renamed.gff
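To see which transcripts are likely to be affected before renaming, a rough check along the following lines may help. It is only a sketch and does not reproduce the logic of gtf2gff.pl: it takes a loose reading of the assumption described above and flags every transcript whose transcript_id does not contain its gene_id. The function names and the example IDs are made up for illustration.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(column9):
    """Return the key "value" pairs found in GTF column 9 as a dict."""
    return dict(ATTR_RE.findall(column9))


def find_id_mismatches(gtf_lines):
    """Map each transcript_id to the gene_ids that do not occur inside it.

    Assumption (not taken from gtf2gff.pl itself): a transcript is
    "consistent" if its gene_id is a substring of its transcript_id.
    """
    mismatches = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = parse_attributes(fields[8])
        gene_id = attrs.get("gene_id")
        tx_id = attrs.get("transcript_id")
        if gene_id is None or tx_id is None:
            continue  # e.g. gene/transcript lines that carry only a bare ID
        if gene_id not in tx_id:
            mismatches[tx_id].add(gene_id)
    return dict(mismatches)


# Hypothetical in-memory example; with a real file, pass an open file handle.
example = [
    'sca1\tsrc\tCDS\t1\t9\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'sca1\tsrc\tCDS\t20\t30\t.\t+\t0\ttranscript_id "anno1.g5.t1"; gene_id "g_10";',
]
for tx_id, gene_ids in find_id_mismatches(example).items():
    print(tx_id, sorted(gene_ids))
```

Transcripts flagged by a check like this are the ones the renaming step is meant to normalise.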

However, after renaming you can no longer trace the new IDs back to the IDs in the BRAKER outputs. If this matters to you, rename_gtf.py can write a translation table that maps the BRAKER IDs to the renamed IDs (use the --translation_tab option).
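The idea of renaming while keeping a translation table can be sketched as follows. This is not the implementation of rename_gtf.py, and the naming scheme (g1, g1.t1, ...), the function name, and the returned mapping are assumptions chosen for illustration. Lines that lack either attribute are copied unchanged, which a real tool would have to handle properly.

```python
import re

# Only the two ID attributes are rewritten; \b avoids matching e.g. ref_gene_id.
ATTR_RE = re.compile(r'\b(gene_id|transcript_id) "([^"]*)"')


def rename_ids(gtf_lines, prefix="g"):
    """Return (renamed_lines, tx_map), where tx_map maps old to new transcript IDs.

    New gene IDs are <prefix>1, <prefix>2, ... in order of first appearance,
    and new transcript IDs are <new_gene_id>.t<N> (assumed scheme).
    """
    gene_map = {}  # old gene_id -> new gene_id
    tx_map = {}    # old transcript_id -> new transcript_id
    tx_count = {}  # new gene_id -> number of transcripts named so far
    renamed = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 9:
            renamed.append(line)
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        old_gene = attrs.get("gene_id")
        old_tx = attrs.get("transcript_id")
        if old_gene is None or old_tx is None:
            renamed.append(line)  # left unchanged in this sketch
            continue
        if old_gene not in gene_map:
            gene_map[old_gene] = f"{prefix}{len(gene_map) + 1}"
            tx_count[gene_map[old_gene]] = 0
        new_gene = gene_map[old_gene]
        if old_tx not in tx_map:
            tx_count[new_gene] += 1
            tx_map[old_tx] = f"{new_gene}.t{tx_count[new_gene]}"
        new_ids = {"gene_id": new_gene, "transcript_id": tx_map[old_tx]}
        fields[8] = ATTR_RE.sub(
            lambda m: f'{m.group(1)} "{new_ids[m.group(1)]}"', fields[8]
        )
        renamed.append("\t".join(fields))
    return renamed, tx_map
```

The returned tx_map plays the role of the translation table: looking up an old transcript ID gives the new one, and inverting the dict gives the reverse direction.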

I hope this helps. Best, Lars

conchoecia commented 3 years ago

Hi Lars,

That is helpful, thank you! The --translation_tab option is a nice bridge between the old names and the new. This should help in getting a GTF and I will reopen if I encounter other issues.

Darrin
