Closed yweii closed 1 year ago
Hi,
thanks for using TSEBRA.
I tried using augustus_GTF_to_EVM_GFF3.pl from EVM, and in my case it seems to convert the TSEBRA output correctly.
Could you give me some more information about the issue with this script, so I can reproduce your problem?
Best, Lars
Hi Lars,
I would like to ask whether it is normal that the other gene structure features (introns, start and stop codons) are removed from TSEBRA's GTF output when it is converted to EVM GFF3 format. That seems to be the case for me. The TSEBRA output files (both GTF and converted GFF) are attached for reference. I hope you can help me with this matter.
GTF format
EVM converted GFF format (augustus_GTF_to_EVM_GFF3.pl)
Thanks, Zae
augustus_GTF_toEVM
@LarsGab it doesn't seem to work in my case. I converted the TSEBRA-generated GTF file to GFF3 using augustus_GTF_to_EVM_GFF3.pl, but validating it with gff3_gene_prediction_file_validator.pl gives the following error. Any suggestions?
Regards, B
Error, feature: Chr1-g_638044 is described multiple times with different data values:
$VAR1 = {
'parent_ID' => undef,
'rend' => '592671708',
'contig' => 'Chr1',
'feature_ID' => 'Chr1-g_638044',
'feat_type' => 'gene',
'orient' => '-',
'lend' => '592671373'
INPUT example:
Chr1 Augustus gene 592671373 592671693 . - . ID=Chr1-g_638044;Name=Augustus%20prediction
Chr1 Augustus mRNA 592671373 592671693 . - . ID=Chr1-anno2.g154521.t1;Parent=Chr1-g_638044;Name=Augustus%20prediction
Chr1 Augustus exon 592671373 592671693 . - . ID=Chr1-anno2.g154521.t1.exon1;Parent=Chr1-anno2.g154521.t1
Chr1 Augustus CDS 592671373 592671693 . - . ID=cds.Chr1-anno2.g154521.t1;Parent=Chr1-anno2.g154521.t1
Chr1 Augustus exon 592671373 592671375 . - . ID=Chr1-anno2.g154521.t1.exon2;Parent=Chr1-anno2.g154521.t1
Chr1 Augustus CDS 592671373 592671375 . - . ID=cds.Chr1-anno2.g154521.t1;Parent=Chr1-anno2.g154521.t1
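The validator error above reports the gene Chr1-g_638044 with an end coordinate (592671708) that differs from the one in the excerpt (592671693), which suggests the same ID occurs more than once in the converted file. A minimal Python sketch for listing such IDs follows. It is not part of TSEBRA or EVM; the function name `find_conflicting_ids` is hypothetical, and it assumes a tab-delimited GFF3 file with 9 columns and `ID=` attributes.

```python
import sys
from collections import defaultdict


def find_conflicting_ids(gff3_lines, feature_types=("gene", "mRNA")):
    """Return {ID: set of (seqid, start, end, strand)} for IDs seen at more than one location.

    Hypothetical helper, not part of TSEBRA or EVM.
    Assumes tab-delimited GFF3 with 9 columns.
    """
    seen = defaultdict(set)
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        # Parse "key=value;key=value" attributes from column 9.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        feature_id = attrs.get("ID")
        if feature_id is None:
            continue
        seen[feature_id].add((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    # Keep only IDs that were recorded with two or more distinct locations.
    return {fid: locs for fid, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py converted.gff3
    with open(sys.argv[1]) as handle:
        for fid, locs in sorted(find_conflicting_ids(handle).items()):
            print(fid, sorted(locs))
```

Only gene and mRNA features are checked by default, because a CDS split across several lines may legitimately reuse one ID in GFF3.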
Hi,
I'm not sure what EVM's requirements are, and this might be a problem on EVM's end. If the problem is the naming convention of the transcript/gene IDs, you can try renaming all IDs with rename_gtf.py first. TSEBRA can also report multiple transcript isoforms per gene, which might also cause problems for EVM, if I remember correctly.
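If multiple isoforms per gene do turn out to be the problem, one possible workaround is to reduce the GTF to a single transcript per gene before converting it. The Python sketch below keeps the isoform with the longest total CDS. It is an illustration only, not a TSEBRA feature: the function name `keep_longest_isoform` is hypothetical, and it assumes an AUGUSTUS-style, tab-delimited GTF in which feature lines carry `transcript_id "..."; gene_id "...";` and `gene`/`transcript` lines may carry a bare ID in column 9.

```python
import re
from collections import defaultdict

TX_RE = re.compile(r'transcript_id "([^"]+)"')
GENE_RE = re.compile(r'gene_id "([^"]+)"')


def keep_longest_isoform(gtf_lines):
    """Return the GTF lines of the isoform with the longest total CDS per gene.

    Hypothetical helper; assumes AUGUSTUS-style, tab-delimited GTF.
    Ties are resolved in favour of the transcript seen first.
    """
    lines = [
        l.rstrip("\n") for l in gtf_lines if l.strip() and not l.startswith("#")
    ]
    cds_len = defaultdict(int)  # transcript ID -> summed CDS length
    tx_gene = {}                # transcript ID -> gene ID
    for line in lines:
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        tx, gene = TX_RE.search(cols[8]), GENE_RE.search(cols[8])
        if tx and gene:
            tx_gene[tx.group(1)] = gene.group(1)
            if cols[2] == "CDS":
                cds_len[tx.group(1)] += int(cols[4]) - int(cols[3]) + 1

    best = {}  # gene ID -> chosen transcript ID
    for tx, gene in tx_gene.items():
        if gene not in best or cds_len[tx] > cds_len[best[gene]]:
            best[gene] = tx
    kept = set(best.values())

    out = []
    for line in lines:
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        tx = TX_RE.search(cols[8])
        if tx:
            if tx.group(1) in kept:
                out.append(line)
        elif cols[2] == "gene":
            out.append(line)  # gene lines are passed through unchanged
        elif cols[8].strip() in kept:
            out.append(line)  # transcript line with a bare transcript ID
    return out
```

Gene lines are passed through with their original coordinates, so a gene span may still reflect a dropped isoform and may need to be recomputed before conversion.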
Best, Lars
Thanks for this great tool; I look forward to your reply. I ran TSEBRA on two BRAKER runs (braker1: RNA-seq; braker2: proteins) and got braker1+2_combined.gtf as expected. However, none of the conversion methods I tried worked: 1. rename_gtf.py / gtf2gff.pl / add_name_to_gff3.pl / augustus_GFF3_to_EVM_GFF3.pl, which fails with errors such as "Error, feature: Chr10-g_40435 is described multiple times with different data values". 2. augustus_GTF_to_EVM_GFF3.pl from EVM.
Neither approach produced a usable result. Can you recommend a method that works?