Gaius-Augustus / GALBA

GALBA is a pipeline for fully automated prediction of protein coding gene structures with AUGUSTUS in novel eukaryotic genomes for the scenario where high quality proteins from one or several closely related species are available.

In transcript g86.t1 two UTR/CDS features are overlapping. Not allowed by definition. at ~/software/Augustus/scripts/gtf2gff.pl line 182, <STDIN> line 759036. #38

Open wangjie07070910 opened 11 months ago

wangjie07070910 commented 11 months ago

Hello, I am trying out this program. My command is:

galba.pl --genome=${genome_file} --prot_seq=${protein_file} --threads 40

The error is:

ERROR in file ~/software/GALBA/scripts/galba.pl at line 5340
Failed to execute: cat augustus.hints.gff | perl -ne 'if(m/\tAUGUSTUS\t/) {print $_;}' | perl ~/software/Augustus/scripts/gtf2gff.pl --printExon --out=augustus.hints.tmp.gtf 2> errors/gtf2gff.augustus.hints.gtf.stderr

And errors/gtf2gff.augustus.hints.gtf.stderr shows:

In transcript g86.t1 two UTR/CDS features are overlapping. Not allowed by definition. at ~/software/Augustus/scripts/gtf2gff.pl line 182, <STDIN> line 759036.
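The message says that two CDS or UTR features of one transcript overlap. To list every transcript in augustus.hints.gff that would trip this kind of check, a minimal diagnostic sketch in Python could look like the following. It is not part of GALBA or AUGUSTUS; the script name `find_overlaps.py` and the function `find_overlaps` are hypothetical, and the sketch only approximates the test at line 182 of gtf2gff.pl, whose exact logic is not shown in this thread. It keeps only `AUGUSTUS` lines (as the failing pipeline does), groups `CDS` and UTR features by `transcript_id` (the UTR feature names are an assumption), and compares coordinates regardless of sequence name, printing the sequence name so that a transcript ID that occurs on two contigs stands out.

```python
#!/usr/bin/env python3
"""Report transcripts whose CDS/UTR features overlap in an AUGUSTUS GFF/GTF.

Hypothetical diagnostic sketch; not part of GALBA or AUGUSTUS.
"""
import re
import sys
from collections import defaultdict

# Assumption: these are the feature types the "UTR/CDS" check refers to.
CHECKED_TYPES = {"CDS", "5'-UTR", "3'-UTR", "UTR"}
TX_RE = re.compile(r'transcript_id "([^"]+)"')


def find_overlaps(lines):
    """Return {transcript_id: [(feature_a, feature_b), ...]}.

    Each feature is a tuple (seqid, start, end, type). Coordinates are
    compared regardless of seqid, so a transcript ID reused on two
    contigs can also show up here.
    """
    features = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")  # GFF columns are tab-separated
        # Mirror the GALBA pipeline, which only passes AUGUSTUS lines on.
        if len(cols) < 9 or cols[1] != "AUGUSTUS" or cols[2] not in CHECKED_TYPES:
            continue
        match = TX_RE.search(cols[8])
        if match:
            features[match.group(1)].append(
                (cols[0], int(cols[3]), int(cols[4]), cols[2]))

    overlaps = {}
    for tx_id, feats in features.items():
        feats.sort(key=lambda f: (f[1], f[2]))
        hits = []
        furthest = None  # feature with the largest end coordinate seen so far
        for feat in feats:
            # GFF coordinates are 1-based and inclusive.
            if furthest is not None and feat[1] <= furthest[2]:
                hits.append((furthest, feat))
            if furthest is None or feat[2] > furthest[2]:
                furthest = feat
        if hits:
            overlaps[tx_id] = hits
    return overlaps


def main(path):
    with open(path) as handle:
        overlaps = find_overlaps(handle)
    for tx_id, pairs in overlaps.items():
        for a, b in pairs:
            print(f"{tx_id}\t{a[0]}:{a[1]}-{a[2]} {a[3]}\t"
                  f"overlaps {b[0]}:{b[1]}-{b[2]} {b[3]}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python3 find_overlaps.py augustus.hints.gff`; each output line names a transcript and the two features that overlap.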

KatharinaHoff commented 11 months ago

Could you provide your augustus.hints.gff? (Send me a link or the file via email to katharina.hoff at uni-greifswald.de.) I will look into it then.

KatharinaHoff commented 8 months ago

I hope this problem is solved by commit https://github.com/Gaius-Augustus/GALBA/commit/d8aaf4b93738da1a9afaa1035dfbd97bd18e9227

kullrich commented 5 months ago

Hi, I have the same problem with the latest galba.sif file.

less errors/gtf2gff.augustus.hints.gtf.stderr
In transcript g706.t1 two UTR/CDS features are overlapping. Not allowed by definition. at /opt/Augustus/scripts/gtf2gff.pl line 182, <STDIN> line 128068.

Is there a workaround?

Thank you in anticipation

Best regards

Kristian

kleinjoel commented 3 months ago

Hi all,

I also ran into the same issue with 2 of the genomes I tried to annotate using GALBA:

In transcript g411.t1 two UTR/CDS features are overlapping. Not allowed by definition. at /opt/Augustus/scripts/gtf2gff.pl line 182, <STDIN> line 574857.

For 4 other genomes it runs just fine with exactly the same settings and protein input. I'm also wondering whether there is a workaround, e.g., removing the offending transcript from the augustus.hints.gff file.
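The workaround suggested here, removing the offending transcript, could be sketched in Python as follows. This is a hypothetical helper (`remove_transcripts` and its command-line usage are illustrations, not GALBA features) and it has not been tested with GALBA. It assumes the AUGUSTUS layout visible in the augustus.hints.gff excerpt in this thread: feature lines carry `transcript_id "..."; gene_id "...";`, while `transcript` and `gene` lines carry the bare ID in column 9. A `gene` line is dropped only when all of its transcripts are dropped, and `#` comment lines are kept. Whether GALBA can then continue from the edited file is not discussed in the thread; rerunning the failed gtf2gff.pl command from the error message by hand on the cleaned file would at least show whether it now passes that step.

```python
#!/usr/bin/env python3
"""Drop selected transcripts from an AUGUSTUS GFF such as augustus.hints.gff.

Hypothetical workaround sketch; not part of GALBA. Comment lines are kept.
"""
import re
import sys
from collections import defaultdict

TX_RE = re.compile(r'transcript_id "([^"]+)"')
GENE_RE = re.compile(r'gene_id "([^"]+)"')


def remove_transcripts(lines, drop):
    """Return the lines of an AUGUSTUS GFF without the transcripts in `drop`.

    A gene line is removed only when every transcript of that gene is dropped.
    """
    lines = list(lines)
    drop = set(drop)

    # First pass: record which transcripts belong to which gene.
    gene_tx = defaultdict(set)
    for line in lines:
        tx, gene = TX_RE.search(line), GENE_RE.search(line)
        if tx and gene and not line.startswith("#"):
            gene_tx[gene.group(1)].add(tx.group(1))
    empty_genes = {g for g, txs in gene_tx.items() if txs <= drop}

    # Second pass: copy every line except those of dropped transcripts/genes.
    kept = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            kept.append(line)
            continue
        tx = TX_RE.search(cols[8])
        if tx and tx.group(1) in drop:
            continue  # feature line (CDS, intron, codon, ...) of a dropped transcript
        if cols[2] == "transcript" and cols[8].strip() in drop:
            continue
        if cols[2] == "gene" and cols[8].strip() in empty_genes:
            continue
        kept.append(line)
    return kept


if __name__ == "__main__" and len(sys.argv) > 3:
    # Usage: remove_transcripts.py IN.gff OUT.gff TX_ID [TX_ID ...]
    src, dst, *ids = sys.argv[1:]
    with open(src) as fin:
        kept = remove_transcripts(fin, ids)
    with open(dst, "w") as fout:
        fout.writelines(kept)
```

For example, `python3 remove_transcripts.py augustus.hints.gff augustus.hints.clean.gff g411.t1` writes a copy without g411.t1 and, because it has no other transcripts, without gene g411; the original file is left untouched.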

Best Regards,

Joel

KatharinaHoff commented 3 months ago

Did you pull the singularity image within the last 3 months?

kleinjoel commented 3 months ago

Hi Katharina, thanks for your quick reply. I double-checked and got this information on the build:

$ singularity inspect --labels galba.sif
org.label-schema.build-arch: amd64
org.label-schema.build-date: Wednesday_15_May_2024_13:19:23_CEST
org.label-schema.schema-version: 1.0
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: katharinahoff/galba-notebook:latest
org.label-schema.usage.singularity.version: 3.8.3

KatharinaHoff commented 3 months ago

OK, then it’s an open problem. Thank you for clarifying!

(Sent by email in reply to Joel Klein's message of Fri, 24 May 2024, 17:15.)

kleinjoel commented 3 months ago

Dear Katharina,

Thanks for looking into it. In case it helps, I located the offending gene in the augustus.hints.gff file and copied it below together with the two adjacent genes.

# start gene g410
CWNJ01000582    AUGUSTUS    gene    76  988 0.44    +   .   g410
CWNJ01000582    AUGUSTUS    transcript  76  988 0.44    +   .   g410.t1
CWNJ01000582    AUGUSTUS    start_codon 76  78  .   +   0   transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582    AUGUSTUS    initial 76  389 0.48    +   0   transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582    AUGUSTUS    internal    674 829 0.8 +   1   transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582    AUGUSTUS    terminal    937 988 0.81    +   1   transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582    AUGUSTUS    intron  390 673 0.85    +   .   transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582    AUGUSTUS    intron  830 936 0.8 +   .   transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582    AUGUSTUS    CDS 76  389 0.48    +   0   transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582    AUGUSTUS    CDS 674 829 0.8 +   1   transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582    AUGUSTUS    CDS 937 985 0.81    +   1   transcript_id "g410.t1"; gene_id "g410";
CWNJ01000582    AUGUSTUS    stop_codon  986 988 .   +   0   transcript_id "g410.t1"; gene_id "g410";
# coding sequence = [atgggttgttcatttgcagatggaatatacatgatggaagttgaccgcattctaagacctggtggttattgggtgcttt
# cgggtcctcctattggttggaaggttcattacaaagcctggcagcgatctaaggaggaccttcaggaagaacagaataagattgaagagactgctaag
# ctcctttgctgggagaaggtctctgagaagaatgaaattgccatttggcaaaagagggtagactctgtttcatgtcgtcgtagacaaatagattccag
# tgtaaaattctgcaaatcaagggatgttgatgatgtctggtataagaaaatggaggcctgcattactcctggtcctaaaggttctggtcataatctga
# aaccttttccagagaggctatatgcaatccctcctagaattgctagtggctctgctcctggagtttctgtggagacataccaggatgacaacaagaac
# tattcaatctcccaagttatgggtcatgaatgttgtgccaactattgctga]
# protein sequence = [MGCSFADGIYMMEVDRILRPGGYWVLSGPPIGWKVHYKAWQRSKEDLQEEQNKIEETAKLLCWEKVSEKNEIAIWQKR
# VDSVSCRRRQIDSSVKFCKSRDVDDVWYKKMEACITPGPKGSGHNLKPFPERLYAIPPRIASGSAPGVSVETYQDDNKNYSISQVMGHECCANYC]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 0
# CDS exons: 0/3
# CDS introns: 0/2
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 1
#     RM:   1 
# end gene g410
# start gene g411
CWNJ01000583    AUGUSTUS    gene    1   504 0.56    +   .   g411
CWNJ01000583    AUGUSTUS    transcript  1   504 0.56    +   .   g411.t1
CWNJ01000583    AUGUSTUS    terminal    1   504 0.56    +   0   transcript_id "g411.t1"; gene_id "g411";
CWNJ01000583    AUGUSTUS    CDS 1   501 0.56    +   0   transcript_id "g411.t1"; gene_id "g411";
CWNJ01000583    AUGUSTUS    stop_codon  502 504 .   +   0   transcript_id "g411.t1"; gene_id "g411";
# coding sequence = [acaagtgaagctgtgaatgcatactattcagctgctttgatgggtatgtcatatggtgacagagaccttgttgcaattg
# gatcaacactgttagcattggaaatgaaagcagcacaaacatggtggcatgtgaaagatggggacagtaacatgtatggaaaagacttcacaaaggaa
# aacagaatagtgggaatcctgtgggctaacaagagagatagtgcactatggtgggcctcagctgagtgcagagagtgtaggcttagcattcagctatt
# gcctttgttgcctatttctgaagaactattttctaatgtggagtatgtgaagaagcttgtggaatggacagagcctgctactgaagaaggatggaagg
# gatttttgtatgcattggaagggatttatgataaagaggatgctttggagaagatcagaaagttgacagaatttgatgatggaaactcattcacaaat
# ctcttgtggtggattcatagcagagggggttga]
# protein sequence = [TSEAVNAYYSAALMGMSYGDRDLVAIGSTLLALEMKAAQTWWHVKDGDSNMYGKDFTKENRIVGILWANKRDSALWWA
# SAECRECRLSIQLLPLLPISEELFSNVEYVKKLVEWTEPATEEGWKGFLYALEGIYDKEDALEKIRKLTEFDDGNSFTNLLWWIHSRGG]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 0
# CDS exons: 0/1
# CDS introns: 0/0
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 1
#     RM:   1 
# end gene g411
# start gene g412
CWNJ01000584    AUGUSTUS    gene    665 1711    2.92    -   .   g412
CWNJ01000584    AUGUSTUS    transcript  665 1711    1   -   .   g412.t1
CWNJ01000584    AUGUSTUS    stop_codon  665 667 .   -   0   transcript_id "g412.t1"; gene_id "g412";
CWNJ01000584    AUGUSTUS    terminal    665 1711    1   -   0   transcript_id "g412.t1"; gene_id "g412";
CWNJ01000584    AUGUSTUS    CDS 668 1711    1   -   0   transcript_id "g412.t1"; gene_id "g412";
# coding sequence = [tgcagctatggcggccacataatgccacgcccacatgataagtgtctctgctatgtcggcggcgacacccgaatccttg
# tcgttgatcggcattcctctctcaaagacctttgttcacgtctgtcttgtaccctcctccatggaaggcccttcaacctcaagtaccagctacccaat
# gaagatctcgacaatctgatatcagtttccaccgatgaagaccttgacaacatgattgaggagcatgatcgcatcactgcagctcatcctttaaaacc
# tgcacgtttgaggctttttctattcttcgataagccagagactgcagtttcaatgggttctcttttggatgattcaaagtctgaaacttggttcgtgg
# atgctcttaacaactctgggattctcccaagggttgtttcagattctgccacagtgggttgtttggtgaaccttgatggagttcttgctagtgattct
# agcaacaatttggaggctcaggctgctgagtctctggctgataacactaaacaagataagaatttgcctgatgtgcattcaatgccaaactcacctat
# ggtggagaacagttcctcatacggatcatcttcttcaaatccttcgatggccaatctgcctccaatgcggggtcgcgtcgacgagaatggtagtaggc
# tgcagcaagagcagaggcctgggatggaagagcagtttgctcaaatgacctttggtgcgaatgtgatgaaacaagatgatgggtatggtactttgtct
# gctcctatgccatcaattcctactacagttgtgacaatggcatcaccagcaattgttgctggtgataacatgaatcgggttatctcggatgacgagag
# attagatcagggagcacctgctggatatagaatgccgcctttgccattgctgcctgtgcaaccaaggactattagtggtggttttggcggaggtggag
# gctttggagctggtggcggttttagtgctggcagtggcgccggatttggtggtggagctggatatggagctggcggtggccagtga]
# protein sequence = [CSYGGHIMPRPHDKCLCYVGGDTRILVVDRHSSLKDLCSRLSCTLLHGRPFNLKYQLPNEDLDNLISVSTDEDLDNMI
# EEHDRITAAHPLKPARLRLFLFFDKPETAVSMGSLLDDSKSETWFVDALNNSGILPRVVSDSATVGCLVNLDGVLASDSSNNLEAQAAESLADNTKQD
# KNLPDVHSMPNSPMVENSSSYGSSSSNPSMANLPPMRGRVDENGSRLQQEQRPGMEEQFAQMTFGANVMKQDDGYGTLSAPMPSIPTTVVTMASPAIV
# AGDNMNRVISDDERLDQGAPAGYRMPPLPLLPVQPRTISGGFGGGGGFGAGGGFSAGSGAGFGGGAGYGAGGGQ]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 100
# CDS exons: 1/1
#      C:   1 
# CDS introns: 0/0
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 1
#      C:   1 (250025_250025)
# incompatible hint groups: 5
#      C:   1 (637779_637779)
#      P:   3 
#     RM:   1 
CWNJ01000584    AUGUSTUS    transcript  665 1519    0.99    -   .   g412.t2
CWNJ01000584    AUGUSTUS    stop_codon  665 667 .   -   0   transcript_id "g412.t2"; gene_id "g412";
CWNJ01000584    AUGUSTUS    single  665 1519    0.99    -   0   transcript_id "g412.t2"; gene_id "g412";
CWNJ01000584    AUGUSTUS    CDS 668 1519    0.99    -   0   transcript_id "g412.t2"; gene_id "g412";
CWNJ01000584    AUGUSTUS    start_codon 1517    1519    .   -   0   transcript_id "g412.t2"; gene_id "g412";
# coding sequence = [ctgatatcagtttccaccgatgaagaccttgacaacatgattgaggagcatgatcgcatcactgcagctcatcctttaa
# aacctgcacgtttgaggctttttctattcttcgataagccagagactgcagtttcaatgggttctcttttggatgattcaaagtctgaaacttggttc
# gtggatgctcttaacaactctgggattctcccaagggttgtttcagattctgccacagtgggttgtttggtgaaccttgatggagttcttgctagtga
# ttctagcaacaatttggaggctcaggctgctgagtctctggctgataacactaaacaagataagaatttgcctgatgtgcattcaatgccaaactcac
# ctatggtggagaacagttcctcatacggatcatcttcttcaaatccttcgatggccaatctgcctccaatgcggggtcgcgtcgacgagaatggtagt
# aggctgcagcaagagcagaggcctgggatggaagagcagtttgctcaaatgacctttggtgcgaatgtgatgaaacaagatgatgggtatggtacttt
# gtctgctcctatgccatcaattcctactacagttgtgacaatggcatcaccagcaattgttgctggtgataacatgaatcgggttatctcggatgacg
# agagattagatcagggagcacctgctggatatagaatgccgcctttgccattgctgcctgtgcaaccaaggactattagtggtggttttggcggaggt
# ggaggctttggagctggtggcggttttagtgctggcagtggcgccggatttggtggtggagctggatatggagctggcggtggccagtga]
# protein sequence = [LISVSTDEDLDNMIEEHDRITAAHPLKPARLRLFLFFDKPETAVSMGSLLDDSKSETWFVDALNNSGILPRVVSDSAT
# VGCLVNLDGVLASDSSNNLEAQAAESLADNTKQDKNLPDVHSMPNSPMVENSSSYGSSSSNPSMANLPPMRGRVDENGSRLQQEQRPGMEEQFAQMTF
# GANVMKQDDGYGTLSAPMPSIPTTVVTMASPAIVAGDNMNRVISDDERLDQGAPAGYRMPPLPLLPVQPRTISGGFGGGGGFGAGGGFSAGSGAGFGG
# GAGYGAGGGQ]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 100
# CDS exons: 1/1
#      C:   1 
# CDS introns: 0/0
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 4
#      C:   2 (250025_250025,637779_637779)
#      P:   2 
CWNJ01000584    AUGUSTUS    transcript  665 1690    0.93    -   .   g412.t3
CWNJ01000584    AUGUSTUS    stop_codon  665 667 .   -   0   transcript_id "g412.t3"; gene_id "g412";
CWNJ01000584    AUGUSTUS    single  665 1690    0.93    -   0   transcript_id "g412.t3"; gene_id "g412";
CWNJ01000584    AUGUSTUS    CDS 668 1690    0.93    -   0   transcript_id "g412.t3"; gene_id "g412";
CWNJ01000584    AUGUSTUS    start_codon 1688    1690    .   -   0   transcript_id "g412.t3"; gene_id "g412";
# coding sequence = [atgccacgcccacatgataagtgtctctgctatgtcggcggcgacacccgaatccttgtcgttgatcggcattcctctc
# tcaaagacctttgttcacgtctgtcttgtaccctcctccatggaaggcccttcaacctcaagtaccagctacccaatgaagatctcgacaatctgata
# tcagtttccaccgatgaagaccttgacaacatgattgaggagcatgatcgcatcactgcagctcatcctttaaaacctgcacgtttgaggctttttct
# attcttcgataagccagagactgcagtttcaatgggttctcttttggatgattcaaagtctgaaacttggttcgtggatgctcttaacaactctggga
# ttctcccaagggttgtttcagattctgccacagtgggttgtttggtgaaccttgatggagttcttgctagtgattctagcaacaatttggaggctcag
# gctgctgagtctctggctgataacactaaacaagataagaatttgcctgatgtgcattcaatgccaaactcacctatggtggagaacagttcctcata
# cggatcatcttcttcaaatccttcgatggccaatctgcctccaatgcggggtcgcgtcgacgagaatggtagtaggctgcagcaagagcagaggcctg
# ggatggaagagcagtttgctcaaatgacctttggtgcgaatgtgatgaaacaagatgatgggtatggtactttgtctgctcctatgccatcaattcct
# actacagttgtgacaatggcatcaccagcaattgttgctggtgataacatgaatcgggttatctcggatgacgagagattagatcagggagcacctgc
# tggatatagaatgccgcctttgccattgctgcctgtgcaaccaaggactattagtggtggttttggcggaggtggaggctttggagctggtggcggtt
# ttagtgctggcagtggcgccggatttggtggtggagctggatatggagctggcggtggccagtga]
# protein sequence = [MPRPHDKCLCYVGGDTRILVVDRHSSLKDLCSRLSCTLLHGRPFNLKYQLPNEDLDNLISVSTDEDLDNMIEEHDRIT
# AAHPLKPARLRLFLFFDKPETAVSMGSLLDDSKSETWFVDALNNSGILPRVVSDSATVGCLVNLDGVLASDSSNNLEAQAAESLADNTKQDKNLPDVH
# SMPNSPMVENSSSYGSSSSNPSMANLPPMRGRVDENGSRLQQEQRPGMEEQFAQMTFGANVMKQDDGYGTLSAPMPSIPTTVVTMASPAIVAGDNMNR
# VISDDERLDQGAPAGYRMPPLPLLPVQPRTISGGFGGGGGFGAGGGFSAGSGAGFGGGAGYGAGGGQ]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 100
# CDS exons: 1/1
#      C:   1 
# CDS introns: 0/0
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 5
#      C:   2 (250025_250025,637779_637779)
#      P:   3 
# end gene g412