genometools / genometools

GenomeTools genome analysis system.
http://genometools.org

Aborted (core dumped) with LTR digest #995

Open omar-almolla209 opened 2 years ago

omar-almolla209 commented 2 years ago

Problem description

Whenever I run LTRdigest, the error below appears. The same error also appears in RStudio when LTRdigest is run via the LTRpred package.

This is a bug, please report it at https://github.com/genometools/genometools/issues
Please make sure you are running the latest release which can be found at http://genometools.org/pub/
You can check your version number with gt -version.
Aborted (core dumped)

Exact command line call triggering the problem

#PATH:
proteins="/home/omar-almulla/Downloads/"
genome="/home/omar-almulla/Desktop/Prunus_TE_project/INPUT/genomes/"
gff3="/home/omar-almulla/Desktop/Prunus_TE_project/OUTPUT/EDTA_outputs/20-WGS-PCE.2.0/20-WGS-PCE.2.0_shortIDs.fasta.mod.EDTA.raw/LTR/"

gt ltrdigest -hmms $proteins/Pfam-A.hmm -aaout -outfileprefix ltrs_sorted -seqfile $genome/20-WGS-PCE.2.0_shortIDs.fasta -matchdescstart < $gff3/LTR/ltrs_sorted.gff3 > ltrdigest.gff3
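Note that `$gff3` above already ends in `/LTR/` and the command appends `/LTR/ltrs_sorted.gff3` to it, so the input path contains `LTR` twice. Whether that path exists on the reporter's machine is not stated in this thread. A minimal sketch of a pre-flight check, assuming a hypothetical helper named `check_inputs` (not part of GenomeTools), could rule out missing or empty inputs before `gt` is started:

```shell
#!/bin/sh
# check_inputs (hypothetical helper, not a GenomeTools command):
# report every argument that is not a readable, non-empty file.
# Returns 0 only if all files pass.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -r "$f" ] || [ ! -s "$f" ]; then
            printf 'missing or empty input: %s\n' "$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use with the variables defined above (left commented out):
# check_inputs "$proteins/Pfam-A.hmm" \
#              "$genome/20-WGS-PCE.2.0_shortIDs.fasta" \
#              "$gff3/LTR/ltrs_sorted.gff3" || exit 1
```

Because the GFF3 file is fed in through `<` redirection, the shell itself would normally complain if it were missing, so this check is mainly useful for the `-hmms` and `-seqfile` arguments.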

What GenomeTools version are you reporting an issue for (as output by gt -version)?

gt (GenomeTools) 1.6.2
Copyright (c) 2003-2016 G. Gremme, S. Steinbiss, S. Kurtz, and CONTRIBUTORS
Copyright (c) 2003-2016 Center for Bioinformatics, University of Hamburg
See LICENSE file or http://genometools.org/license.html for license details.

Used compiler: cc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Compile flags: -g -Wall -Wunused-parameter -pipe -fPIC -Wpointer-arith -Wno-unknown-pragmas -O3 -Werror

What operating system (e.g. Ubuntu, Mac OS X), OS version (e.g. 15.10, 10.11) and platform (e.g. x86_64) are you using?

Ubuntu 20.04

satta commented 2 years ago

Are you sure there are no more lines before the "This is a bug" line? I would need those to locate the issue, as they describe the error context. Also, would you be OK with sharing some of your input files to help reproduce the problem? Thanks!

omar-almolla209 commented 2 years ago

Thanks for the reply. For now I have worked around the problem by giving LTRdigest the output from EDTA version 1.6. The error above appeared only with the EDTA version 1.9 outputs. Unfortunately, I cannot make the input files public because they are part of a publication in progress. In any case, with these changes everything works:

`tRNAs="/home/omar-almulla/Desktop/Prunus_TE_project/INPUT/Hmm_trna"
proteins="/home/omar-almulla/Desktop/Prunus_TE_project/INPUT/Hmm_trna"
genome='/home/omar-almulla/Desktop/Prunus_TE_project/INPUT/genomes/Prunus_avium_NCBI'
EDTA_output_1_6_path="/home/omar-almulla/Desktop/Prunus_TE_project/OUTPUTS/EDTA_output/Prunus_avium_NCBI/EDTA_1.6_output/Prunus_avium_NCBI_genomic.fna.EDTA.raw"
output="/home/omar-almulla/Desktop/Prunus_TE_project/OUTPUTS/LTRdigest_output/Prunus_avium_NCBI"

gt -j 4 ltrdigest -outfileprefix Prunus_avium_NCBI_ltr -trnas $tRNAs/plants-tRNAcat.fa -hmms $proteins/hmm* -seqfile $genome/Prunus_avium_NCBI_genomic.fna -matchdescstart $EDTA_output_1_6_path/Prunus_avium_NCBI_genomic.fna.LTR.intact.faSORTED.1.6.gff3 > $output/Prunus_avium_NCBI_digest.gff `

satta commented 2 years ago

I see. I'll keep this one open but cannot do much without the test data. Unfortunately I am not familiar with EDTA or LTRpred, but perhaps EDTA creates an unusual GFF3 structure?

Anyway, could you please still share the line you got before the "this is a bug, please report" line, if that's OK for you? It should contain something like "Assertion failed: ..." and would at least help us place the error somewhere, and also make this issue searchable for others with a similar problem.
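One way to collect the requested lines is to save the tool's error output to a file and print the context around the "This is a bug" line. The following is a minimal sketch, assuming the message is written to standard error; `capture_stderr` and `show_error_context` are hypothetical helper names, and `grep -B` is a common extension available in GNU and BSD grep rather than a POSIX option.

```shell
#!/bin/sh
# capture_stderr (hypothetical helper): run a command, write its
# standard error to the given log file, and return the command's
# own exit status. Standard input and output are left untouched.
capture_stderr() {
    log_file=$1
    shift
    "$@" 2>"$log_file"
}

# show_error_context (hypothetical helper): print the "This is a bug"
# line together with up to 5 lines that precede it in the log.
show_error_context() {
    grep -B 5 'This is a bug' "$1"
}

# Example use with the call from this issue (left commented out):
# capture_stderr ltrdigest.stderr.log gt ltrdigest ... < in.gff3 > out.gff3
# show_error_context ltrdigest.stderr.log
```

The saved log can then be pasted into the issue as-is, which also makes the report searchable by the assertion text.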

omar-almolla209 commented 2 years ago

My script:

gt -j 4 ltrdigest -outfileprefix Prunus_avium_ltr -trnas ./INPUT/Hmm_trna/plants-tRNA_cat.fa -hmms ./INPUT/Hmm_trna/hmm_* -seqfile ./INPUT/genomes/Prunus_avium_NCBI/Prunus_avium_NCBI.fna -matchdescstart ./OUTPUTS/EDTA_output/Prunus_avium_NCBI/EDTA_1.9_output/Prunus_avium_NCBI.fna.mod.EDTA.raw/*SORTED.gff3 > Prunus_avium_digest.gff

I could not reproduce the same error. Now this appears instead:

Segmentation fault (core dumped)

omar-almolla209 commented 2 years ago

##gff-version 3
##sequence-region CM024352.1 1 62324707
##sequence-region CM024353.1 1 46928806
##sequence-region CM024354.1 1 42862123
##sequence-region CM024355.1 1 37373756
##sequence-region CM024356.1 1 41299679
##sequence-region CM024357.1 1 42624765
##sequence-region CM024358.1 1 30632009
##sequence-region CM024359.1 1 38835769
##sequence-region JAAOZG010000014 1 51232
##sequence-region JAAOZG010000020 1 36342
##sequence-region JAAOZG010000023 1 31182
##sequence-region JAAOZG010000027 1 27350
##sequence-region JAAOZG010000035 1 22413
##sequence-region JAAOZG010000061 1 97395

CM024352.1 EDTA repeat_region 191737 201094 . ? . ID=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000657;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA target_site_duplication 191737 191741 . ? . Parent=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000434;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA long_terminal_repeat 191742 193449 . ? . Parent=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000286;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA LTR_retrotransposon 191742 201089 . ? . Parent=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000186;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA long_terminal_repeat 199383 201089 . ? . Parent=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000286;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT
CM024352.1 EDTA target_site_duplication 201090 201094 . ? . Parent=repeat_region1;name=CM024352.1:191742..201089;classification=LTR/unknown;sequence_ontology=SO:0000434;ltr_identity=0.9959;mathod=structural;motif=TGCA;tsd=TCCAT

CM024352.1 EDTA repeat_region 1617430 1629426 . ? . ID=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0000657;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA target_site_duplication 1617430 1617434 . ? . Parent=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0000434;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA long_terminal_repeat 1617435 1619599 . ? . Parent=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0000286;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA Gypsy_LTR_retrotransposon 1617435 1629421 . ? . Parent=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0002265;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA long_terminal_repeat 1627258 1629421 . ? . Parent=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0000286;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT
CM024352.1 EDTA target_site_duplication 1629422 1629426 . ? . Parent=repeat_region2;name=CM024352.1:1617435..1629421;classification=LTR/Gypsy;sequence_ontology=SO:0000434;ltr_identity=1.0000;mathod=structural;motif=TGCA;tsd=CCAAT

CM024352.1 EDTA repeat_region 1946186 1956558 . ? . ID=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000657;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
CM024352.1 EDTA target_site_duplication 1946186 1946190 . ? . Parent=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000434;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
CM024352.1 EDTA long_terminal_repeat 1946191 1948386 . ? . Parent=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000286;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
CM024352.1 EDTA LTR_retrotransposon 1946191 1956553 . ? . Parent=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000186;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
CM024352.1 EDTA long_terminal_repeat 1954358 1956553 . ? . Parent=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000286;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
CM024352.1 EDTA target_site_duplication 1956554 1956558 . ? . Parent=repeat_region3;name=CM024352.1:1946191..1956553;classification=LTR/unknown;sequence_ontology=SO:0000434;ltr_identity=0.9991;mathod=structural;motif=TGCA;tsd=GTAAT
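In the excerpt above, every feature has strand `?`, and the `LTR_retrotransposon` / `Gypsy_LTR_retrotransposon` features are direct children of `repeat_region`, alongside the `long_terminal_repeat` features. Whether either property matters to LTRdigest is not established in this thread. To see how widespread these patterns are in the full file, a small tally of feature types and strands can help. This is a sketch only: `gff3_summary` is a hypothetical helper, and it assumes the first eight columns contain no embedded whitespace.

```shell
#!/bin/sh
# gff3_summary (hypothetical helper): count how often each feature
# type (column 3) and each strand value (column 7) occurs.
# Pragma and comment lines (starting with '#') and lines with fewer
# than 9 whitespace-separated fields are skipped.
gff3_summary() {
    awk '
        /^#/ || NF < 9 { next }
        { types[$3]++; strands[$7]++ }
        END {
            for (t in types)   printf "type\t%s\t%d\n",   t, types[t]
            for (s in strands) printf "strand\t%s\t%d\n", s, strands[s]
        }
    ' "$1" | sort
}

# Example use (left commented out):
# gff3_summary ltrs_sorted.gff3
```

Running the same tally on the EDTA 1.6 and EDTA 1.9 outputs and comparing the results would show whether the two versions differ in the feature types or strand values they emit.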

satta commented 2 years ago

I am afraid the GFF3 file alone is not enough for me to reproduce the issue; I would also need the other files (sequence FASTA and tRNA files). Basically, I need a way to trigger the error on my side with your command line call. Thanks!
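If sharing a small excerpt is acceptable, one option is to cut the inputs down to a single sequence and check whether the crash still occurs on that subset. The sketch below assumes hypothetical helpers `subset_gff3` and `subset_fasta`; it keeps only the records for one sequence ID and drops all comment lines other than the `##gff-version` and matching `##sequence-region` pragmas.

```shell
#!/bin/sh
# subset_gff3 (hypothetical helper): keep the version pragma, the
# sequence-region pragma for one sequence ID, and that sequence's
# feature lines. Usage: subset_gff3 SEQID FILE
subset_gff3() {
    awk -v id="$1" '
        /^##gff-version/     { print; next }
        /^##sequence-region/ { if ($2 == id) print; next }
        /^#/                 { next }
        $1 == id             { print }
    ' "$2"
}

# subset_fasta (hypothetical helper): keep the FASTA record whose
# header starts with the given sequence ID. Usage: subset_fasta SEQID FILE
subset_fasta() {
    awk -v id="$1" '
        /^>/ { keep = (substr($1, 2) == id) }
        keep
    ' "$2"
}

# Example use (left commented out):
# subset_gff3  CM024352.1 ltrs_sorted.gff3         > mini.gff3
# subset_fasta CM024352.1 Prunus_avium_NCBI.fna    > mini.fna
```

If the error still occurs with the reduced files, they form a much smaller test case to share; if it does not, repeating the step with another sequence ID narrows down where the problem lies.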