Open xiaoxiaonao opened 2 years ago
Hi, thanks for reporting this. To properly reproduce the error and determine the root cause, though, I need the input sequence you used (Ps_genome.part-05.fasta). It would be great if you could provide this file, or, alternatively, a snippet of this sequence that triggers this issue without having to reveal too much of your input.

I also noticed that your value for maxlenltr is very large (3000). Could you also adjust mindistltr to account for that and to prevent overlapping LTRs? In this case it should be at least 3000. If that helps, then maybe LTRharvest should check this condition at the start.
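To make the condition concrete, here is a minimal sketch of such a pre-flight check, written as a standalone Python helper. It is not part of GenomeTools; the function name `check_ltr_params` and its message text are hypothetical, and only the option names maxlenltr and mindistltr come from the discussion above.

```python
def check_ltr_params(maxlenltr, mindistltr):
    """Return a list of warnings for an LTRharvest parameter combination.

    Hypothetical helper, not a GenomeTools function. It only encodes the
    rule discussed above: the minimum distance between the two LTR start
    positions should not be smaller than the maximum LTR length,
    otherwise the two LTRs of a candidate could overlap.
    """
    problems = []
    if mindistltr < maxlenltr:
        problems.append(
            "mindistltr (%d) is smaller than maxlenltr (%d); "
            "LTRs may overlap, consider mindistltr >= %d"
            % (mindistltr, maxlenltr, maxlenltr)
        )
    return problems


# maxlenltr is the value from this report; a mindistltr of 1000 is an assumed example
for message in check_ltr_params(maxlenltr=3000, mindistltr=1000):
    print("warning:", message)
```

With maxlenltr at 3000, any mindistltr below 3000 would be flagged by this sketch, which matches the advice above.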
Dear Sascha,

Attached is the input file (Ps_genome.part-05.fasta.gz). The file is over 2 GB when unzipped.

(Reply by email to the message of Tue, Jan 4, 2022 06:17 PM; subject: Re: [genometools/genometools] Aborted (core dumped) with LTR harvest (Issue #999))
Large attachment sent from Tencent Enterprise Mail
Ps_genome.part-05.fasta.gz (586.6M, expires 2022-02-03 19:25). Go to the download page: http://mail.qq.com/cgi-bin/ftnExs_download?t=exs_ftn_download&k=353934376f586fe23f1621ab45660e0b4a0b555656550b5b561403545153140d550b061a5b52095c480a5654525408005d01565306662339354a6b50070856540017445610121409501752561112581702433409&code=e947bf99&fid=72/2aa432b3-7c35-4022-940e-3bc021988bdd
Thanks, I downloaded the file and will try to reproduce the issue. LTRharvest is running quite long... have you masked all short and tandem repeats before running LTRharvest? Otherwise the seed hits will explode, unnecessarily blowing up the run time.
I have not masked any short or tandem repeats before running LTRharvest. The error was reported after the run had been going for two weeks. It is also difficult to annotate the tandem repeats because of their length.
Ouch, I see. Two weeks -- LTRharvest definitely should never run that long! I would strongly advise to at least use RepeatMasker to mask low-complexity repeats in the source. It is not recommended to just run LTRharvest on the raw sequence if there are many and long instances of such repeats. With the default seed size of 30 these will lead to lots of potential candidate pairs to be evaluated, which will excessively inflate the run time. You likely need to prepare the input sequence a bit.
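As an illustration of the preparation step, here is a minimal Python sketch that turns a soft-masked FASTA file into a hard-masked one by replacing lowercase bases with N. It assumes the repeats were already soft-masked in a separate step (for example by running RepeatMasker with its -xsmall option, which writes repeats in lowercase) and that the original sequence is otherwise uppercase. The function names are hypothetical and not part of GenomeTools or RepeatMasker.

```python
def hard_mask_line(line):
    """Replace lowercase (soft-masked) bases in one FASTA line with N.

    Header lines (starting with '>') are returned unchanged.
    Assumption: lowercase letters mark masked repeats.
    """
    if line.startswith(">"):
        return line
    return "".join("N" if ch.islower() else ch for ch in line)


def hard_mask_fasta(src, dst):
    """Stream a FASTA file line by line from src to dst, hard-masking it.

    Works on any text file objects, so a multi-gigabyte input is never
    held in memory as a whole.
    """
    for line in src:
        dst.write(hard_mask_line(line))
```

A possible use is to open the soft-masked file and an output file with `open(...)` and pass both to `hard_mask_fasta`, then build the index from the hard-masked output. The intent is that long masked stretches no longer produce seed hits, but whether that is enough for this data set has not been tested here.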
My suggestion:

- mask the repeats in the input (replacing them with N)
- set mindistltr to be at least your maxlenltr, or do multiple runs with different settings.

Regarding the original error: I am afraid I will not be able to run the software for two weeks each time I need to reproduce the error, as I don't have a compute farm at my disposal any more. Is there any way you could come up with a smaller sequence stretch that triggers the issue?
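One way to look for a smaller stretch is to cut each sequence into overlapping windows and run LTRharvest on the pieces separately. The following Python sketch only computes the window coordinates; the function name `windows` and the example sizes are hypothetical. It assumes the overlap is chosen larger than the longest element you expect, so that no candidate is split at every window boundary.

```python
def windows(seq_len, size, overlap):
    """Yield (start, end) pairs of half-open, overlapping windows.

    Consecutive windows share `overlap` positions; the last window is
    shortened so that it ends exactly at seq_len.
    """
    if size <= overlap:
        raise ValueError("window size must be larger than the overlap")
    step = size - overlap
    start = 0
    while start < seq_len:
        end = min(start + size, seq_len)
        yield start, end
        if end == seq_len:
            break
        start += step


# Toy example: a sequence of length 10, windows of 4 with an overlap of 1
print(list(windows(10, 4, 1)))  # [(0, 4), (3, 7), (6, 10)]
```

Each coordinate pair can then be used to slice the sequence into its own FASTA file. If only one piece reproduces the abort, it can be halved again until the input is small enough to share.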
Problem description
While using LTRharvest this error pops up:
Exact command line call triggering the problem
After creating the index, submit the following command:
Example minimal input triggering the problem
What GenomeTools version are you reporting an issue for (as output by gt -version)?
gt (GenomeTools) 1.6.2
Copyright (c) 2003-2016 G. Gremme, S. Steinbiss, S. Kurtz, and CONTRIBUTORS
Copyright (c) 2003-2016 Center for Bioinformatics, University of Hamburg
See LICENSE file or http://genometools.org/license.html for license details.

Used compiler: cc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5)
Compile flags: -g -Wall -Wunused-parameter -pipe -fPIC -Wpointer-arith -Wno-unknown-pragmas -O3 -Werror
Did you compile GenomeTools from source? If so, please state the make parameters used.
What operating system (e.g. Ubuntu, Mac OS X), OS version (e.g. 15.10, 10.11) and platform (e.g. x86_64) are you using?
CentOS Linux 8 (Core)