Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

Issue with the all-by-other comparisons in RepeatModeler Round 2 #115

Closed goodgodric28 closed 3 years ago

goodgodric28 commented 3 years ago

I am having an issue that appears to be a memory problem, but based on my reading of other issues and the documentation, other folks have not had such a problem. This is on one ~1.5 Gb genome with ~1100 scaffolds and an estimated 35% repeats (using GenomeScope). Any suggestions?

>     -- Input Database Coverage: 3020628 bp out of 1515217215 bp ( 0.20 % )
>     Sampling Time: 00:02:17 (hh:mm:ss) Elapsed Time
>     Running all-by-other comparisons...
>     2% completed, 00:19:59 (hh:mm:ss) est. time remaining.
>     4% completed, 00:10:08 (hh:mm:ss) est. time remaining.
>     7% completed, 00:7:17 (hh:mm:ss) est. time remaining.
>     sh: /bin/cat: Argument list too long
>     8% completed, 00:6:00 (hh:mm:ss) est. time remaining.
>     10% completed, 00:5:14 (hh:mm:ss) est. time remaining.
>     WARNING: Retrying batch ( 24 ) [ 0 ]...
>     WARNING: Retrying batch ( 25 ) [ 0 ]...
>     12% completed, 00:4:42 (hh:mm:ss) est. time remaining.
>     WARNING: Retrying batch ( 26 ) [ 0 ]...
>     14% completed, 00:4:29 (hh:mm:ss) est. time remaining.
>     WARNING: Retrying batch ( 27 ) [ 0 ]...
>     WARNING: Retrying batch ( 28 ) [ 0 ]...
>     WARNING: Retrying batch ( 24 ) [ 0 ]...
>     sh: line 1: 129357 Segmentation fault /home/sluglife/programs/rmblast-2.10.0/bin/blastdbcmd -db /oasis/projects/nsf/ddp370/sluglife/berghia_genome/dovetail_hic_Aug2020/Berghia_Aug2020_purgedups/repeatmodeler/RM_40182.MonJan251233542021/round-2/sampleDB-2.fa.masked -entry "gi|31" >> /oasis/projects/nsf/ddp370/sluglife/berghia_genome/dovetail_hic_Aug2020/Berghia_Aug2020_purgedups/repeatmodeler/RM_40182.MonJan251233542021/round-2/batch-30.fa 2>> /oasis/projects/nsf/ddp370/sluglife/berghia_genome/dovetail_hic_Aug2020/Berghia_Aug2020_purgedups/repeatmodeler/RM_40182.MonJan251233542021/round-2/blastdbcmd.log
>     WARNING: Retrying batch ( 25 ) [ 0 ]...
>     WARNING: Retrying batch ( 29 ) [ 0 ]...
>     17% completed, 00:4:32 (hh:mm:ss) est. time remaining.
>     WARNING: Retrying batch ( 26 ) [ 0 ]...
>     WARNING: Retrying batch ( 30 ) [ 0 ]...
>     18% completed, 00:4:17 (hh:mm:ss) est. time remaining.
>     WARNING: Retrying batch ( 27 ) [ 0 ]...
>     21% completed, 00:4:00 (hh:mm:ss) est. time remaining.
>     WARNING: Retrying batch ( 28 ) [ 0 ]...
>     23% completed, 00:3:47 (hh:mm:ss) est. time remaining.
> 
>     FATAL ERROR: RepeatModeler giving up. One or more
>     batches failed! Unfortunately this type of error
>     cannot be recovered from. Please submit the following
>     details to the feedback page at the repeatmasker
> 
>     RepeatModeler Version: 2.0.1
>     Search Engine: rmblast [ 2.10.0+ ]
>     Command Line: /home/sluglife/programs/RepeatModeler/RepeatModeler -database Berghia_Aug2020_purged -pa 24 -LTRStruct
>     Batch Number: 24
>     Disk Space:
> 
>     System Memory:
>     MemTotal: 1585293480 kB
>     MemFree: 1294664108 kB
>     MemAvailable: 1298156236 kB
>     Cached: 5941428 kB
>     SwapCached: 0 kB
>     SwapTotal: 0 kB
>     SwapFree: 0 kB
>     Further details about this problem may be found in
>     the directory: /oasis/projects/nsf/ddp370/sluglife/berghia_genome/dovetail_hic_Aug2020/Berghia_Aug2020_purgedups/repeatmodeler/RM_40182.MonJan251233542021
> 
>     slurmstepd: Exceeded step memory limit at some point.
>     slurmstepd: Exceeded job memory limit at some point

jebrosen commented 3 years ago

> slurmstepd: Exceeded step memory limit at some point.
> slurmstepd: Exceeded job memory limit at some point

What are the memory limits you have on this cluster? The limit may simply be too low to run RepeatModeler (although the machine itself seems to have plenty of memory), or you may need to use a lower -pa value. With the RMBlast search engine, RepeatModeler uses 4 cores per batch, so that command (-pa 24) would run up to 96 simultaneous RMBlast threads.
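
For anyone sizing a job the same way, the arithmetic above can be sketched as follows. This is only a rough illustration: the 4-threads-per-batch figure is the one quoted in this thread for RMBlast, and the function names (`rmblast_threads`, `max_pa_for_cores`) are made up for this example, not part of RepeatModeler.

```python
# Figure quoted in this thread for RepeatModeler with the RMBlast engine.
# Treat it as an assumption and check it against the version you run.
RMBLAST_THREADS_PER_BATCH = 4


def rmblast_threads(pa: int) -> int:
    """Approximate number of simultaneous RMBlast threads for a given -pa."""
    return pa * RMBLAST_THREADS_PER_BATCH


def max_pa_for_cores(cores: int) -> int:
    """Largest -pa that keeps the thread count within the allocated cores.

    Returns at least 1 so the result is always a usable -pa value.
    """
    return max(1, cores // RMBLAST_THREADS_PER_BATCH)


if __name__ == "__main__":
    # -pa 24, as in the failing command: 24 * 4 threads.
    print(rmblast_threads(24))
    # A job allocated 24 cores would fit -pa 6 under this assumption.
    print(max_pa_for_cores(24))
```

This only covers thread count. The memory each batch needs is not stated in this thread, so the job's memory request still has to be checked against the cluster's limits separately.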

goodgodric28 commented 3 years ago

I modified -pa and the resources I was requesting and it fixed the problem. Thank you!