Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

LTR Pipeline hangs #109

Open hung-th opened 3 years ago

hung-th commented 3 years ago

Thank you for developing this super useful tool.

I was running RepeatModeler with this command:

~/bin/RepeatModeler-2.0.1/RepeatModeler -database Dacoc -pa 16 -LTRStruct -recoverDir RM_11385.MonNov160312422020/LTR_3823.WedNov181402552020 >& run.out &

RepeatModeler successfully completed all 6 rounds. However, it then hung at "Running LtrHarvest..." for several days.

I then stopped the job and ran only the LTRPipeline with the following command:

~/bin/RepeatModeler-2.0.1/LTRPipeline ../Dacoc_1.0.fasta >& run_LTR.out

It seems to hang again.

ls -l LTR_1638.SunNov222107282020/LHAR_1638.SunNov222107282020/

gives

total 5579548
drwxr-xr-x 2 tin_hang_hung tin_hang_hung       4096 Nov 22 21:14 .
drwxr-xr-x 3 tin_hang_hung tin_hang_hung       4096 Nov 22 21:07 ..
-rw-r--r-- 1 tin_hang_hung tin_hang_hung       4366 Nov 22 21:07 esa_index.des
-rw-r--r-- 1 tin_hang_hung tin_hang_hung  146071480 Nov 22 21:07 esa_index.esq
-rw-r--r-- 1 tin_hang_hung tin_hang_hung  584284618 Nov 22 21:14 esa_index.lcp
-rw-r--r-- 1 tin_hang_hung tin_hang_hung  308763536 Nov 22 21:14 esa_index.llv
-rw-r--r-- 1 tin_hang_hung tin_hang_hung      14355 Nov 22 21:07 esa_index.md5
-rw-r--r-- 1 tin_hang_hung tin_hang_hung        479 Nov 22 21:14 esa_index.prj
-rw-r--r-- 1 tin_hang_hung tin_hang_hung       3472 Nov 22 21:07 esa_index.sds
-rw-r--r-- 1 tin_hang_hung tin_hang_hung       1744 Nov 22 21:07 esa_index.ssp
-rw-r--r-- 1 tin_hang_hung tin_hang_hung 4674276944 Nov 22 21:14 esa_index.suf
-rw-r--r-- 1 tin_hang_hung tin_hang_hung          0 Nov 22 21:14 ltrharvest.log
-rw-r--r-- 1 tin_hang_hung tin_hang_hung          0 Nov 22 21:14 ltrharvest.out
-rw-r--r-- 1 tin_hang_hung tin_hang_hung          0 Nov 22 21:07 suffixerator.log

I would be very grateful for any insight or advice that might help troubleshoot this.

jebrosen commented 3 years ago

What is the size of this genome? Is it publicly available, or could it be shared with us privately so that we could try to reproduce the error?

Did you check in some way if it was "stuck" vs actually taking a long time to run, for example by watching the program's CPU/memory usage? (LtrHarvest is part of the gt program).
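One way to tell "stuck" from "slow" on Linux is to sample how much CPU time the process accumulates over a short window. The following is a minimal sketch, not part of RepeatModeler or GenomeTools: it reads `/proc/<pid>/stat` directly, and the helper names `cpu_ticks_from_stat` and `cpu_seconds_used` are hypothetical. It assumes you already know the PID of the running `gt` process (for example from `ps` or `top`).

```python
import os
import time
from pathlib import Path


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before tokenizing the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_seconds_used(pid: int, interval: float = 60.0) -> float:
    """CPU seconds consumed by process `pid` during `interval` seconds."""
    stat_path = Path(f"/proc/{pid}/stat")
    before = cpu_ticks_from_stat(stat_path.read_text())
    time.sleep(interval)
    after = cpu_ticks_from_stat(stat_path.read_text())
    # Convert kernel clock ticks to seconds.
    return (after - before) / os.sysconf("SC_CLK_TCK")
```

Calling `cpu_seconds_used(pid)` with the PID of the `gt` process returns the CPU seconds used in the last minute. A value near zero suggests the process is blocked or waiting, while a value near the length of the interval suggests it is still computing on one core. Neither result proves a bug on its own; it only narrows down where to look.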

JoseRPB commented 2 years ago

Hello, I also have the same problem. The size of the genome I am trying to run is 3.6 Gb and the average repeat content is about 50%. RepeatModeler2 successfully completes the 6 rounds in three days, but it then hangs at:

LTR Structural Analysis
=======================
Running LtrHarvest...

I tried running this three times, but it always hangs in the same way. I am working in a Slurm environment (storage throughput = fair (341.45 MB/s)) with 1 TB of RAM and 20 threads. I am running the software using Singularity.

I would be very grateful for any kind of help because I don't know what is going on here.

mason-linscott commented 2 years ago

Hi all,

I would also like to say that I have had this issue since June of this year. I am running RM on a 5.4GB assembly with an estimated repeat content of 83% (RM estimate). The LTRPipeline generates the suffix and creates log files, but all log files are empty. I left it running for one month and there was no progress. My genome does have a much larger LTR content than other species in its clade.

Possibly related: I have had success running LTRHarvest using EDTA. The process took 5 days on a standalone server with 48 threads and 1 TB of RAM.
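Since several reports above mention log and output files that stay empty, it may help to check whether anything in the LtrHarvest working directory changes over time. The sketch below is only an illustration under assumptions: the helper names `snapshot` and `changed_files` are hypothetical, and the directory to watch is whatever `LHAR_*` directory your own run created.

```python
from pathlib import Path


def snapshot(directory):
    """Map each regular file name in `directory` to (size in bytes, mtime)."""
    result = {}
    for path in Path(directory).iterdir():
        if path.is_file():
            info = path.stat()
            result[path.name] = (info.st_size, info.st_mtime)
    return result


def changed_files(before, after):
    """Names of files that are new or whose size or mtime differs."""
    return sorted(
        name for name, meta in after.items() if before.get(name) != meta
    )
```

Take one snapshot, wait (for example an hour), take a second one, and pass both to `changed_files`. An empty list means no file was written in that window. That is consistent with a hang, but it would also be seen if the program writes its results only at the very end, so it is best read together with the process's CPU usage.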