Open munegowda opened 3 years ago
I'm unable to figure out why it takes an unusually long amount of time to run.
Why do you say unusually? RepeatModeler can take quite a long time to run, although it varies between assemblies and machines.
I did notice that your log file contains two runs, the first one stopped pretty early on. The second run shows much worse disk throughput, though:
Using output directory = /beegfs/scratch/tmp/cmunegowda/TEMP_HLnycPro4_modelAndMask/RM_3134750.WedMay121145292021
Storage Throughput = fair ( 697.47 MB/s )
(...)
Using output directory = /beegfs/scratch/tmp/cmunegowda/TEMP_HLnycPro4_modelAndMask/RM_3309014.SatMay151153452021
Storage Throughput = poor ( 174.71 MB/s )
 - NOTE: Poor storage througput will have a large impact on RepeatModeler performance. The low throughput observed above may be due to transient usage patterns on the system and may not reflect the actual system performance. Whenever possible run RepeatModeler in a directory stored on a fast local disk and not over a network filesytem.
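To compare the runs recorded in a log at a glance, the throughput lines can be pulled out with standard tools. This is only a sketch: it assumes the log keeps the `Storage Throughput = <rating> ( <value> MB/s )` wording quoted above, and `throughput_summary` is a made-up helper name, not something RepeatModeler provides.

```shell
#!/bin/sh
# throughput_summary LOGFILE
# Hypothetical helper (not part of RepeatModeler): prints "<rating> <MB/s>"
# for each "Storage Throughput" entry found in LOGFILE.
# Assumes the log wording shown in the excerpt above.
throughput_summary() {
    # -o prints only the matching part, so this also works when several
    # log entries ended up on one physical line.
    grep -o 'Storage Throughput = [a-z]* ( [0-9.]* MB/s )' "$1" |
        awk '{ print $4, $6 }'   # field 4 = rating, field 6 = numeric value
}
```

Calling `throughput_summary log.model.txt` would then list one rating and value per run, which makes a drop between runs on the same directory easy to spot.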
RepeatModeler should indeed perform much better when run on a local filesystem instead of a network filesystem, if that is an option available to you.
Dear Jeb, thanks for your help. We were running an older RepeatModeler version (1.0.8) on a high-performance file system (Lustre), and it typically finished within 1-2 days. Now we have the latest RepeatModeler version and run it on another high-performance file system (BeeGFS), which in principle should be even more performant than a local file system on a single disk. However, it has not finished even after a week or so.
We will test running it on the local disk, but I am not sure that is the issue.
Can you think of any other reason why it doesn't finish in a reasonable amount of time (a few days, not weeks)?
Thanks again, Michael
It comes as a bit of a surprise to me that a network filesystem could ever be faster than the local disk, but it seems I haven't noticed just how fast Ethernet speeds have gotten.
That said, the 174.71 MB/s observed in this particular run is pretty poor. On a few machines I tested, local disks were around 1700 MB/s and an NFS filesystem was around 100 MB/s.
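For a rough comparison of candidate working directories (for example a node-local scratch directory versus the BeeGFS path), a sequential write test with `dd` gives a ballpark figure. This is a sketch under assumptions: it relies on GNU coreutils (`bs=1M`, `conv=fsync`), `write_test` is a hypothetical helper name, and it is not how RepeatModeler computes its own "Storage Throughput" value, so the numbers are not directly comparable to the ones in the log.

```shell
#!/bin/sh
# write_test DIR [SIZE_MB]
# Hypothetical helper: writes SIZE_MB megabytes of zeros (default 1024)
# into a temporary file in DIR and prints dd's summary line, which
# includes the elapsed time and the write rate.
# Assumes GNU dd and mktemp (typical on Linux clusters).
write_test() {
    dir=$1
    size_mb=${2:-1024}
    tmp=$(mktemp "$dir/throughput_test.XXXXXX") || return 1
    # conv=fsync makes dd flush the data to storage before it reports,
    # so the page cache does not inflate the result.
    # LC_ALL=C keeps the summary line in a predictable format.
    LC_ALL=C dd if=/dev/zero of="$tmp" bs=1M count="$size_mb" conv=fsync 2>&1 |
        tail -n 1
    rm -f "$tmp"
}
```

Running it a few times on each directory, e.g. `write_test /beegfs/scratch/tmp/cmunegowda` and then on a local directory of your choice, helps separate a consistently slow filesystem from a transient slowdown caused by other users.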
I have recently installed RepeatModeler2 from https://www.repeatmasker.org/RepeatModeler/RepeatModeler-2.0.1.tar.gz and I do not get any errors when I run it. However, with every genome assembly I have tried, it does not finish even after running for more than a week. For example, I'm using RepeatModeler2 with the raccoon dog assembly (renamed as HLnycPro4) from NCBI: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/905/146/905/GCA_905146905.1_NYPRO_anot_genome/GCA_905146905.1_NYPRO_anot_genome_genomic.fna.gz and it is run with the following command:
RepeatModeler -pa 8 -engine ncbi -database HLnycPro4 &>>log.model.txt
This has run for more than 6 days and has not yet finished. Here is the output log file, log.model.txt. I'm unable to figure out why it takes an unusually long amount of time to run. I would be grateful for any help in resolving this issue. Thanks