Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

Running 300h, 24cores, 40Mb genome and it doesn't finish round-4 #39

Open calix015 opened 5 years ago

calix015 commented 5 years ago

Hi, I've been trying to softmask a 40Mb algal genome (53 contigs) for a subsequent annotation.

First, I tried funannotate mask (a wrapper for RepeatModeler and RepeatMasker). It ran for 300 h on 24 cores and didn't finish round-4: it only got to 5% of the all-by-all comparisons for that round, with an estimated 961 h remaining, although the Input Database Coverage was 100.00 % (so I guess that would be the last round?). Then I tried RepeatModeler by itself; it ran for 72 h and didn't finish round-1. Given the runtimes you mention on your website, I'm guessing this is not normal and something could be done, so I'm hoping you could help me troubleshoot the problem (please!!).

These are the commands I used for RepeatModeler (in case there is a mistake):

module load repeatmodeler/1.0.11
module load repeatmasker/4.0.5
module load perl/modules.centos7.5.26.1
module load perl

BuildDatabase -name pabb_repeats -engine ncbi PABB004_ordered_genome.fasta
RepeatModeler -engine ncbi -pa 23 -database pabb_repeats
buildRMLibFromEMBL.pl /panfs/roc/msisoft/repeatmasker/4.0.5/Libraries/RepeatMaskerLib.embl > RepeatMaskerLib.fasta
cat RepeatMaskerLib.fasta consensi.fa.classified > combined_repeat_libs.fasta

Also attached are the core.#### files that RepeatModeler generated while running: core files.zip

rmhubley commented 5 years ago

That is a very old version of RepeatMasker. We are up to 4.0.9-p2 right now. Please upgrade that first. I will have the next release of RepeatModeler check that it has an up-to-date version of RepeatMasker before starting. I am surprised that you had to build the "RepeatMaskerLib.fasta" by hand each time. This is usually taken care of by the installation of RepeatMasker itself. It must be a quirk of your "module load" environment. In any case, you will want that file to be stored in the RepeatMasker/Libraries directory rather than your current working directory.
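
Until such a check exists in the tool, a user-side version comparison could be sketched as below. This is only an illustration, not how RepeatModeler does or will do it: `version_ge` is a hypothetical helper, the version strings are hard-coded from this thread rather than read from the installed program, it relies on `sort -V` being available, and it compares plain dotted versions only (a suffix such as `-p2` is not considered).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    # After a version sort, the smaller version comes first; $1 >= $2
    # exactly when that first line is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Values taken from this thread: 4.0.5 is the loaded module, 4.0.9 is
# used here as the minimum. Both are hard-coded for illustration.
installed="4.0.5"
required="4.0.9"

if version_ge "$installed" "$required"; then
    echo "RepeatMasker $installed is recent enough"
else
    echo "RepeatMasker $installed is older than $required; upgrade first"
fi
```

With the values above, the script takes the second branch and reports that 4.0.5 is older than 4.0.9.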

Finally, the core files are not as helpful as the program screen output. If you kept a log of the run and/or the RM_* directory created during the run that would really help.
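
One way to keep such a log for future runs is to redirect the program's screen output to a file, for example as in the sketch below. `run_logged` and `demo_run.log` are hypothetical names, and `echo` stands in for the real RepeatModeler command line so the example runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: run a command and save its stdout and stderr to a log.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log="$1"
    shift
    # Send both output streams to the log file.
    "$@" > "$log" 2>&1
    status=$?
    # Record the exit status so a crash is visible in the log itself.
    echo "exit status: $status" >> "$log"
    return "$status"
}

# Stand-in command; in a real run this would be the RepeatModeler command
# from the original post.
run_logged demo_run.log echo "RepeatModeler would run here"

# The run directories mentioned above are named RM_*; list any present here.
ls -d RM_* 2>/dev/null || echo "no RM_* directory in $(pwd)"
```

Because the output goes only to the file, progress during a long run can be followed with `tail -f demo_run.log` from another terminal.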

pengbo233 commented 5 years ago

Hi, my shell command looks like this:

RepeatMasker-open-4-0-9/RepeatMasker/RepeatProteinMask -noLowSimple -pvalue 0.0001 *.fasta

It took more than 521.6 h. Neither RepeatMasker nor RepeatModeler is suited to large genomes; they often run for nearly 20 days and always break off. This wastes a lot of time and resources. Please provide a solution.