How long does it take RepeatModeler to complete the analysis?

Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

How long does it take RepeatModeler to complete the analysis? #229

Open SowmyaPulapet opened 7 months ago

SowmyaPulapet commented 7 months ago

What do you want to know?

How long does it take RepeatModeler to complete the analysis?

Helpful context

I am running RepeatModeler on my de novo assembly of a grasshopper species. The assembly length is about 900 MB. The process has been running for more than 2 weeks. Below is the command I used for the analysis:

~/Tools/RepeatModeler/RepeatModeler -database sample_DB -threads 18

I am a bit confused as to why this is not complete yet. I also have a RAM capacity of ~126 GB.

I am requesting your insights on this. Please let me know if I have to change any parameters that would make this run faster.

Thank you.

rmhubley commented 2 weeks ago

Really late to answer, but it may be helpful to others. The answer depends on the repetitive content of the genome. Ideally you would be running RepeatModeler on a genome with >40% evenly distributed repetitive content. The lower the repetitive content, the longer the all-by-all comparisons take to complete. Also, it's important to note that there is nothing sacred about 5 or 6 rounds (default vs -quick). If you notice that the round-to-round masking is not increasing, you can force the tool to quit and use the results generated up to that point, which are stored in the RM_#####/consensi.fa and RM_#####/families.stk files. The only step(s) that will not have been run at that point are RepeatClassifier and LTRPipeline (if requested), since they run at the end. RepeatClassifier may be run on these intermediate files like so:

% ./RepeatClassifier -consensi RM_#####/consensi.fa -stockholm RM_#####/families.stk
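
Before classifying intermediate results, it can help to check how many consensus families have been written so far. The following is only a rough sketch, not part of RepeatModeler: the function names `count_families` and `report_families` are made up for illustration, and it assumes that every `consensi.fa` under an `RM_*` working directory (including any per-round copies, if your version writes them) is a plain FASTA file with one `>` header per family.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of RepeatModeler.

# Print the number of FASTA records (lines starting with '>') in one file.
count_families() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps that case from being treated as a failure.
    grep -c '^>' "$1" || true
}

# Print "<count><TAB><path>" for every consensi.fa found under RM_* directories
# below the given directory (default: the current directory).
# Assumption: the working directories are named RM_<something>, as in the
# command above.
report_families() {
    local dir="${1:-.}"
    find "$dir" -path '*/RM_*' -name 'consensi.fa' | sort | while read -r fa; do
        printf '%s\t%s\n' "$(count_families "$fa")" "$fa"
    done
}

# Example (run from the directory where RepeatModeler was started):
#   report_families .
```

If the counts stop growing between checks, that is one more hint (alongside the masking numbers mentioned above) that later rounds are adding little, though the family count is only a proxy and not the masking statistic itself.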