Closed fanhuan closed 4 months ago
Some updates. I ended up re-running the whole pipeline. It did not take too long for my genome (~1.5 Gb, 2-3 days). The key limit on speed (as far as I understand) is the I/O speed of the disk you are running on. An SSD would be fine, for example.
What do you want to know?
I was running RepeatModeler with -LTRStruct, and it failed at the LTR Structural Analysis part. When I tried to resume the program using -recoverDir, it reported that my RepeatModeler run was successful. I was able to locate the log for LTR_retriever (for me it was: RM_14.FriJan50358422024/LTR_346857.TueJan90142312024/LRET***/LTR_retriever.log), and the log records the command that was run. Parameters: -repeatmasker /opt/RepeatMasker -blastplus /opt/rmblast/bin -cdhit_path /opt/cd-hit -trf_path /opt/trf -genome seq.fa -inharvest /opt/RM_14.FriJan50358422024/LTR_346857.TueJan90142312024/raw-struct-results.txt -noanno -threads 20
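For anyone in the same situation, here is a minimal sketch of how the recorded arguments could be pulled out of the log. It assumes the log contains a line of the form "Parameters: <args>", as in the log quoted above; the function name extract_ltr_params is hypothetical and not part of LTR_retriever or RepeatModeler.

```shell
# Print the arguments recorded after "Parameters:" in an LTR_retriever log.
# Assumption: the log has a line of the form "Parameters: <args>".
# extract_ltr_params is a hypothetical helper name, not an existing tool.
extract_ltr_params() {
    log_file="$1"
    # Keep only lines containing "Parameters:", strip everything up to and
    # including that label, and print the first such line.
    sed -n 's/^.*Parameters:[[:space:]]*//p' "$log_file" | head -n 1
}

# Usage (replace with the path to your own log):
# extract_ltr_params path/to/LTR_retriever.log
```

The printed arguments can then be appended to the path of your own LTR_retriever executable to rerun the step by hand.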
I located my LTR_retriever and restarted it:
/opt/LTR_retriever/LTR_retriever -repeatmasker /opt/RepeatMasker -blastplus /opt/rmblast/bin -cdhit_path /opt/cd-hit -trf_path /opt/trf -genome seq.fa -inharvest /opt/RM_14.FriJan50358422024/LTR_346857.TueJan90142312024/raw-struct-results.txt -noanno -threads 20
I think it was able to recognize what had already been done. The previous run stopped at Module 1 (after running for 12 h), but this time it was able to move on to Modules 2-5 in less than an hour. It finished and I got results like:
According to the RepeatModeler website, I believe a successful RepeatModeler run with the -LTRStruct option should produce output like this instead:
However, after LTR_retriever finished, I still don't have the -families.fa file. I do have families.stk but not -families.stk. The same goes for rmod.log.
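A quick way to see which of these files are present is sketched below. It assumes the final outputs are named <prefix>-families.fa, <prefix>-families.stk and <prefix>-rmod.log, where <prefix> is the database name given to RepeatModeler; that naming pattern, the function name check_rmod_outputs, and the example prefix mydb are assumptions for illustration only.

```shell
# Report which expected RepeatModeler output files exist in a directory.
# Assumption: outputs are named <prefix>-families.fa, <prefix>-families.stk
# and <prefix>-rmod.log, with <prefix> being the database name.
# check_rmod_outputs is a hypothetical helper name.
check_rmod_outputs() {
    dir="$1"
    prefix="$2"
    for suffix in families.fa families.stk rmod.log; do
        file="$dir/$prefix-$suffix"
        # -s is true only if the file exists and is not empty.
        if [ -s "$file" ]; then
            echo "present: $file"
        else
            echo "missing or empty: $file"
        fi
    done
}

# Usage (mydb is a placeholder database name):
# check_rmod_outputs . mydb
```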
Please kindly let me know how I can obtain those files.
My environment:
How did you install RepeatModeler? docker from TE-tools (https://github.com/Dfam-consortium/TETools)
Which version of RepeatModeler do you have? RepeatModeler-2.0.5
Which version of RepeatMasker is this RepeatModeler installation using? 4.1.6
Operating system and version: Ubuntu 22.04
Helpful context
Is there a particular genome assembly or organism your question is about? If possible, please provide a link to a publicly available assembly and/or a species name.
No, this is an in-house assembly.
Have you installed RepBase RepeatMasker Edition for RepeatMasker? This question is especially relevant for questions about classification or the RepeatClassifier program.
I am not sure. I am using the docker image from TE-tools. But I don't think my question is super-relevant to RepeatClassifier?