Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

how to finish an interrupted RepeatModeler run after manually finishing LTR_retriever #233

Closed: fanhuan closed this issue 4 months ago

fanhuan commented 5 months ago

What do you want to know?

I was running RepeatModeler with -LTRStruct, and it failed during the LTR Structural Analysis stage. When I tried to resume the run with -recoverDir, RepeatModeler reported that the run had completed successfully. I was able to locate the LTR_retriever log (for me it was RM_14.FriJan50358422024/LTR_346857.TueJan90142312024/LRET***/LTR_retriever.log), and the log contains the command that was run:

Parameters: -repeatmasker /opt/RepeatMasker -blastplus /opt/rmblast/bin -cdhit_path /opt/cd-hit -trf_path /opt/trf -genome seq.fa -inharvest /opt/RM_14.FriJan50358422024/LTR_346857.TueJan90142312024/raw-struct-results.txt -noanno -threads 20
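If it is not obvious where LTR_retriever wrote its log inside the RepeatModeler working directory, a recursive search will find it. A minimal sketch; the function name `find_ltr_logs` is hypothetical, and it only assumes the log file is named `LTR_retriever.log`, as in the path above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every LTR_retriever.log found below a RepeatModeler
# working directory (e.g. RM_14.FriJan50358422024), sorted by path.
find_ltr_logs() {
    local rm_dir="$1"
    find "$rm_dir" -type f -name 'LTR_retriever.log' | sort
}
```

Usage: `find_ltr_logs RM_14.FriJan50358422024`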

I located my LTR_retriever installation and restarted it with the same parameters:

/opt/LTR_retriever/LTR_retriever -repeatmasker /opt/RepeatMasker -blastplus /opt/rmblast/bin -cdhit_path /opt/cd-hit -trf_path /opt/trf -genome seq.fa -inharvest /opt/RM_14.FriJan50358422024/LTR_346857.TueJan90142312024/raw-struct-results.txt -noanno -threads 20
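Rather than copying the parameter string by hand, the command can be rebuilt from the log. A minimal sketch, assuming the log contains a line beginning with `Parameters:` as in the excerpt above and that LTR_retriever is installed at `/opt/LTR_retriever/LTR_retriever` (override with the hypothetical `LTR_RETRIEVER` variable). The function names are illustrative, not part of RepeatModeler or LTR_retriever:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: rebuild the LTR_retriever command recorded in its log.
# Assumes the log has a line of the form "Parameters: <args...>".
# LTR_RETRIEVER defaults to the install path used in this issue; override it
# if LTR_retriever lives elsewhere.
LTR_RETRIEVER="${LTR_RETRIEVER:-/opt/LTR_retriever/LTR_retriever}"

# Print the argument string from the first "Parameters:" line of the log.
ltr_params_from_log() {
    grep -m 1 '^Parameters:' "$1" | sed 's/^Parameters:[[:space:]]*//'
}

# Print the full command for review; return 1 if no parameters were found.
ltr_rerun_command() {
    local params
    params="$(ltr_params_from_log "$1")"
    if [ -z "$params" ]; then
        echo "no 'Parameters:' line found in $1" >&2
        return 1
    fi
    printf '%s %s\n' "$LTR_RETRIEVER" "$params"
}
```

The helper only prints the command so it can be checked before running it. Since `-genome seq.fa` is a relative path resolved against the current directory, it presumably needs to be run from the directory holding the log (the `LRET***` directory above), e.g. `cmd=$(ltr_rerun_command LTR_retriever.log) && echo "$cmd"`.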

I think it recognized what had already been done: the previous run stopped at Module 1 (after running for 12 h), but this time it moved on to Modules 2-5 and finished them in less than an hour. It completed and reported the following result files:

  ##############################
  ####### Result files #########
  ##############################

  Table output for intact LTR-RTs (detailed info)
          seq.fa.pass.list (All LTR-RTs)
          seq.fa.nmtf.pass.list (Non-TGCA LTR-RTs)
          seq.fa.pass.list.gff3 (GFF3 format for intact LTR-RTs)

  LTR-RT library
          seq.fa.LTRlib.redundant.fa (All LTR-RTs with redundancy)
          seq.fa.LTRlib.fa (All non-redundant LTR-RTs)
          seq.fa.nmtf.LTRlib.fa (Non-TGCA LTR-RTs)

According to the RepeatModeler website, a successful RepeatModeler run with the -LTRStruct option should instead produce these files:

      At the successful completion of a run, three files are generated:

      <database_name>-families.fa  : Consensus sequences
      <database_name>-families.stk : Seed alignments
      <database_name>-rmod.log     : A summarized log of the run

However, even after LTR_retriever finished, I still don't have <database_name>-families.fa. I do have families.stk, but not <database_name>-families.stk, and the same goes for rmod.log.
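To check which of the three documented outputs a directory actually contains, a short loop over the expected file names is enough. A sketch; `check_rmod_outputs` is a hypothetical name, and `<database_name>` is whatever was passed to `BuildDatabase -name` (and then to `RepeatModeler -database`):

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the three documented RepeatModeler
# outputs (<db>-families.fa, <db>-families.stk, <db>-rmod.log) exist in a
# directory. Returns non-zero if any of them is missing or empty.
check_rmod_outputs() {
    local db="$1" dir="${2:-.}" missing=0 suffix
    for suffix in families.fa families.stk rmod.log; do
        if [ -s "$dir/$db-$suffix" ]; then
            echo "found:   $dir/$db-$suffix"
        else
            echo "missing: $dir/$db-$suffix"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Usage: `check_rmod_outputs mydb /path/to/run`, where `mydb` and the path are placeholders.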

Please kindly let me know how I can obtain those files.

My environment:

How did you install RepeatModeler? Docker image from TE Tools (https://github.com/Dfam-consortium/TETools)

Which version of RepeatModeler do you have? RepeatModeler-2.0.5

Which version of RepeatMasker is this RepeatModeler installation using? 4.1.6

Operating system and version: Ubuntu 22.04


fanhuan commented 4 months ago

Some updates: I ended up re-running the whole pipeline. It did not take too long for my genome (~1.5 Gb, 2-3 days). As far as I understand, the key limit on speed is the I/O speed of the disk you run on; an SSD, for example, works fine.
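A from-scratch rerun on a fast local disk can be scripted along these lines. A minimal sketch, assuming `BuildDatabase` and `RepeatModeler` are on `PATH` and that RepeatModeler 2.0.5 takes `-threads` for parallelism; the scratch path, database name and function name are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: rerun RepeatModeler from scratch on a fast (e.g. SSD) scratch disk.
# Arguments: genome FASTA, database name, scratch directory, threads (default 20).
# All paths and names passed in are placeholders chosen by the caller.
rerun_rmod_on_scratch() {
    local genome="$1" db="$2" scratch="$3" threads="${4:-20}"
    mkdir -p "$scratch" || return 1
    # Copy the genome so all reads and writes happen on the fast disk.
    cp "$genome" "$scratch/" || return 1
    (
        # Subshell: the cd does not leak into the caller's shell.
        cd "$scratch" || exit 1
        # Build the sequence database, then run de novo discovery with the
        # LTR structural pipeline enabled.
        BuildDatabase -name "$db" "$(basename "$genome")" &&
        RepeatModeler -database "$db" -threads "$threads" -LTRStruct
    )
}
```

Usage: `rerun_rmod_on_scratch seq.fa mydb /mnt/ssd/rmod 20`, where `/mnt/ssd/rmod` stands for whatever fast local disk is available.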