Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

Error in RepeatClassifier step: Missing RepeatPeps.lib.psq #254

Open athenasyarifa opened 2 months ago

athenasyarifa commented 2 months ago

Hi @rmhubley and everyone,

I finished several rounds of RepeatModeler, but I get an error in the RepeatClassifier step. It seems I am missing several files that RepeatClassifier requires: RepeatPeps.lib.psq, RepeatMasker.lib, and RepeatMasker.lib.nsq. I re-ran configure in the RepeatMasker directory, with the Dfam library successfully installed according to the instructions on the website. The final part of the configuration output is below:

Add a Search Engine:
   1. Crossmatch: [ Un-configured ]
   2. RMBlast: [ Configured, Default ]
   3. HMMER3.1 & DFAM: [ Un-configured ]
   4. ABBlast: [ Un-configured ]

   5. Done

Enter Selection: 5
Building RMBlast frozen libraries..
The program is installed with a the following repeat libraries:

FamDB Directory     : /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/tools/RepeatMasker/Libraries/famdb
FamDB Generator     : famdb.py v1.0
FamDB Format Version: 1.0
FamDB Creation Date : 2023-11-15 11:30:15.311827

Database: Dfam
Version : 3.8
Date    : 2023-11-14

Dfam - A database of transposable element (TE) sequence alignments and HMMs.

1 Partitions Present
Total consensus sequences present: 295552
Total HMMs present               : 295552

Partition Details
-----------------
 Partition 0 [dfam38_full.0.h5]: root - Mammalia, Amoebozoa, Bacteria <bacteria>, Choanoflagellata, Rhodophyta, Haptista, Metamonada, Fungi, Sar, Placozoa, Ctenophora <comb jellies>, Filasterea, Spiralia, Discoba, Cnidaria, Porifera, Viruses
     Consensi: 295552, HMMs: 295552

 Partition 1 [ Absent ]: Obtectomera 

 Partition 2 [ Absent ]: Euteleosteomorpha 

 Partition 3 [ Absent ]: Sarcopterygii - Sauropsida, Coelacanthimorpha, Amphibia, Dipnomorpha

 Partition 4 [ Absent ]: Diptera 

 Partition 5 [ Absent ]: Viridiplantae 

 Partition 6 [ Absent ]: Deuterostomia - Chondrichthyes, Hemichordata, Cladistia, Holostei, Tunicata, Cephalochordata, Cyclostomata <vertebrates>, Osteoglossocephala, Otomorpha, Elopocephalai, Echinodermata, Chondrostei

 Partition 7 [ Absent ]: Hymenoptera 

 Partition 8 [ Absent ]: Ecdysozoa - Nematoda, Gelechioidea, Yponomeutoidea, Incurvarioidea, Chelicerata, Collembola, Polyneoptera, Tineoidea, Apoditrysia, Monocondylia, Strepsiptera, Palaeoptera, Neuropterida, Crustacea, Coleoptera, Siphonaptera, Trichoptera, Paraneoptera, Myriapoda, Scalidophora

Further documentation on the program may be found here:
  /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/tools/RepeatMasker/repeatmasker.help

But after that, there are no RepeatMasker.lib files in the RepeatMasker/Libraries/ directory.

I ran RepeatModeler v2.0.5 with the following dependencies installed: rmblast 2.14.1+, TRF 4.09, RECON, RepeatScout 1.0.6, and RepeatMasker 4.1.6, along with the Dfam libraries installed according to the website instructions. The log file of the RepeatModeler run is attached below:

00_repeatmodeler.log

I used the following commands to run RepeatModeler:

${RepeatModelerDIR}/BuildDatabase -name poeMon1 poeMon1.fa
${RepeatModelerDIR}/RepeatModeler -LTRStruct -threads 4 -database poeMon1 2>&1 | tee 00_repeatmodeler.log

This is what is inside my RepeatMasker/Libraries directory:

$ ls RepeatMasker/Libraries/
Artefacts.embl  famdb  README.meta  RepeatAnnotationData.pm  RepeatPeps.lib  RepeatPeps.readme  RMRBMeta.embl  RMRB_spec_to_tax.json  taxonomy.dat

$ ls RepeatMasker/Libraries/famdb
dfam38_full.0.h5  rmlib.config
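A quick way to confirm which of the files named in the error are absent is a small shell check like the sketch below. It is only an illustration: the function name `check_rm_libs` and the default directory are hypothetical, and the file list is taken from the error described in this issue, not from the RepeatClassifier source.

```shell
#!/usr/bin/env bash
# Sketch: report which library files named in this issue exist in a
# RepeatMasker Libraries directory. The function name and the default
# path are assumptions; adjust the path to your own installation.

check_rm_libs() {
    local lib_dir="$1"
    local missing=0
    local f
    for f in RepeatPeps.lib RepeatPeps.lib.psq RepeatMasker.lib RepeatMasker.lib.nsq; do
        # -s is true only for files that exist and are non-empty
        if [ -s "${lib_dir}/${f}" ]; then
            echo "present: ${f}"
        else
            echo "MISSING: ${f}"
            missing=$((missing + 1))
        fi
    done
    # Return non-zero when at least one file is missing
    [ "${missing}" -eq 0 ]
}

check_rm_libs "${1:-RepeatMasker/Libraries}" || echo "One or more library files are missing."
```

With the directory listing shown above, this would be expected to report RepeatPeps.lib as present and the other three files as missing.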

I am running on a Linux machine with kernel 4.12.14-197.108-default (this is the login node).

I saw other issues about a missing RepeatMasker.lib, such as #137, although in that case they had RepeatPeps.lib, and #128, which is about how to generate your own RepeatMasker.lib. From my understanding, though, the configuration step should automatically build RepeatMasker.lib from the downloaded Dfam libraries. Did I do something wrong?

Any help would be much appreciated. Thank you so much! Best, Rifa

malvaradol commented 2 weeks ago

Hi @athenasyarifa

I'm currently having the same issue, and after multiple attempts I was able to generate RepeatMasker.lib. Here's what I did:

I'm still waiting for the RepeatModeler output. While RepeatMasker.lib did not exist, the program did not run RepeatClassifier, for exactly the reasons you point out. I will let you know how it goes.

Hope this helps in any way.