Closed helltish closed 1 year ago
NOTE: RepeatScout did not return any models.
Exactly the same issue here. A manually installed RepeatModeler worked well on the same assembly.
I also noticed that nseg, which is needed for RepeatScout, was not installed. Manually compiling it and placing it in the bin directory didn't help.
The <prefix>.fa.rscons.filtered file, which is one of the RepeatScout outputs, was empty too, so I assume that RepeatScout failed at some point.
conda RepeatModeler package still not working properly
I believe the underlying issue is here: https://github.com/bioconda/bioconda-recipes/blob/master/recipes/repeatmodeler/build.sh#L23
The RepeatScout package in bioconda does not include filter-stage-1.prl in the bin directory, but that program is used by RepeatModeler.
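A quick way to confirm which helper programs are actually visible in the active environment is sketched below. The function name missing_tools is made up for this example, and the list of tools is just the set named in this thread, so adjust it to your installation.

```shell
# Report which of the given programs are not visible on PATH.
# Returns non-zero if at least one is missing.
missing_tools() (
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool"
            status=1
        fi
    done
    exit "$status"
)

# Example: run inside the activated conda environment.
missing_tools RepeatScout nseg filter-stage-1.prl trf \
    || echo "at least one helper named in this thread is not on PATH"
```

This only checks that the programs can be found, not that they run correctly.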
I'm working on it in PR #19088. I guess it could work as it is in this PR (without nseg), but ideally nseg would be included too. There is currently no recipe for it, though. nseg is available here: ftp://ftp.ncbi.nih.gov/pub/seg/nseg.
In build 3 of RepeatScout, nseg and filter-stage-1.prl are now included, but the result still seems to be the same. How can we check manually which step is failing? Maybe one of these tools does not behave as it should.
I believe at this point the remaining bug is in the RepModelConfig.pm and/or the RepeatModeler wrapper script shipped in bioconda: it sets $TRF_PRGM = $ENV{'TRF_DIR'}; which is something like .../conda/bin, but TRF_PRGM is supposed to be the path to the trf binary itself (.../conda/bin/trf) - the same goes for NSEG_PRGM. Either RepModelConfig.pm or the wrapper should be modified to set the correct paths.
There could be other problems in the custom RepModelConfig.pm that I haven't noticed, but those two are directly causing this particular issue.
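The directory-versus-binary mix-up described above can be illustrated with a small shell sketch. resolve_prgm is a hypothetical helper, not part of RepeatModeler or the bioconda wrapper. It only shows the idea of joining the directory with the program name and refusing anything that is not an executable file.

```shell
# Hypothetical helper: turn a directory such as $TRF_DIR into the full path of
# a program inside it, and fail if that program is not an executable file.
resolve_prgm() (
    dir=$1
    name=$2
    path="${dir%/}/$name"
    if [ -f "$path" ] && [ -x "$path" ]; then
        printf '%s\n' "$path"
    else
        printf 'not an executable file: %s\n' "$path" >&2
        exit 1
    fi
)

# Example (TRF_DIR is the variable the wrapper reads; the fallback is a guess):
# TRF_PRGM=$(resolve_prgm "${TRF_DIR:-$CONDA_PREFIX/bin}" trf)
```

A directory passes an -x test on its own, which is why the sketch also requires -f.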
Thank you for your help, I managed to fix that today. This step is now working fine. I now have a problem later in the execution:
-- Refining Family R=207 / 0 ( RS Elements: 2212, Using 100 ):
RepeatModeler: Could not open refined model /scratch/jacda119/RM_19538.FriDec61556042019/round-1/family-0.fa.refiner_cons!
@Juke34 Does that happen for every model or only some of them? I can try to reproduce that in a clean environment... probably next week.
I found it! The paths to RepeatClassifier, Refiner and TRFMask are wrong. They are called directly from the RepeatModeler folder (in share), which causes an error, e.g.: '-bash: ./TRFMask: /u1/local/bin/perl: bad interpreter: No such file or directory'
They have to be called like the other tools, through the bin folder, where a wrapper calls them like this: perl path/to/share/RepeatModeler/TRFMask options
I'm getting close, though: I found a (nasty) way to fix the problem by modifying the code during installation with a sed command. I'm trying it locally and it seems to work... we'll see if I run into another problem.
./TRFMask: /u1/local/bin/perl: bad interpreter: No such file or directory
Yes... the configure script fixes all of those perl lines but bioconda does not use it. Hopefully with the newest version of RepeatModeler, where configure supports command line arguments and does not need to be run interactively, bioconda can use configure instead.
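The sed command used in the recipe is not shown in this thread, so the following is only a sketch of what such a shebang fix could look like. fix_perl_shebang is a made-up name, and pointing the scripts at /usr/bin/env perl is an assumption. It relies on the conda environment's perl being first on PATH when the script runs.

```shell
# Rewrite a script's first line to "#!/usr/bin/env perl" when it is a perl
# shebang; other files are left untouched. File permissions are preserved
# because the original file is overwritten in place rather than replaced.
fix_perl_shebang() (
    file=$1
    first=$(head -n 1 "$file")
    case $first in
        '#!'*perl*) ;;
        *) exit 0 ;;
    esac
    tmp=$(mktemp) || exit 1
    sed '1s|^#!.*$|#!/usr/bin/env perl|' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Example: fix_perl_shebang path/to/share/RepeatModeler/TRFMask
```

Note that this drops any interpreter flags (such as -w) that were on the original shebang line.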
It runs until the end now, except that at the very end I get this message:
Missing /home/jacda119/anaconda3/envs/repeatmodeler/share/RepeatMasker/Libraries/RepeatPeps.lib.psq!
Please rerun the configure program in the RepeatModeler directory
before running this script.
- Looking for similarity to known repeat proteins..
Classification Time: 00:00:26 (hh:mm:ss) Elapsed Time
Program Time: 12:19:15 (hh:mm:ss) Elapsed Time
Working directory: /scratch/jacda119/RM_33119.FriDec62053432019
may be deleted unless there were problems with the run.
The results have been saved to:
[...]
Is this RepeatPeps.lib.psq important? What is it used for? Can we skip this step? If we need this file, how can we include it in the package? I don't know where it is supposed to come from.
I think that message is wrong - it should say to re-run configure in the RepeatMasker directory, which bioconda also does not do. This should only affect the classification step, but it's a pretty big part of it.
That means I need to fix the repeatmasker recipe too to create this file. Could you tell me the steps needed to create this file without using the configure? There is a file called RepeatPeps.lib so I guess there is one step to create the RepeatPeps.lib.psq from it.
It used to be done by the package - https://github.com/bioconda/bioconda-recipes/blob/master/recipes/repeatmasker/build.sh#L22. Like RepeatModeler, the latest version of RepeatMasker has a configure script that can be run non-interactively, so that should be easier in the future.
Actually it is just a file made by the makeblastdb command. No need to touch the RepeatMasker recipe for that; I can fix it directly from the RepeatModeler recipe.
The recipe is now fixed in repeatmodeler-1.0.11 build pl526_2 (see #19137).
The only thing remaining is the last step (RepeatClassifier), which uses RepeatMasker and will be skipped by default. You will get this message:
Missing ${CONDA_PREFIX}/share/RepeatMasker/Libraries/RepeatMasker.lib.nsq!
This is because no nucleotide repeat library is included in RepeatMasker. It is therefore recommended to download the database of your choice (RepBase requires a licence) to get this working properly:
cp RepeatDB.fna ${CONDA_PREFIX}/share/RepeatMasker/Libraries/RepeatMasker.lib
makeblastdb -dbtype nucl -in ${CONDA_PREFIX}/share/RepeatMasker/Libraries/RepeatMasker.lib
@helltish you can close the issue now.
Hello,
I just got this bug
RepeatClassifier Version 2.0.1
======================================
Search Engine = rmblast
- Looking for Simple and Low Complexity sequences..
- Looking for similarity to known repeat proteins..
Missing /home/cbfgws6/Programs/rpmskr/RepeatMasker//Libraries/RepeatPeps.lib.psq!
Please rerun the configure program in the RepeatModeler directory
before running this script.
How exactly should I fix it? Should I run this command:
$ makeblastdb -dbtype nucl -in /home/cbfgws6/Programs/rpmskr/RepeatMasker/Libraries/RepeatMasker.lib
So, running the above command generated the following three files:
RepeatMasker.lib.nsq
RepeatMasker.lib.nin
RepeatMasker.lib.nhr
And then restarting using the command (-recoverDir) results in this:
This directory ( /home/cbfgws6/Programs/rpmddlr/RepeatModeler/RM_9706.ThuSep171738172020 )
appears to contain a successful run of RepeatModeler. If this
is not the case, please report this as a bug at the RepeatMasker
website ( www.repeatmasker.org )
So...am I good?
Do conda list and check the following:
What version did you use?
What is the RepeatMasker version installed? (conda list)
The RepeatMasker.lib should be in $CONDA_PREFIX/share/RepeatMasker/Libraries/
Hello, I had the exact same issue.
What I had to do is run ./configure again. I think configure includes downloading the database and making a BLAST database.
cd $(dirname $(which RepeatMasker))/../share/RepeatMasker
# ./configure downloads required databases
echo -e "\n2\n$(dirname $(which rmblastn))\n\n5\n" > tmp && ./configure < tmp
It should look like this
ls $(dirname $(which RepeatMasker))/../share/RepeatMasker/Libraries
# Artefacts.embl Dfam.hmm RepeatAnnotationData.pm RepeatMasker.lib.nin RepeatPeps.lib RepeatPeps.lib.psq
# CONS-Dfam_3.0 README.meta RepeatMasker.lib RepeatMasker.lib.nsq RepeatPeps.lib.phr RepeatPeps.readme
# Dfam.embl RMRBMeta.embl RepeatMasker.lib.nhr RepeatMaskerLib.embl RepeatPeps.lib.pin taxonomy.dat
Hope it helps. (I'm using RepeatMasker 2.0.1)
repeatmasker 4.0.9_p2 pl526_2 bioconda
repeatmodeler 2.0.1 pl526_0 bioconda
repeatscout 1.0.6 h516909a_1 bioconda
I checked repeatmasker 4.0.9_p2 and 4.1.0 and indeed the RepeatMasker.lib db is not set up properly (on OSX at least)...
-rw-rw-r-- 2 jacda119 wheel 18755326 Sep 15 21:14 RMRBMeta.embl
-rw-rw-r-- 2 jacda119 wheel 113343436 Sep 15 21:14 taxonomy.dat
-rw-rw-r-- 2 jacda119 wheel 5550 Sep 15 21:15 RepeatPeps.readme
-rw-rw-r-- 2 jacda119 wheel 17979984 Sep 15 21:15 RepeatPeps.lib
-rwxrwxr-x 2 jacda119 wheel 22475384 Sep 15 21:15 RepeatAnnotationData.pm
-rw-rw-r-- 2 jacda119 wheel 214 Sep 15 21:15 README.meta
-rw-rw-r-- 2 jacda119 wheel 1869701327 Sep 15 21:15 Dfam.hmm
-rw-rw-r-- 2 jacda119 wheel 24005361 Sep 15 21:15 Dfam.embl
-rwxrwxr-x 2 jacda119 wheel 25283 Sep 15 21:15 Artefacts.embl
-rw-rw-r-- 2 jacda119 wheel 22661790 Sep 15 21:15 RepeatMaskerLib.embl
-rw-rw-r-- 2 jacda119 wheel 0 Sep 15 21:15 RepeatMasker.lib
-rw-rw-r-- 2 jacda119 wheel 16168295 Sep 15 21:15 RepeatPeps.lib.psq
-rw-rw-r-- 2 jacda119 wheel 2931407 Sep 15 21:15 RepeatPeps.lib.phr
-rw-rw-r-- 1 jacda119 wheel 144448 Sep 21 22:14 RepeatPeps.lib.pin
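The listing above shows the two symptoms discussed in this thread: a 0-byte RepeatMasker.lib and absent BLAST index files. A rough check for both is sketched here. check_rm_libraries is a hypothetical helper, and the file names are simply the ones the error messages in this thread ask for. Other BLAST versions may write a different set of index files.

```shell
# Print a line for every library problem found in a RepeatMasker Libraries
# directory: an empty or absent FASTA library, or an absent BLAST index.
check_rm_libraries() (
    libdir=$1
    status=0
    for lib in RepeatMasker.lib RepeatPeps.lib; do
        if [ ! -s "$libdir/$lib" ]; then
            printf 'empty or missing: %s\n' "$lib"
            status=1
        fi
    done
    for index in RepeatMasker.lib.nsq RepeatPeps.lib.psq; do
        if [ ! -e "$libdir/$index" ]; then
            printf 'no BLAST index: %s\n' "$index"
            status=1
        fi
    done
    exit "$status"
)

# Example: check_rm_libraries "$CONDA_PREFIX/share/RepeatMasker/Libraries"
```

If RepeatMasker.lib itself is reported as empty, building an index from it will probably not help until the library has real content.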
For RepeatMasker version 4.0.9_p2 the easiest would be to run
makeblastdb -dbtype nucl -in $CONDA_PREFIX/share/RepeatMasker/Libraries/RepeatMasker.lib
This line should be added to the build.sh if we want to fix this version of the recipe.
For version 4.1.0 we use the following command:
perl ./configure -libdir ${RM_DIR}/Libraries -trf_prgm ${PREFIX}/bin/trf -rmblast_dir ${PREFIX}/bin/ -hmmer_dir ${PREFIX}/bin -abblast_dir ${PREFIX}/bin -crossmatch_dir ${PREFIX}/bin
@jebrosen is there any reason why the RepeatMasker.lib db is not set properly with the configure? It
The last few comments look like multiple issues that may or may not be related to each other. This is the current state of affairs for these files, to the best of my knowledge:
RepeatPeps.lib is shipped with RepeatMasker (it's used in RepeatProteinMask).
RepeatMasker.lib is generated by RepeatMasker's configure script from the installed libraries: Dfam, plus RepBase RepeatMasker Edition if it has been added. In RepeatMasker 4.1.1 this is done with the famdb.py tool, which has a dependency on python-h5py (or however it is named in the bioconda repositories). RepeatMasker.lib is used by RepeatModeler.
configure runs makeblastdb to generate BLAST format DBs for both of those files.
I am not sure why RepeatMasker.lib is 0 bytes long. Maybe there was an error in the environment or dependency setup; is there a way to access the build logs for the latest version of the repeatmasker package?
Here for the build, under building and testing:
https://app.circleci.com/pipelines/github/bioconda/bioconda-recipes/32719/workflows/f3bcafb0-dca9-4564-935d-a675ab423d2f/jobs/123514
I can see
19:15:26 BIOCONDA INFO (OUT) RepeatMasker Configuration Program
19:15:26 BIOCONDA INFO (OUT) Rebuilding RepeatMaskerLib.embl master library
19:15:26 BIOCONDA INFO (OUT) Reading Artefacts.embl database...
19:15:26 BIOCONDA INFO (OUT) - Read in 9 sequences from $PREFIX/share/RepeatMasker/Libraries/Artefacts.embl
19:15:28 BIOCONDA INFO (OUT) Reading Dfam.embl database...
19:15:28 BIOCONDA INFO (OUT) - Read in 6915 sequences from $PREFIX/share/RepeatMasker/Libraries/Dfam.embl
19:15:29 BIOCONDA INFO (OUT) Saving RepeatMaskerLib.embl library...
19:15:29 BIOCONDA INFO (OUT) RepeatMaskerLib.embl: 6924 total sequences.
19:15:29 BIOCONDA INFO (OUT) Building FASTA version...Building RMBlast frozen libraries..
19:15:32 BIOCONDA INFO (OUT) The program is installed with a the following repeat libraries:
19:15:32 BIOCONDA INFO (OUT) Dfam database version Dfam_3.1
19:15:32 BIOCONDA INFO (OUT) RepeatMasker Combined Database: Dfam-Dfam_3.1
19:15:32 BIOCONDA INFO (OUT) Further documentation on the program may be found here:
19:15:32 BIOCONDA INFO (OUT) $PREFIX/share/RepeatMasker/repeatmasker.help
Hello,
This is not a conda install, but rather a "traditional" install on Ubuntu 18.04 LTS. RepeatModeler is 2.0.1; RepeatMasker is 4.1.1. I installed RepeatMasker first (and dependencies), and then RepeatModeler (and dependencies). I installed all recommended versions of dependencies and added appropriate changes to my $PATH in <.bashrc>; I then ran RepeatModeler without re-running the RepeatMasker <./configure> script. I got the error at the end of the RepeatModeler run.
I have now re-run the RepeatMasker <./configure> script and it looks as if I've generated the missing files that RepeatModeler was complaining about at the end of the run. I've started the process over again (63 hours), and hope it completes properly this time around.
@Juke34 It seems I misunderstood; I thought bioconda had already updated to the very latest version of RepeatMasker (4.1.1). I see now your PR only updated to 4.1.0. RepeatMasker 4.1.1 is more verbose than 4.1.0 about errors that happen in the failed step Building FASTA libraries. I will try to replicate and troubleshoot the build failure locally and see what's going on there.
@cement-head That is unexpected; it should be enough to run each configure script only one time at installation. If you have any way to replicate it, or any logs or output from the first time the configure script failed, please report it as an issue to https://github.com/rmhubley/RepeatMasker (since it's not a problem with the bioconda recipe).
Okay, will do - I'll look and if the logs are there, I will file them. Thx.
Yep, re-running <./configure> for RepeatMasker after installing RepeatModeler seems to be necessary, even though it should not be required. Will report as a bug for RepeatMasker/RepeatModeler outside of this bioconda/conda repo.
@Juke34 This looks like the problem affecting the latest package, after I modified a few files to get better error output:
17:42:43 BIOCONDA INFO (ERR) sh: /opt/conda/conda-bld/repeatmasker_1600882772077/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/share/RepeatMasker/util/buildRMLibFromEMBL.pl: /opt/conda/conda-bld/repeatmasker_1600882772077/_h_env_placehold_placehold_pla: bad interpreter: No such file or directory
In RepeatMasker 4.1.0, buildRMLibFromEMBL.pl was the program that generated RepeatMasker.lib. configure inserts the path to the running perl interpreter into the shebang line of every script that comes with RepeatMasker; it looks like it might have been cut off, breaking the path? That is a lot of placehold_. I don't know if this is a problem with configure, the path length, or something else. I also don't know if it's only a problem during build.sh, or maybe even after installation.
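To see whether the broken interpreter path also exists after installation, one could scan the installed scripts for shebang lines that point at a non-existent interpreter. The sketch below assumes nothing about RepeatMasker itself. find_broken_shebangs is a made-up helper, and the directory in the example is the share location used elsewhere in this thread.

```shell
# List files under a directory whose "#!" line names an interpreter that is
# not an executable file on this machine.
find_broken_shebangs() (
    dir=$1
    find "$dir" -type f | while IFS= read -r file; do
        # Strip NUL bytes so binary files do not upset the shell.
        first=$(head -n 1 "$file" 2>/dev/null | tr -d '\000')
        case $first in
            '#!'*) ;;
            *) continue ;;
        esac
        # Keep only the interpreter path: drop "#!", padding, and arguments.
        interp=$(printf '%s\n' "$first" | sed 's/^#![[:space:]]*//; s/[[:space:]].*$//')
        [ -x "$interp" ] || printf '%s\n' "$file"
    done
)

# Example: find_broken_shebangs "$CONDA_PREFIX/share/RepeatMasker"
```

An empty result would suggest the truncated path is only a build-time problem. Any file it lists is a candidate for the "bad interpreter" error.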
RepeatMasker 4.1.1 uses a different library format and a different program (famdb.py) to generate RepeatMasker.lib, so I expect it to either work completely fine or fail for a different reason on this same step. For that reason, I think it would be more constructive to update than to try to fix this version of the package.
I am still getting this error:
Missing /home1/miniconda3/share/RepeatMasker/Libraries/RepeatMasker.lib.nsq! Please rerun the configure program in the RepeatModeler directory before running this script.
when trying to generate the consensi.fa.classified file using RepeatClassifier -consensi consensi.fa
I don't quite follow all the above discussion on how to fix this. Where in the conda bin directory should the .lib.nsq file be placed? Thanks for any help.
Nathan
Can you please describe this line? I'm new to this.
echo -e "\n2\n$(dirname $(which rmblastn))\n\n5\n" > tmp && ./configure < tmp
So I ran the code you suggested above in the miniconda share dir:
cd $(dirname $(which RepeatMasker))/../share/RepeatMasker
echo -e "\n2\n$(dirname $(which rmblastn))\n\n5\n" > tmp && ./configure < tmp
But the directory still only looks like this:
Artefacts.embl README.meta RepeatMaskerLib.embl RepeatPeps.lib RepeatPeps.readme Dfam.embl RepeatAnnotationData.pm RepeatMasker.lib.ndb RepeatPeps.lib.pdb RMRBMeta.embl Dfam.hmm RepeatMasker.lib RepeatMasker.lib.ndb-lock RepeatPeps.lib.pdb-lock taxonomy.dat
I configured it to use HMMER3.1, however. Would this be the problem?
@mudithekanayake,
If you run ./configure, the interactive configuration process will pop up.
Five inputs are required:
\n
2\n
$(dirname $(which rmblastn))\n
\n
5\n
For the first, \n, you press the return key to use the default value.
For the second, 2\n, you type 2 and press the return key to choose option 2.
... and so on.
Of course, you can do it in interactive mode by answering the prompts one by one, but I was suggesting a kind of shortcut.
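The same shortcut can be written with printf, which behaves more consistently across shells than echo -e. This is only a sketch: configure_answers is a made-up name, and the answers are the ones quoted above. The prompts and menu numbers can differ between RepeatMasker versions, so it is worth checking them against one interactive run first.

```shell
# Print the five answers from the comment above, one per line. Empty lines
# accept the default; "2" and "5" are the menu choices quoted in this thread.
configure_answers() {
    rmblast_dir=$1
    printf '\n2\n%s\n\n5\n' "$rmblast_dir"
}

# Example, run from the RepeatMasker share directory:
# configure_answers "$(dirname "$(command -v rmblastn)")" | ./configure
```

Piping the answers directly avoids leaving a tmp file in the RepeatMasker directory.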
@insectnate I don't think that would be a problem. Try it out!
I did that and still get a Library directory of Artefacts.embl README.meta RepeatMaskerLib.embl RepeatPeps.lib RepeatPeps.readme Dfam.embl RepeatAnnotationData.pm RepeatMasker.lib.ndb RepeatPeps.lib.pdb RMRBMeta.embl Dfam.hmm RepeatMasker.lib RepeatMasker.lib.ndb-lock RepeatPeps.lib.pdb-lock taxonomy.dat
Can you run ./configure and configure the settings one by one?
I did the ./configure where I am prompted to confirm the $PATH to each of the dependencies. I have tried it both with specifying RMBlast and HMMER3.1 but never get the makeblastdb files shown above. Is there another way to do configure that I am missing where there are more settings?
1) I recommend creating a new environment and trying again.
2) Actually, if you look at the above comments, makeblastdb does the job for you.
makeblastdb -dbtype nucl -in ${CONDA_PREFIX}/share/RepeatMasker/Libraries/RepeatMasker.lib
Ok I will try that. By this do you mean deleting the conda install of RepeatMasker and installing again?
Thanks for all your help.
Nathan
Yes. Or, you could create another environment name.
Possibly related, the script queryRepeatDatabase.pl seems not to run because FastaDB.pm is not in $PERL5LIB. (It's in .../share/RepeatMasker/.)
If I manually add that to $PERL5LIB, the script then fails with
No repeat libraries found! At a minimum Dfam.embl, Dfam.hmm
or RepBase RepeatMasker Edition is required to run. Please download
and install the latest Dfam libraries.
Died at /packages/miniconda/20190102/envs/rm-edta-mcurry-20201021/share/RepeatMasker/LibraryUtils.pm line 386.
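For reference, adding the RepeatMasker share directory to PERL5LIB, as described above, can be done for the current shell only. In this sketch prepend_perl5lib is a made-up helper and the example path assumes a conda layout. It addresses only the FastaDB.pm lookup, not the "No repeat libraries found" error that follows.

```shell
# Put a directory at the front of PERL5LIB for the current shell only,
# without touching any file inside the conda environment.
prepend_perl5lib() {
    case ":${PERL5LIB:-}:" in
        *":$1:"*) ;;                                  # already present
        *) PERL5LIB="$1${PERL5LIB:+:$PERL5LIB}" ;;
    esac
    export PERL5LIB
}

# Example: prepend_perl5lib "$CONDA_PREFIX/share/RepeatMasker"
```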
Note also for the above comments, I'm pretty sure it's not kosher to modify files/dirs in the conda tree after install by conda. I think this breaks conda.
I had the same issue: RepeatMasker.lib and its BLAST database were not produced by RepeatMasker's configure. The solution I found was to download the previous version (4.1.0) of RepeatMasker independently of conda and then configure it. Hope it works for you guys.
Now that #25163 is complete, I was able to download repeatmasker 4.1.1 from the bioconda repositories and RepeatModeler successfully finished, including the classification step, on a test sequence file. I think that was the last remaining issue reported in this thread; hopefully this update is also working well for others.
There are still a few reasons one might prefer a manual installation; for example, the LTR structural search method introduced in RepeatModeler 2.0 requires some dependencies that are not yet in bioconda.
I recently had the same issue as described above, with the error message "missing /RepeatMasker/Libraries/RepeatPeps.lib.psq" when I tried to run RepeatClassifier. Just in case anyone else experiences this issue, I solved it by re-configuring RepeatMasker and manually downloading all the libraries from https://home.cc.umanitoba.ca/~psgendb/doc/BIRCH/doc/local/pkg/RepeatMasker/Libraries/ to the directory RepeatMasker/Libraries.
@JuliaLopezDelgado Please specify the version you used, because depending on the version the problem is already solved.
I am using RepeatMasker v.4.1.2 and RepeatModeler v.2.0.2
@JuliaLopezDelgado Bioconda had packages for RepeatMasker 4.1.1 (broken) and RepeatMasker 4.1.2-p1 (fixed), skipping over RepeatMasker 4.1.2 (FWIW, also fixed). Did you mean 4.1.2-p1?
The libraries available at home.cc.umanitoba.ca are over 10 years old. It is only by chance that they could even work today - and worse, old libraries might appear to work and silently fail or produce wrong/misleading results. For that reason I generally recommend against trying to "mix" files from new and old versions of RepeatMasker and RepeatModeler.
Dear developer Team,
we have compared the bioconda RepeatModeler package with a local installation of RepeatModeler (with all dependencies).
The local installation of RepeatModeler produced output that is reasonable for our input file. The conda package does not produce any output. Please find our logs to compare. The input data was the same in both cases.
Thanks a lot in advance!