Open rdacemel opened 3 years ago
I'm having the same problem: Clustering...LTRPipeline: Error - could not cluster MAFFT results. : 00:00:13 (hh:mm:ss) Elapsed Time LTRPipeline : Error - could not open /path/RM_1419367.WedApr211045152021/LTR_2817894.ThuApr220802152021/clusters.dat! at /path/LTRPipeline line 325. I have a manual installation of version 2.0.1, which works just fine on a similarly sized genome from a closely related species.
This message indicates a problem with running NINJA. Can you post the contents of the file `Ninja.log`? It should be in the same directory as `clusters.dat` in the error message.
ninja=/home/rdoming/scratch/programs/NINJA-0.95-cluster_only/NINJA
There is a newer version of NINJA (cluster_only) which included a few fixes for files of certain sizes; maybe this newer version would work? https://github.com/TravisWheelerLab/NINJA/tree/0.97-cluster_only
I can't find a Ninja.log in either the error run or the successful run. In the error run, where clusters.dat should be, there are three files: LtrRetriever-redundant-results.fa, mafft-alignment.fa and raw-struct-results.txt. I'm re-running the genome with the cluster-only NINJA from git to see if that's the problem.
> In the error run where clusters.dat should be there are three files:
Sorry, this was probably my mistake. There should be another directory in there, `NINJA_...`, that contains the `Ninja.log` file.
I don't see any Ninja folder or file under the RM_... output dir of either the ok run or the error run.
Sorry for the delayed response and thanks for the early reply and @ricardo-aaron for the follow-up. Like @ricardo-aaron I cannot find any Ninja log in the RM_... work directory.
Sorry again - I forgot about this snag: these particular processes generate many large temporary files, so they are deleted by default. The `-debug` option is necessary to keep that particular directory around.
@ricardo-aaron, you should be able to run `LTRPipeline -debug genome.fa` to run only the LTR structural steps while keeping the intermediate files, including `Ninja.log`, which should help explain the problem.
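Once a `-debug` run has finished, the kept log can be located with a search along these lines. This is only a minimal sketch: `find_ninja_logs` is a hypothetical helper name, and the `NINJA_*` directory pattern is assumed from the directory names mentioned in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every Ninja.log kept by a -debug run under a work directory.
# Assumption: the log sits inside a NINJA_* subdirectory, as described in this thread.
find_ninja_logs() {
    local workdir="${1:-.}"
    find "$workdir" -type f -name 'Ninja.log' -path '*/NINJA_*/*' | sort
}

# Example usage (commented out so that sourcing this file has no side effects):
# find_ninja_logs . | while read -r f; do echo "== $f"; tail -n 20 "$f"; done
```

Running it from the directory where `LTRPipeline -debug` was started should print the path of each kept log, which can then be pasted into the thread.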
@rdacemel, I might have found the cause of your problem. Bioconda does not package NINJA, which should be fine because you installed it yourself and used `-ninja_dir`. However, it does not look like `RepeatModeler -ninja_dir=...` actually passes the `ninja_dir` option along to `LTRPipeline` (cc @rmhubley - and perhaps other scripts than `LTRPipeline` are affected too?)
Instead, you can set the option via an environment variable, which is read directly by both `RepeatModeler` and `LTRPipeline`:
db=$1
pa=4
export NINJA_DIR=/home/rdoming/scratch/programs/NINJA-0.95-cluster_only/NINJA
# For troubleshooting; -debug will keep around Ninja.log and other files
LTRPipeline -debug genome.fa
# No need for -ninja_dir here anymore, since it was set above
RepeatModeler -LTRStruct -pa ${pa} -database ${db}
If using this environment variable instead of the command-line option does not end up solving the problem, the contents of `Ninja.log` should still be able to help troubleshoot the issue further.
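Before starting a long run, it may be worth confirming that the variable really points at a usable NINJA build. The sketch below is an assumption-laden pre-flight check, not part of RepeatModeler: `check_ninja_dir` is a hypothetical name, and the executable name `Ninja` is taken from the command line shown in the debug log later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm NINJA_DIR is set and contains
# an executable file named Ninja (name assumed from the log in this thread).
check_ninja_dir() {
    local dir="${NINJA_DIR:-}"
    if [ -z "$dir" ]; then
        echo "NINJA_DIR is not set" >&2
        return 1
    fi
    if [ ! -x "$dir/Ninja" ] || [ -d "$dir/Ninja" ]; then
        echo "No executable Ninja found in $dir" >&2
        return 1
    fi
    echo "Using $dir/Ninja"
}
```

A job script could call `check_ninja_dir || exit 1` right after the `export NINJA_DIR=...` line, so a wrong path fails in seconds rather than after hours of LtrHarvest.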
Hi @jebrosen; I have encountered the same bug as in this thread, but after exporting `NINJA_DIR`, LTRPipeline was able to run.
May I ask whether I therefore need to run RepeatModeler again (it takes quite some time)? I have tried to use the recover_dir flag, but it says the job has finished (although the MAFFT clustering didn't run). Or can I somehow combine the outputs of `LTRPipeline` and the partially done `RepeatModeler` outputs?
Thanks!
> I have tried to use recover_dir flag, but it says the job has finished (although the mafft clustering didn't run); or I can somehow combine the outputs of LTRPipeline and the partially done RepeatModeler outputs?
@saxovocal Yes, unfortunately `-recoverDir` does not yet detect that only the LTRPipeline and combining steps need to be re-run. One possible alternative to re-running all of RepeatModeler is renaming the file `round-6/consensi.fa` to something else and then using `-recoverDir`: this would still repeat some work, but starting at the beginning of round 6 instead of round 1.
Hi @jebrosen, it looks like it might be an issue with the CPU. I tried on a different machine and it ran like a charm, thanks.
Hello!
I ran into the same problem recently. I followed the steps above to rerun RepeatModeler from round-6, and it worked well until a NINJA error that is apparently related to the `GLIBCXX` version available on a given node or computer. Specifically, NINJA-0.95 needs this version: `GLIBCXX_3.4.21`. Higher versions of NINJA failed for the same reason (at least in our case).
This command prints the available library versions on your system:
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
We were given two solutions: 1) upgrade the Debian version (requires root permissions) or 2) switch to another node that has this library version installed. After that, LTRPipeline worked.
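The library check can also be wrapped in a small yes/no test, which is handy when comparing several cluster nodes. This is a sketch under stated assumptions: `has_glibcxx` is a hypothetical helper, the default library path is the Debian one quoted above and may differ on other systems, and `grep -a` is swapped in for `strings` so the check does not depend on binutils being installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a libstdc++ file contains the exact
# GLIBCXX version tag requested (e.g. GLIBCXX_3.4.21).
# The default path is the Debian location quoted in this thread (an assumption elsewhere).
has_glibcxx() {
    local version="$1"
    local lib="${2:-/usr/lib/x86_64-linux-gnu/libstdc++.so.6}"
    # Pull every GLIBCXX_x.y.z tag out of the binary, then require a whole-line match.
    LC_ALL=C grep -a -o 'GLIBCXX_[0-9.]*' "$lib" | grep -F -x -q -- "$version"
}

# Example usage (commented out):
# has_glibcxx GLIBCXX_3.4.21 && echo "this node should be able to run NINJA-0.95"
```

The whole-line match matters here: a plain substring search for `GLIBCXX_3.4.2` would also succeed on a library that only provides `GLIBCXX_3.4.20`.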
#! /bin/bash -x
$HOME/Programs_V/RepeatModeler-2.0.3/LTRPipeline -debug -pa 20 x.fasta
# messages from correct LTRpipeline run
Running LtrHarvest...LTRPipeline::runLtrHarvest : tmpdir = ./LTR_220735.WedMar91650382022/LHAR_220735.WedMar91650382022
LTRPipeline::runLtrHarvest : Returning 1147 annotations.
: 00:00:50 (hh:mm:ss) Elapsed Time
Running Ltr_retriever...LTRPipeline::runLtrRetriever : tmpdir = ./LTR_220735.WedMar91650382022/LRET_220735.WedMar91651282022
LTRPipeline::runLtrRetriever : Running analysis cd ./LTR_220735.WedMar91650382022/LRET_220735.WedMar91651282022; /users/path/Programs/LTR_retriever/LTR_retriever -repeatmasker /users/path/Programs/RepeatMasker-4.1.2 -blastplus /users/path/Programs/rmblast-2.11.0/bin -cdhit_path /users/path/anaconda3/bin -trf_path /users/path/Programs/TRF-4.09.1/build/src/trf -genome seq.fa -inharvest /users/path/1_Dcatalonica/Dcat_vtest/Repetitions/Dsilv23/RepBase_configured/Scf6_LTRpipeline_controltest/LTR_220735.WedMar91650382022/raw-struct-results.txt -noanno -threads 20 > LTR_retriever.log 2>&1
: 00:00:50 (hh:mm:ss) Elapsed Time
Aligning instances...LTRPipeline::runMafft : tmpdir = ./LTR_220735.WedMar91650382022/MAFFT_220735.WedMar91652182022
LTRPipeline::runMafft : Running analysis /users/path/Programs/bin/mafft --large --quiet --thread 20 ./LTR_220735.WedMar91650382022/LtrRetriever-redundant-results.fa > ./LTR_220735.WedMar91650382022/MAFFT_220735.WedMar91652182022/mafft-alignment.fa
: 00:00:11 (hh:mm:ss) Elapsed Time
Clustering...LTRPipeline::runNinja : tmpdir = ./LTR_220735.WedMar91650382022/NINJA_220735.WedMar91652292022
LTRPipeline::runNinja : Running analysis /users/path/Programs/NINJA-0.95-cluster_only/NINJA/Ninja --in ./LTR_220735.WedMar91650382022/mafft-alignment.fa --out ./LTR_220735.WedMar91650382022/NINJA_220735.WedMar91652292022/cluster.dat --out_type c --corr_type m --cluster_cutoff 0.2 --threads 20 > ./LTR_220735.WedMar91650382022/NINJA_220735.WedMar91652292022/Ninja.log 2>&1
After that, we ran RepeatModeler with the `-LTRStruct` and `-recoverDir` options.
- Here are the steps we followed to rerun RepeatModeler from `round-6`:
1) Rename `round-6/consensi.fa` to `round-6/consensi.fa.bak`
2) Rerun RepeatModeler with the `-LTRStruct` option and with `-recoverDir` pointing at the RM_xxxxx_xxxx directory from the failed run.
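The two steps above can be sketched as a small shell function. Treat it as an illustration only: `prepare_round6_rerun` is a hypothetical name, the database argument is a placeholder for your own `-database` value, and the RepeatModeler command is printed rather than executed so you can review it first and add your usual options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move round-6/consensi.fa aside (step 1), then print,
# but do not run, the RepeatModeler command for step 2.
prepare_round6_rerun() {
    local rm_dir="$1" database="$2"
    local consensi="$rm_dir/round-6/consensi.fa"
    if [ ! -f "$consensi" ]; then
        echo "Not found: $consensi" >&2
        return 1
    fi
    mv -- "$consensi" "$consensi.bak"
    echo "RepeatModeler -LTRStruct -database $database -recoverDir $rm_dir"
}

# Example usage (commented out; RM_xxxxx_xxxx stands for your own failed run directory):
# prepare_round6_rerun RM_xxxxx_xxxx mydb
```

Calling it a second time on the same directory fails with "Not found", which guards against accidentally overwriting the `.bak` copy.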
Some advice:
- Since a whole RepeatModeler run (with `-LTRStruct`) takes a lot of time, we tested `LTRPipeline` on a small dataset first to see whether the problem had been solved. A Drosophila scaffold or genome may be a good option.
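One way to build such a small test input is to pull a single scaffold out of the full assembly. The function below is a minimal sketch: `extract_scaffold` is a hypothetical helper, and the scaffold ID and file names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one record from a FASTA file, selected by its ID
# (the first word after '>' on the header line).
extract_scaffold() {
    local id="$1" fasta="$2"
    awk -v id="$id" '
        /^>/ { keep = (substr($1, 2) == id) }   # header line: start or stop printing
        keep                                     # print lines while inside the wanted record
    ' "$fasta"
}

# Example usage (commented out; Scf6 and genome.fa are placeholders):
# extract_scaffold Scf6 genome.fa > x.fasta
# LTRPipeline -debug -pa 20 x.fasta
```

The ID comparison is exact, so asking for `Scf6` does not also pull in `Scf61`.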
Describe the issue
I'm experiencing issues when running the LTR clustering step of RepeatModeler.
Reproduction steps
Here is the relevant fragment of the job script.
And here are the logs I get:
output channel:
LTR Structural Analysis
Running LtrHarvest... : 03:56:52 (hh:mm:ss) Elapsed Time
Running Ltr_retriever... : 00:31:37 (hh:mm:ss) Elapsed Time
Aligning instances... : 00:04:45 (hh:mm:ss) Elapsed Time
Clustering...LTRPipeline: Error - could not cluster MAFFT results. : 00:00:00 (hh:mm:ss) Elapsed Time
LTRPipeline Time: 04:33:31 (hh:mm:ss) Elapsed Time
error channel:
LTRPipeline : Error - could not open clusters.dat! at LTRPipeline line 325.
How did you install RepeatModeler? I installed RepeatModeler through bioconda.
Which version of RepeatModeler do you have? 2.02