Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

LTR clustering step stops #134

Open rdacemel opened 3 years ago

rdacemel commented 3 years ago

Describe the issue

I'm experiencing issues when running the LTR clustering step of RepeatModeler.

Reproduction steps

Here is the relevant fragment of the job script.

db=$1
pa=4
ninja=/home/rdoming/scratch/programs/NINJA-0.95-cluster_only/NINJA

RepeatModeler -LTRStruct -pa ${pa} -database ${db} -ninja_dir ${ninja}

And here are the logs I get:

output channel:
LTR Structural Analysis
Running LtrHarvest... : 03:56:52 (hh:mm:ss) Elapsed Time
Running Ltr_retriever... : 00:31:37 (hh:mm:ss) Elapsed Time
Aligning instances... : 00:04:45 (hh:mm:ss) Elapsed Time
Clustering...LTRPipeline: Error - could not cluster MAFFT results. : 00:00:00 (hh:mm:ss) Elapsed Time
LTRPipeline Time: 04:33:31 (hh:mm:ss) Elapsed Time

error channel: LTRPipeline : Error - could not open clusters.dat! at LTRPipeline line 325.

ricardo-aaron commented 3 years ago

I'm having the same problem:

Clustering...LTRPipeline: Error - could not cluster MAFFT results. : 00:00:13 (hh:mm:ss) Elapsed Time
LTRPipeline : Error - could not open /path/RM_1419367.WedApr211045152021/LTR_2817894.ThuApr220802152021/clusters.dat! at /path/LTRPipeline line 325.

I have a manual installation of version 2.0.1, which works fine on another genome of similar size from a closely related species.

jebrosen commented 3 years ago

This message indicates a problem with running NINJA. Can you post the contents of the file Ninja.log? It should be in the same directory as clusters.dat in the error message.

ninja=/home/rdoming/scratch/programs/NINJA-0.95-cluster_only/NINJA

There is a newer version of NINJA (cluster_only) which included a few fixes for files of certain sizes; maybe this newer version would work? https://github.com/TravisWheelerLab/NINJA/tree/0.97-cluster_only

ricardo-aaron commented 3 years ago

I can't find a Ninja.log in either the error run or the successful run. In the error run, where clusters.dat should be, there are three files: LtrRetriever-redundant-results.fa, mafft-alignment.fa and raw-struct-results.txt. I'm rerunning the genome with the cluster-only NINJA from git to see if that's the problem.

jebrosen commented 3 years ago

In the error run where clusters.dat should be there are three files:

Sorry, this was probably my mistake. There should be another directory in there, NINJA_..., that contains the Ninja.log file

ricardo-aaron commented 3 years ago

I don't see any Ninja folder or file under the RM_... output dir of either the ok run or the error run.

rdacemel commented 3 years ago

Sorry for the delayed response, thanks for the quick reply, and thanks to @ricardo-aaron for the follow-up. Like @ricardo-aaron, I cannot find any Ninja log in the RM_... work directory.

jebrosen commented 3 years ago

Sorry again - I forgot about this snag: these particular processes generate many large temporary files, so they are deleted by default. The -debug option is necessary to keep that particular directory around.


@ricardo-aaron, you should be able to run LTRPipeline -debug genome.fa to run only the LTR structural steps while keeping the intermediate files, including Ninja.log which should help explain the problem.
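Once a `-debug` run has finished, the kept log can be located with a small helper. The following is only a sketch: the function name `show_ninja_logs` is made up for illustration, and it assumes the log sits in a `NINJA_...` subdirectory somewhere below the directory where LTRPipeline was run, as described above.

```shell
# Sketch: print the tail of every Ninja.log kept by a -debug run.
# Assumption: logs live at <workdir>/.../NINJA_*/Ninja.log.
show_ninja_logs() {
    workdir="${1:-.}"   # directory where LTRPipeline -debug was run
    lines="${2:-20}"    # number of trailing lines to print per log

    logs=$(find "$workdir" -type f -path '*/NINJA_*/Ninja.log' | sort)
    if [ -z "$logs" ]; then
        echo "No Ninja.log found under ${workdir}; was -debug used?" >&2
        return 1
    fi

    # One log path per line; paths containing newlines are not handled.
    echo "$logs" | while IFS= read -r log; do
        echo "== ${log}"
        tail -n "$lines" "$log"
    done
}
```

For example, `show_ninja_logs . 50` run from the working directory would print the last 50 lines of each log found.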


@rdacemel, I might have found the cause of your problem. Bioconda does not package NINJA, which should be fine because you installed it yourself and used -ninja_dir. However, it does not look like RepeatModeler -ninja_dir=... actually passes the ninja_dir option along to LTRPipeline (cc @rmhubley - and perhaps other scripts than LTRPipeline are affected too?)

Instead, you can set the option via an environment variable, which is read directly by both RepeatModeler and LTRPipeline:

db=$1
pa=4
export NINJA_DIR=/home/rdoming/scratch/programs/NINJA-0.95-cluster_only/NINJA

# For troubleshooting; -debug will keep around Ninja.log and other files
LTRPipeline -debug genome.fa

# No need for -ninja_dir here anymore, since it was set above
RepeatModeler -LTRStruct -pa ${pa} -database ${db}

If using this environment variable instead of the command-line option does not end up solving the problem, the contents of Ninja.log should still be able to help troubleshoot the issue further.
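Because a wrong or missing NINJA location only surfaces after hours of LTR analysis, it may be worth checking the variable before starting. This is a minimal sketch, not part of RepeatModeler: `check_ninja_dir` is a hypothetical helper, and it assumes the executable is named `Ninja` and sits directly inside `NINJA_DIR`.

```shell
# Sketch: fail early if NINJA_DIR does not point at a usable NINJA install.
# Assumption: the executable is called "Ninja" and lives in $NINJA_DIR.
check_ninja_dir() {
    if [ -z "${NINJA_DIR:-}" ]; then
        echo "NINJA_DIR is not set" >&2
        return 1
    fi
    if [ ! -x "${NINJA_DIR}/Ninja" ]; then
        echo "No executable Ninja found in ${NINJA_DIR}" >&2
        return 1
    fi
    echo "Using NINJA at ${NINJA_DIR}/Ninja"
}
```

In the job script this could guard the long run, e.g. `check_ninja_dir && RepeatModeler -LTRStruct -pa ${pa} -database ${db}`.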

saxovocal commented 2 years ago

Hi @jebrosen, I encountered the same bug as in this thread, but after exporting NINJA_DIR, LTRPipeline was able to run.

May I ask whether I therefore need to run RepeatModeler again (it takes quite some time)? I tried the -recoverDir flag, but it says the job has finished (although the MAFFT clustering didn't run). Or can I somehow combine the outputs of LTRPipeline with the partially done RepeatModeler outputs?

Thanks!

jebrosen commented 2 years ago

I have tried to use recover_dir flag, but it says the job has finished (although the mafft clustering didn't run); or I can somehow combine the outputs of LTRPipeline and the partially done RepeatModeler outputs?

@saxovocal Yes, unfortunately -recoverDir does not yet detect that only the LTRPipeline and combining steps need to be re-run. One possible alternative to re-running all of RepeatModeler is to rename the file round-6/consensi.fa to something else and then use -recoverDir: this still repeats some work, but starting at round 6 instead of round 1.

grpiccoli commented 2 years ago

Hi @jebrosen, it looks like it might be an issue with the CPU. I tried on a different machine and it ran like a charm, thanks.

V-JJ commented 2 years ago

Hello!

I ran into the same problem recently. I followed the steps above to rerun RepeatModeler from round-6, and it worked well until a NINJA error that is apparently related to the GLIBCXX version available on a given node or computer. Specifically, NINJA-0.95 needs GLIBCXX_3.4.21. Newer versions of NINJA failed for the same reason (at least in our case).

This command prints the available library versions on your system: strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX

We were given two solutions: 1) upgrade the Debian version (requires root permissions) or 2) switch to another node that has this library version installed. After that, LTRPipeline worked.
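When choosing between nodes, the library check can be scripted so that it returns a yes/no answer. The sketch below is an illustration only: `has_glibcxx` is a hypothetical helper, and it uses `grep -a -o` in place of `strings` so that nothing beyond grep and sort is required.

```shell
# Sketch: succeed if the given libstdc++ exports the required GLIBCXX version.
# LC_ALL=C keeps grep from tripping over non-text bytes in the library.
has_glibcxx() {
    lib="$1"        # path to libstdc++.so.6 on this node
    required="$2"   # exact version tag, e.g. GLIBCXX_3.4.21

    LC_ALL=C grep -a -o 'GLIBCXX_[0-9.]*' "$lib" \
        | sort -u \
        | grep -q -x -F "$required"
}
```

For example, `has_glibcxx /usr/lib/x86_64-linux-gnu/libstdc++.so.6 GLIBCXX_3.4.21 && echo "node is suitable"`; the library path is the one from the command above and may differ on other distributions.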

Running LtrHarvest...LTRPipeline::runLtrHarvest : tmpdir = ./LTR_220735.WedMar91650382022/LHAR_220735.WedMar91650382022
LTRPipeline::runLtrHarvest : Returning 1147 annotations.
: 00:00:50 (hh:mm:ss) Elapsed Time
Running Ltr_retriever...LTRPipeline::runLtrRetriever : tmpdir = ./LTR_220735.WedMar91650382022/LRET_220735.WedMar91651282022
LTRPipeline::runLtrRetriever : Running analysis cd ./LTR_220735.WedMar91650382022/LRET_220735.WedMar91651282022; /users/path/Programs/LTR_retriever/LTR_retriever -repeatmasker /users/path/Programs/RepeatMasker-4.1.2 -blastplus /users/path/Programs/rmblast-2.11.0/bin -cdhit_path /users/path/anaconda3/bin -trf_path /users/path/Programs/TRF-4.09.1/build/src/trf -genome seq.fa -inharvest /users/path/1_Dcatalonica/Dcat_vtest/Repetitions/Dsilv23/RepBase_configured/Scf6_LTRpipeline_controltest/LTR_220735.WedMar91650382022/raw-struct-results.txt -noanno -threads 20 > LTR_retriever.log 2>&1
: 00:00:50 (hh:mm:ss) Elapsed Time
Aligning instances...LTRPipeline::runMafft : tmpdir = ./LTR_220735.WedMar91650382022/MAFFT_220735.WedMar91652182022
LTRPipeline::runMafft : Running analysis /users/path/Programs/bin/mafft --large --quiet --thread 20 ./LTR_220735.WedMar91650382022/LtrRetriever-redundant-results.fa > ./LTR_220735.WedMar91650382022/MAFFT_220735.WedMar91652182022/mafft-alignment.fa
: 00:00:11 (hh:mm:ss) Elapsed Time
Clustering...LTRPipeline::runNinja : tmpdir = ./LTR_220735.WedMar91650382022/NINJA_220735.WedMar91652292022
LTRPipeline::runNinja : Running analysis /users/path/Programs/NINJA-0.95-cluster_only/NINJA/Ninja --in ./LTR_220735.WedMar91650382022/mafft-alignment.fa --out ./LTR_220735.WedMar91650382022/NINJA_220735.WedMar91652292022/cluster.dat --out_type c --corr_type m --cluster_cutoff 0.2 --threads 20 > ./LTR_220735.WedMar91650382022/NINJA_220735.WedMar91652292022/Ninja.log 2>&1



After that, we ran RepeatModeler with the `-LTRStruct` and `-recoverDir` options.
- These are the steps we followed to rerun RepeatModeler from `round-6`:
1) Rename `round-6/consensi.fa` to `round-6/consensi.fa.bak`
2) Rerun RepeatModeler with the `-LTRStruct` option and with `-recoverDir` pointing to the RM_xxxxx_xxxx directory of the failed run.
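The rename step can be wrapped in a small guard so that it fails loudly if the file is not where it is expected. This is a sketch only; `prepare_round6_rerun` is a hypothetical helper name, and the `.bak` suffix is just one choice of "something else" to rename the file to.

```shell
# Sketch: set aside round-6/consensi.fa so that -recoverDir resumes at round 6.
prepare_round6_rerun() {
    rm_dir="$1"   # the RM_... directory of the failed run
    src="${rm_dir}/round-6/consensi.fa"

    if [ ! -f "$src" ]; then
        echo "Missing ${src}; nothing to rename" >&2
        return 1
    fi
    mv "$src" "${src}.bak"
}

# Afterwards, with NINJA_DIR exported, the rerun would look like:
#   RepeatModeler -LTRStruct -pa ${pa} -database ${db} -recoverDir RM_xxxxx_xxxx
```

Keeping the original file under a new name, rather than deleting it, leaves the earlier round-6 result available for comparison.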

Some advice:
- Since a full RepeatModeler run with `-LTRStruct` takes a long time, we first tested `LTRPipeline` on a small dataset to check whether the problem had been solved. A Drosophila scaffold or genome may be a good option.
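One way to build such a small test input is to pull a single scaffold out of the assembly. The following is a sketch under assumptions: `extract_scaffold` is a hypothetical helper, and the scaffold ID is taken to be the first whitespace-delimited word after `>` in the FASTA header.

```shell
# Sketch: print one FASTA record, selected by its exact sequence ID.
extract_scaffold() {
    fasta="$1"   # multi-FASTA genome file
    id="$2"      # sequence ID without the leading ">"

    awk -v id="$id" '
        /^>/ { keep = (substr($1, 2) == id) }   # decide at each header line
        keep                                     # print lines of the wanted record
    ' "$fasta"
}
```

With a placeholder ID such as `Scf6`, the test would then be `extract_scaffold genome.fa Scf6 > test.fa` followed by `LTRPipeline -debug test.fa`.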