Closed liamfriar closed 10 months ago
I'm afraid we do not use mamba/conda -- my guess is there is a problem with how someone set up the conda recipe. I would recommend installing Ninja using the source available from here: https://github.com/TravisWheelerLab/NINJA/releases/tag/0.98-cluster_only
Then re-run the RepeatModeler "configure" tool to set the location of where you installed Ninja. If you don't want to install from source you can use the pre-built TETools docker/singularity image here to get a complete installation of RepeatModeler + all dependencies already installed and configured ( https://github.com/Dfam-consortium/TETools ).
Hi @rmhubley, sorry if I did not describe that well. The NINJA installation was not via conda. It was directly from the source.
wget https://github.com/TravisWheelerLab/NINJA/archive/refs/tags/0.95-cluster_only.tar.gz
I have been unsuccessful trying to get RepeatModeler to run. I got BuildDatabase to run. I have done a mix of conda and direct installs, so I don't expect you all to be able to figure this out, but the error I have been receiving is:
RepeatModeler -LTRStruct -ninja_dir $ninja_dir -rmblast_dir $rmblast_dir -repeatmasker_dir $repeatmasker_dir -database $prefix 2>&1 | tee $prefix_repeatmodeler.log
RepeatModeler Version 2.0.2
===========================
Search Engine = rmblast 2.14.0+
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.5
LTR Structural Analysis: Enabled ( GenomeTools 1.5.10, LTR_Retriever ,
Ninja , MAFFT 7.520,
CD-HIT 4.8.1 )
Random Number Seed: 1691208703
Database = caroliniana1 .
- Sequences = 358
- Bases = 4721499
- N50 = 19071
- Contig Histogram:
Size(bp) Count
-----------------------------------------------------------------------
62019-66271 | [ 1 ]
57768-62019 | [ ]
53517-57768 | [ ]
49265-53516 |* [ 4 ]
45014-49265 |* [ 3 ]
40763-45014 |** [ 6 ]
36511-40762 |*** [ 8 ]
32260-36511 |** [ 6 ]
28009-32260 |*** [ 9 ]
23757-28008 |****** [ 15 ]
19506-23757 |******* [ 17 ]
15255-19506 |********** [ 25 ]
11003-15254 |*************************** [ 65 ]
6752-11003 |********************************** [ 81 ]
2501-6752 |************************************************** [ 118 ]
Using output directory = ~/data/RepeatModeler_out/caroliniana1/RM_1085902.SatAug50411442023
Storage Throughput = good ( 826.18 MB/s )
Ready to start the sampling process.
INFO: The runtime of RepeatModeler heavily depends on the quality of the assembly
and the repetitive content of the sequences. It is not imperative
that RepeatModeler completes all rounds in order to obtain useful
results. At the completion of each round, the files ( consensi.fa, and
families.stk ) found in:
~/data/RepeatModeler_out/caroliniana1/RM_1085902.SatAug50411442023/
will contain all results produced thus far. These files may be
manually copied and run through RepeatClassifier should the program
be terminated early.
RepeatModeler Round # 1
========================
Searching for Repeats
-- Sampling from the database...
- Gathering up to 40000000 bp
- Final Sample Size = 4721422 bp ( 4720079 non ambiguous )
- Num Contigs Represented = 358
- Sequence extraction : 00:00:01 (hh:mm:ss) Elapsed Time
-- Running RepeatScout on the sequences...
- RepeatScout: Running build_lmer_table ( l = 13 )..
- RepeatScout: Running RepeatScout.. : 112 raw families identified
- RepeatScout: Running filtering stage.. 111 families remaining
- RepeatScout: 00:01:11 (hh:mm:ss) Elapsed Time
- Large Satellite Filtering.. : 0 found in 00:00:01 (hh:mm:ss) Elapsed Time
- Collecting repeat instances...
-- Refining Family R=5 / 0 ( RS Elements: 126, Using 100 )
ERROR from search engine (0)
Can't call method "getNumAlignedSeqs" on an undefined value at ~/miniconda3/envs/repeatmodeler/share/RepeatModeler/Refiner line 776.
RepeatModeler: Could not open refined model ~/data/RepeatModeler_out/caroliniana1/RM_1085902.SatAug50411442023/round-1/family-0.fa.refiner_cons!
Unless that is a familiar error, I think I will try to find a server with Docker to run this on. Thanks.
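A side note on the logging part of the RepeatModeler command in this comment: in a POSIX shell, underscores are valid in variable names, so `$prefix_repeatmodeler.log` is read as the variable `prefix_repeatmodeler` followed by `.log`, and the log most likely ends up in a file named `.log`. A minimal sketch of the braced form, using the database name from the log purely as an example value:

```shell
#!/bin/sh
# "caroliniana1" is the database name from the log above; any prefix works.
prefix="caroliniana1"

# Unbraced, the shell looks up a variable called "prefix_repeatmodeler"
# (normally unset), so "$prefix_repeatmodeler.log" would expand to ".log".
# Braces end the variable name where intended.
logfile="${prefix}_repeatmodeler.log"
echo "$logfile"
```

The resulting name can then be passed to `tee "$logfile"` in place of the unbraced form.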
Using your link above for the Ninja source I did the following:
% wget https://github.com/TravisWheelerLab/NINJA/archive/refs/tags/0.95-cluster_only.tar.gz
Resolving github.com (github.com)
...
2023-08-07 10:51:51 (8.01 MB/s) - ‘0.95-cluster_only.tar.gz’ saved [222127]
% tar zxvf 0.95-cluster_only.tar.gz
NINJA-0.95-cluster_only/
NINJA-0.95-cluster_only/.gitignore
...
NINJA-0.95-cluster_only/README.md
% cd NINJA-0.95-cluster_only/
% ls
.gitignore LICENSE NINJA/ README.md
% cd NINJA/
% make
...
g++ -std=gnu++11 -Wall -mssse3 -fopenmp -O3 ArgumentHandler.o ArrayHeapExtMem.o BinaryHeap_FourInts.o BinaryHeap_IntKey_TwoInts.o BinaryHeap_TwoInts.o BinaryHeap.o CandidateHeap.o DistanceCalculator.o DistanceReader.o DistanceReaderExtMem.o ExceptionHandler.o Ninja.o SequenceFileReader.o Stack.o TreeBuilder.o TreeBuilderBinHeap.o TreeBuilderExtMem.o TreeBuilderManager.o ClusterManager.o TreeNode.o -o Ninja
% ls Ninja*
Ninja* Ninja.cpp Ninja_new* Ninja.o Ninja_old*
# NOTE: There are three executables in this release version, and "Ninja" is the one you want. Originally you said it only generated Ninja_new and Ninja_old. The correct version is the one that has the "m" option for --corr_type. You can check this by running:
% ./Ninja -h
Ninja - Version 0.95-cluster_only
./Ninja --in file.fa --out file.out
Arguments:
--help (or -h) to display this help
--in (or -i) filename
--out (or -o) filename
--in_type type [a | d] (default a)
--out_type type [d | c] (default c)
--corr_type type [n | j | k | s | m]
--cluster_cutoff dist_cutoff (default 0.03)
--threads (or -T) num_threads
--version (or -v) print the software version
For more information, check the README file.
# Then when you run RepeatModeler with the -LTRStruct option Ninja should have a version number reported in the screen output like:
LTR Structural Analysis: Enabled ( GenomeTools 1.5.10, LTR_Retriever v2.9.0,
Ninja 0.95-cluster_only, MAFFT 7.471,
CD-HIT 4.8.1 )
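The `--corr_type` check described in the note above can also be scripted. This is a minimal sketch, assuming the help text has the same layout as the output shown here. The function name `supports_corr_type_m` and the `NINJA_BIN` variable are hypothetical and not part of NINJA or RepeatModeler:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the help text on stdin lists "m"
# among the --corr_type choices, e.g. "[n | j | k | s | m]".
supports_corr_type_m() {
    grep -e '--corr_type' | grep -q -E '[[|] *m *[]|]'
}

# Only runs when NINJA_BIN (an assumed variable) points at a built binary.
if [ -n "${NINJA_BIN:-}" ] && [ -x "$NINJA_BIN" ]; then
    # 2>&1 because it is not assumed which stream the help text is written to.
    if "$NINJA_BIN" -h 2>&1 | supports_corr_type_m; then
        echo "$NINJA_BIN lists the m option for --corr_type"
    else
        echo "$NINJA_BIN does not list the m option for --corr_type" >&2
    fi
fi
```

For example, `NINJA_BIN=./Ninja sh check.sh` would run the check from inside the build directory, where `check.sh` is whatever name the sketch was saved under.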
The more recent log seems to indicate a different error. I would recommend upgrading to RepeatModeler 2.0.4 first to see whether that fixes your installation problem; 2.0.2 is quite old.
I did not know I had to run the make command. That appears to have fixed the NINJA problem. I will see about getting RepeatModeler to run. Thank you!
Describe the issue
NINJA installation has Ninja_new and Ninja_old programs, but no Ninja
Reproduction steps
I mamba installed RepeatModeler v2.0.2a
mamba install -c bioconda repeatmodeler
RepeatModeler ran fine immediately which is awesome.
RepeatModeler -pa 8 -engine ncbi -database $prefix 2>&1 | tee $prefix_repeatmodeler.log
When I added in the -LTRStruct flag, I got the following error:
I ran mamba list in my environment and discovered that no NINJA package is installed (so that's a problem for the conda/mamba people, I think). #113 seemed to have the same issue.
I installed the appropriate version of NINJA
And ran
RepeatModeler -pa 8 -LTRStruct -ninja_dir $ninja_dir -engine ncbi -database $prefix 2>&1 | tee $prefix_repeatmodeler.log
I got the same error.
I then realized there is no file called simply Ninja
So....
mv Ninja_new Ninja
And now RepeatModeler runs fine, including what looks like a successful run of the LTR pipeline, although nothing was found (which is neither expected nor unexpected), so I guess it's not totally clear if the run was successful:
I am not sure if Ninja_new or Ninja_old is the proper Ninja to be running?
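A quick pre-flight check on the directory passed to -ninja_dir can catch the missing Ninja executable before RepeatModeler is launched. The following is only a sketch: the function name `check_ninja_dir` is hypothetical, and it only tests that an executable file named Ninja exists, not that it is the correct build:

```shell
#!/bin/sh
# Hypothetical helper: report whether DIR holds an executable file named "Ninja".
check_ninja_dir() {
    dir=$1
    if [ -x "$dir/Ninja" ] && [ ! -d "$dir/Ninja" ]; then
        echo "ok: $dir/Ninja"
        return 0
    fi
    echo "missing: no executable named Ninja in $dir" >&2
    # List near-misses such as Ninja_new or Ninja_old to aid debugging.
    for f in "$dir"/Ninja_*; do
        [ -e "$f" ] && echo "  found instead: $f" >&2
    done
    return 1
}

# Usage sketch, with ninja_dir set as in the RepeatModeler command above:
#   check_ninja_dir "$ninja_dir" && RepeatModeler -LTRStruct -ninja_dir "$ninja_dir" ...
```

If the check reports only Ninja_new and Ninja_old, that points at an unbuilt or partially built source tree rather than at a file that needs renaming.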