Closed PlantDr430 closed 4 years ago
I'm having the same issue, but with a single input FastA file (here's the script I run in a Jupyter notebook):
# RepeatModeler path
rptm=/home/shared/RepeatModeler-open-1.0.11/
# Genome paths
Olurida_v080=/home/sam/data/genomes/oly/Olurida_v080.fa
Olurida_v081=/home/sam/data/genomes/oly/Olurida_v081.fa
# Run on v080
echo "------------------------------------------------------------------------"
echo "Begin v080 RepeatModeler"
cd /home/sam/analyses/20181022_Olurida_v080_repeatmodeler
time \
perl ${rptm}BuildDatabase \
-name Ostrea_lurida_v080 \
${Olurida_v080} \
1> stdout.out \
2> stderr.err
Contents of stderr.err:
Building database Ostrea_lurida_v080:
Adding /home/sam/data/genomes/oly/Olurida_v080.fa to database
Number of sequences (bp) added to database: 0 ( 0 bp )
Here's the list of files generated:
Ostrea_lurida_v080.nhr
Ostrea_lurida_v080.nin
Ostrea_lurida_v080.nnd
Ostrea_lurida_v080.nni
Ostrea_lurida_v080.nog
Ostrea_lurida_v080.nsq
Ostrea_lurida_v080.translation
stderr.err
stdout.out
Any help would be greatly appreciated.
Note: I also re-ran this with -engine ncbi specified and got the same results.
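Before digging into the configuration, it can help to confirm that the input file itself contains what you expect, so the "0 ( 0 bp )" line can be compared against an independent count. A minimal sketch, using only grep, tr and wc; the function names are hypothetical helpers and not part of RepeatModeler:

```shell
# Hypothetical helpers (not part of RepeatModeler) for sanity-checking a FASTA file.

# Count FASTA records, i.e. header lines starting with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count sequence characters, ignoring header lines and whitespace.
count_fasta_bases() {
    grep -v '^>' "$1" | tr -d ' \t\r\n' | wc -c | tr -d ' '
}

# Example, using the variable from the script above:
#   count_fasta_records "${Olurida_v080}"
#   count_fasta_bases "${Olurida_v080}"
```

If both counts are non-zero but BuildDatabase still reports 0 sequences, the input file is probably not the problem.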
I figured this out!
For me, the issue pertained to the configuration of RepeatModeler and setting the path for rmblast and makeblastdb, as I was using symlinks in a common location (e.g. /home/shared/bin) to just these two programs. Even though the configuration walk-through only asks for the path to these two programs, it turns out that it actually needs a path for virtually all of the BLAST suite of tools!
Here's the section of the RepModelConfig.pm file where I discovered this:
As such, I went through the configure procedure again, but this time pointed it to the entire BLAST installation location (which had the RepeatMasker patch installed), e.g.:
ncbi-blast-2.6.0+-src/c++/ReleaseMT/bin/
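A quick way to check a candidate directory before re-running the configure step is to test whether each program you need is present and executable there. This is only a sketch: the function name is hypothetical, and the tool names in the usage line are examples, since the exact list depends on what your copy of RepModelConfig.pm refers to.

```shell
# Hypothetical helper: list the programs that are missing from a directory.
# Usage: missing_blast_tools DIR TOOL [TOOL...]
# Prints one missing tool name per line; returns 0 only if none are missing.
missing_blast_tools() {
    local dir=$1
    shift
    local tool status=0
    for tool in "$@"; do
        # A tool counts as present only if it is a regular file
        # (or a symlink to one) and is executable.
        if [ ! -f "${dir%/}/${tool}" ] || [ ! -x "${dir%/}/${tool}" ]; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example (tool names are illustrative, adjust to your RepModelConfig.pm):
#   missing_blast_tools ncbi-blast-2.6.0+-src/c++/ReleaseMT/bin rmblastn makeblastdb blastdbcmd
```

With a symlink-only directory such as /home/shared/bin, this would print whichever programs were not linked there.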
I have the same problem as you, except that the BuildDatabase command only adds the sequences of the first .fasta file to the database. I tried the '-batch' and '-dir' parameters and it still doesn't work.
The locations of the RMBLAST programs and support utilities are set correctly, as far as I can tell.
I really don't get it.
Any ideas?
Sorry for the long delay, folks. This is a bug in BuildDatabase, and the fix hasn't yet made it into a release. I do plan to do that soon. However, if you need the multi-FASTA file functionality, you can pull the "BuildDatabase" file from the GitHub project master branch (only that file is needed, and it is compatible with 1.0.11). I will close this when 1.0.12 comes out.
I'm not sure if this is a consequence of my FASTA file formatting, but I've applied your solution (i.e. using the BuildDatabase file from the GitHub master branch) and I no longer receive the error discussed above. Instead, after reading in my files (or one multi-FASTA file; I've tried it with both that and the -dir option), I get this error:
Something went wrong parsing input file! /home/dejonggr/scratch/mRNA/repeatmasker/rmblast-2.9.0/makeblastdb returned:
Building a new DB, current time: 06/06/2019 15:49:42
New DB name: /scratch/dejonggr/mRNA/repeatmasker/genomes/Boleracea
New DB title: tmpSeqData.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
The command used was: /home/dejonggr/scratch/mRNA/repeatmasker/rmblast-2.9.0/makeblastdb -out Boleracea -parse_seqids -dbtype nucl -in tmpSeqData.fa
Died at /home/dejonggr/scratch/mRNA/repeatmasker/RepeatModeler-open-1.0.11/BuildDatabase line 321
Creating it directly from my rmblast directory works fine, but for some reason it doesn't work through BuildDatabase.
dejonggr: How did you run BuildDatabase? The script failed because it could not find the file "Boleracea.nsq" in your current working directory after running this command:
/home/dejonggr/scratch/mRNA/repeatmasker/rmblast-2.9.0/makeblastdb -out Boleracea -parse_seqids -dbtype nucl -in tmpSeqData.fa
When you say "Creating it directly from my rmblast directory works fine", what do you mean by that? Do you mean:
% cd /home/dejonggr/scratch/mRNA/repeatmasker/rmblast-2.9.0/
% /home/dejonggr/scratch/mRNA/repeatmasker/RepeatModeler-open-1.0.11/BuildDatabase -name Boleacea somefile.fa
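To see whether the database file the script looks for actually ended up where you ran the command, a check along these lines may help. It is a sketch with a hypothetical function name, and it only tests for the NAME.nsq file mentioned above, not for everything BuildDatabase might verify.

```shell
# Hypothetical helper: return 0 if DIR (default: current directory)
# contains NAME.nsq, the file the failure described above refers to.
# Usage: has_blast_db NAME [DIR]
has_blast_db() {
    local name=$1
    local dir=${2:-.}
    [ -f "${dir%/}/${name}.nsq" ]
}

# Example, run from the directory where BuildDatabase was started:
#   has_blast_db Boleracea || echo "Boleracea.nsq not found in $(pwd)"
```

If the file shows up in a different directory than the one you started BuildDatabase from, that would explain why the direct makeblastdb call appears to work while the wrapper fails.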
I pulled the "BuildDatabase" file from the GitHub project master branch, but the error still occurs. This is my command:
perl ~/geneomic/repeat_find/repeatmodel/BuildDatabase.txt -name lh -engine rmblast ~/geneomic/reepeatmodel/out.fa
This is the error:
Building database lh:
Reading /home/liuli/geneomic/repeat_find/repeatmodel/out.fa...
Number of sequences (bp) added to database: 0 ( 0 bp )
Any help would be greatly appreciated.
Hello,
Every time I try to build a database with multiple FASTA files, from either a batch file or a directory, no sequences get added to the database. See the run below (I cut out some of the genomes so as not to fill up the page).
BuildDatabase -name Test_genomes -engine ncbi -batch repeat_list.bat
Building database Test_genomes:
Adding /Volumes/Pegasus/WykaClavicepsGenomes/ClavicepsAssemblies/repeatmodeler/Chum7_assembly.fasta to database
Adding /Volumes/Pegasus/WykaClavicepsGenomes/ClavicepsAssemblies/repeatmodeler/Clav04_assembly.fasta to database
Adding /Volumes/Pegasus/WykaClavicepsGenomes/ClavicepsAssemblies/repeatmodeler/LM576_assembly.fasta to database
Adding /Volumes/Pegasus/WykaClavicepsGenomes/ClavicepsAssemblies/repeatmodeler/LM582_assembly.fasta to database
Adding /Volumes/Pegasus/WykaClavicepsGenomes/ClavicepsAssemblies/repeatmodeler/LM583_assembly.fasta to database
Number of sequences (bp) added to database: 0 ( 0 bp )
This run only produces a translation file (attached below, renamed to .txt so it can be viewed).
Test_genomes.translation.txt
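When a batch run adds nothing, one thing worth ruling out is the list itself. The sketch below walks a batch file and reports, for each entry, whether the file is readable and how many FASTA headers it has. It assumes one path per line, which is an assumption about the batch format, and the function name is hypothetical.

```shell
# Hypothetical helper: print a status line for each path listed in a batch file.
# Assumes one FASTA path per line; blank lines are skipped.
# Returns non-zero if any listed file is unreadable or has no ">" header lines.
check_batch_list() {
    local list=$1 path n status=0
    # "|| [ -n "$path" ]" keeps a final line that lacks a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue
        if [ ! -r "$path" ]; then
            echo "MISSING $path"
            status=1
            continue
        fi
        # grep -c exits non-zero on a count of 0, hence "|| true".
        n=$(grep -c '^>' "$path" || true)
        if [ "$n" -eq 0 ]; then
            echo "EMPTY $path"
            status=1
        else
            echo "OK $path ($n records)"
        fi
    done < "$list"
    return "$status"
}

# Example, using the batch file from the run above:
#   check_batch_list repeat_list.bat
```

If every entry comes back OK, the inputs look fine and the BuildDatabase bug discussed in this thread is the more likely cause.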