Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

BuildDatabase Batch File #13

Closed PlantDr430 closed 4 years ago

PlantDr430 commented 6 years ago

Hello,

Every time I try to build a database from multiple FASTA files, using either a batch file or a directory, no sequences get added to the database. See the run below (I cut out some of the genomes so as not to fill up the page).

BuildDatabase -name Test_genomes -engine ncbi -batch repeat_list.bat

Building database Test_genomes:
  Adding /Volumes/Pegasus/WykaClavicepsGenomes/ClavicepsAssemblies/repeatmodeler/Chum7_assembly.fasta to database
  Adding /Volumes/Pegasus/WykaClavicepsGenomes/ClavicepsAssemblies/repeatmodeler/Clav04_assembly.fasta to database
  Adding /Volumes/Pegasus/WykaClavicepsGenomes/ClavicepsAssemblies/repeatmodeler/LM576_assembly.fasta to database
  Adding /Volumes/Pegasus/WykaClavicepsGenomes/ClavicepsAssemblies/repeatmodeler/LM582_assembly.fasta to database
  Adding /Volumes/Pegasus/WykaClavicepsGenomes/ClavicepsAssemblies/repeatmodeler/LM583_assembly.fasta to database
Number of sequences (bp) added to database: 0 ( 0 bp )

This run produces only a translation file (attached below in .txt format so it can be viewed):

Test_genomes.translation.txt

kubu4 commented 6 years ago

I'm having the same issue, but with a single input FastA file (here's the script I run in a Jupyter notebook):

# RepeatModeler path
rptm=/home/shared/RepeatModeler-open-1.0.11/

# Genome paths
Olurida_v080=/home/sam/data/genomes/oly/Olurida_v080.fa
Olurida_v081=/home/sam/data/genomes/oly/Olurida_v081.fa

# Run on v080
echo "------------------------------------------------------------------------"
echo "Begin v080 RepeatModeler"
cd /home/sam/analyses/20181022_Olurida_v080_repeatmodeler
time \
perl ${rptm}BuildDatabase \
-name Ostrea_lurida_v080 \
${Olurida_v080} \
1> stdout.out \
2> stderr.err

Contents of stderr.err:

Building database Ostrea_lurida_v080:
  Adding /home/sam/data/genomes/oly/Olurida_v080.fa to database
Number of sequences (bp) added to database: 0 ( 0 bp )

Here's the list of files generated:

Ostrea_lurida_v080.nhr
Ostrea_lurida_v080.nin
Ostrea_lurida_v080.nnd
Ostrea_lurida_v080.nni
Ostrea_lurida_v080.nog
Ostrea_lurida_v080.nsq
Ostrea_lurida_v080.translation
stderr.err
stdout.out

Any help would be greatly appreciated.

Note: I also re-ran this and specified -engine ncbi and got the same results.

kubu4 commented 6 years ago

I figured this out!

For me, the issue pertained to configuration of RepeatModeler and setting the path for rmblast and makeblastdb, as I was using symlinks in a common location (e.g. /home/shared/bin) to just these two programs. Even though the configuration walk-through only asks for the path to these two programs, it turns out, it actually needs a path for virtually all of the BLAST suite of tools!

Here's the section of the RepModelConfig.pm file where I discovered this:

[Screenshot: selection_075]

As such, I went through the configure procedure again, but this time pointed it to the entire BLAST installation location (which had the RepeatMasker patch installed), e.g.:

ncbi-blast-2.6.0+-src/c++/ReleaseMT/bin/
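For anyone who wants to check this before re-running configure, here is a minimal sketch of the idea: confirm that each BLAST program is present and executable in the one directory you plan to point RepeatModeler at. The function name check_blast_tools is made up for illustration, and the tool names in the example are only an assumption; the real list should be read from your own RepModelConfig.pm.

```shell
#!/usr/bin/env bash
# check_blast_tools DIR TOOL...   (hypothetical helper, not part of RepeatModeler)
# Reports each TOOL that is missing or not executable in DIR.
# Returns 0 if every tool is present, 1 otherwise.
check_blast_tools() {
    local dir=$1
    shift
    local missing=0 tool
    for tool in "$@"; do
        # -x follows symlinks, so a dangling symlink is reported as missing
        if [ -x "$dir/$tool" ]; then
            echo "ok       $dir/$tool"
        else
            echo "MISSING  $dir/$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (directory from the comment above; the tool list is an assumption):
# check_blast_tools ncbi-blast-2.6.0+-src/c++/ReleaseMT/bin makeblastdb rmblastn blastn
```

If anything is reported as MISSING, that would be consistent with the situation described above, where only a couple of programs were symlinked into a shared bin directory.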

Ninet93 commented 5 years ago

I have the same problem as you, except that BuildDatabase command only adds the sequences of the first .fasta file to the database. I tried with '-batch' and '-dir' parameters and it still doesn't work.

The location of RMBLAST programs and support utilities are correctly set in my opinion.

I really don't get it.

Any ideas?

rmhubley commented 5 years ago

Sorry for the long delay folks. This is a bug in BuildDatabase and the fix hasn't yet made it into a release. I do plan to do that soon. However, if you need the multi FASTA file functionality, you can pull the "BuildDatabase" file from the github project master branch ( only that file is needed and it is compatible with 1.0.11 ). I will close this when 1.0.12 comes out.

dejonggr commented 5 years ago

I'm not sure if this is a consequence of my FASTA file formatting, but I've applied your solution (i.e. using the BuildDatabase file from the GitHub master branch) and I no longer receive the error discussed above. Instead, after reading in my files (or one multi-FASTA file; I've tried both that and the -dir option), I get this error:

Something went wrong parsing input file! /home/dejonggr/scratch/mRNA/repeatmasker/rmblast-2.9.0/makeblastdb returned:

Building a new DB, current time: 06/06/2019 15:49:42
New DB name: /scratch/dejonggr/mRNA/repeatmasker/genomes/Boleracea
New DB title: tmpSeqData.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B

The command used was: /home/dejonggr/scratch/mRNA/repeatmasker/rmblast-2.9.0/makeblastdb -out Boleracea -parse_seqids -dbtype nucl -in tmpSeqData.fa
Died at /home/dejonggr/scratch/mRNA/repeatmasker/RepeatModeler-open-1.0.11/BuildDatabase line 321

Creating the database directly from my rmblast directory works fine, but for some reason it fails when I use BuildDatabase.

rmhubley commented 5 years ago

dejonggr : How did you run BuildDatabase? The reason the script failed is that it could not find the file "Boleracea.nsq" in your current working directory after running this command:

/home/dejonggr/scratch/mRNA/repeatmasker/rmblast-2.9.0/makeblastdb -out Boleracea -parse_seqids -dbtype nucl -in tmpSeqData.fa

When you say "Creating it directly from my rmblast directory works fine", what do you mean by that? Do you mean:

% cd /home/dejonggr/scratch/mRNA/repeatmasker/rmblast-2.9.0/
% /home/dejonggr/scratch/mRNA/repeatmasker/RepeatModeler-open-1.0.11/BuildDatabase -name Boleacea somefile.fa
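The explanation above suggests a quick manual check: after running the makeblastdb command by hand, look in the current working directory for the .nsq file named after the database. A rough sketch follows. The helper name check_blast_db is hypothetical, and it assumes the only thing worth testing is the presence of NAME.nsq, as described in this comment.

```shell
#!/usr/bin/env bash
# check_blast_db NAME   (hypothetical helper, not part of RepeatModeler)
# Looks in the current directory for NAME.nsq, the file the comment above
# says BuildDatabase could not find, and lists whatever NAME.* files exist.
# Returns 0 if NAME.nsq is present, 1 otherwise.
check_blast_db() {
    local name=$1
    if [ -e "$name.nsq" ]; then
        echo "found $name.nsq in $PWD"
        return 0
    fi
    echo "no $name.nsq in $PWD; files starting with $name.:"
    # If the glob matches nothing, ls fails quietly and we print a marker
    ls -1 -- "$name".* 2>/dev/null || echo "  (none)"
    return 1
}

# Example, run from the directory where makeblastdb was invoked:
# check_blast_db Boleracea
```

Listing the other NAME.* files is there because it shows whether makeblastdb wrote its output somewhere else or under a different name.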

smoothyly commented 1 year ago

I pulled the "BuildDatabase" file from the GitHub project master branch, but this error still occurs. This is my command:

perl ~/geneomic/repeat_find/repeatmodel/BuildDatabase.txt -name lh -engine rmblast ~/geneomic/reepeatmodel/out.fa

This is the error:

Building database lh:
  Reading /home/liuli/geneomic/repeat_find/repeatmodel/out.fa...
Number of sequences (bp) added to database: 0 ( 0 bp )

Any help would be greatly appreciated.
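One thing that may be worth ruling out when the count comes back as 0 is the input file itself. The sketch below only checks that the file is readable and counts its FASTA header lines (lines starting with ">"). The function name count_fasta_records is made up for illustration, and this does not reproduce anything BuildDatabase does internally.

```shell
#!/usr/bin/env bash
# count_fasta_records FILE   (hypothetical helper, not part of RepeatModeler)
# Prints the number of lines beginning with '>' (FASTA headers).
# Returns 1 with a message on stderr if FILE is missing or unreadable.
count_fasta_records() {
    local file=$1
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status
    grep -c '^>' "$file" || true
}

# Example:
# count_fasta_records /home/liuli/geneomic/repeat_find/repeatmodel/out.fa
```

If this prints a non-zero count for the exact path shown in the "Reading ..." line, the input file is probably not the problem, and the BLAST configuration discussed earlier in the thread is the next thing to look at.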