Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

build_lmer_table failed. Exit code 256 #110

Open BenjaminGuinet opened 3 years ago

BenjaminGuinet commented 3 years ago

Hello, I'm writing because I'm facing an error when I run RepeatModeler.

Here is the code I used:

Sp_name=Tryphoninae_B
ASSEMBLY=/beegfs/data/these/Genomes/Tryphoninae_B/Tryphoninae_B.fa

cd /beegfs/data/these/Genomes/Tryphoninae_B/run_reapeat/
/beegfs/data/TOOLS/RepeatModeler/BuildDatabase -name $Sp_name.DB -engine rmblast $ASSEMBLY
echo 'BuildDatabase done' 
echo date 

/beegfs/data/TOOLS/RepeatModeler/RepeatModeler -database $Sp_name.DB -pa 6 -LTRStruct
echo 'RepeatModeler done '
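
Two details of the script above are worth noting: `echo date` prints the literal word "date" (as the log below shows), and the "done" messages are printed even when the preceding command fails. A minimal sketch of a wrapper that addresses both follows; `run_step` is a hypothetical helper, not part of RepeatModeler, and the usage lines simply reuse the commands from the script.

```shell
# Hypothetical helper (not part of RepeatModeler): run a command, print a
# timestamped "done" line on success, or report the exit status on failure.
run_step() {
    label=$1
    shift
    if "$@"; then
        # $(date) runs the date command; a bare `echo date` would not.
        echo "$label done: $(date)"
    else
        rc=$?   # exit status of the command that just failed
        echo "$label failed with exit status $rc" >&2
        return "$rc"
    fi
}

# Possible usage with the commands from the script above:
#   run_step BuildDatabase /beegfs/data/TOOLS/RepeatModeler/BuildDatabase \
#       -name "$Sp_name.DB" -engine rmblast "$ASSEMBLY" || exit 1
#   run_step RepeatModeler /beegfs/data/TOOLS/RepeatModeler/RepeatModeler \
#       -database "$Sp_name.DB" -pa 6 -LTRStruct || exit 1
```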

Here is the error message I got:


date
Building database Tryphoninae_B.DB:
  Reading /beegfs/data/these/Genomes/Tryphoninae_B/Tryphoninae_B_corrected.fa...
Number of sequences (bp) added to database: 966353 ( 342816151 bp )
BuildDatabase done
date
RepeatModeler Version 2.0.1
===========================
Search Engine = rmblast 2.10.0+
Dependencies: TRF , RECON , RepeatScout , RepeatMasker 
LTR Structural Analysis: Enabled ( GenomeTools , LTR_Retriever v2.9.0,
                                   Ninja , MAFFT 7.471,
                                   CD-HIT 4.8.1 )
Random Number Seed: 1606139499
Database = Tryphoninae_B.DB .................................................................................................
  - Sequences = 966353
  - Bases = 342816151
  - N50 = 2702
  - Contig Histogram:
  Size(bp)                                                        Count
  -----------------------------------------------------------------------
  139193-149129 |                                                   [  ]
  129258-139193 |                                                   [  ]
  119323-129258 |                                                   [ 4 ]
  109387-119322 |                                                   [ 2 ]
  99452-109387  |                                                   [ 3 ]
  89517-99452   |                                                   [ 4 ]
  79582-89517   |                                                   [ 8 ]
  69646-79581   |                                                   [ 14 ]
  59711-69646   |                                                   [ 32 ]
  49776-59711   |                                                   [ 70 ]
  39841-49776   |                                                   [ 154 ]
  29905-39840   |                                                   [ 347 ]
  19970-29905   |                                                   [ 912 ]
  10035-19970   |                                                   [ 3240 ]
  100-10035     |*************************************************  [ 961562 ]

  WARN: The N50 for this assembly is low ( <10,000 ).  The de novo methods
        employed by RepeatModeler are intended for use with long contiguous
        sequences and may not perform well with an over-abundance of short
        contigs in the database.
Using output directory = /beegfs/data/these/Genomes/Tryphoninae_B/run_reapeat/RM_750.MonNov231452352020
Storage Throughput = fair ( 354.35 MB/s )

Ready to start the sampling process.
INFO: The runtime of RepeatModeler heavily depends on the quality of the assembly
      and the repetitive content of the sequences.  It is not imperative
      that RepeatModeler completes all rounds in order to obtain useful
      results.  At the completion of each round, the files ( consensi.fa, and
      families.stk ) found in:
      /beegfs/data/these/Genomes/Tryphoninae_B/run_reapeat/RM_750.MonNov231452352020/ 
      will contain all results produced thus far. These files may be 
      manually copied and run through RepeatClassifier should the program
      be terminated early.

RepeatModeler Round # 1
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 40000000 bp
   - Final Sample Size = 40011581 bp ( 40009942 non ambiguous )
   - Num Contigs Represented = 116741
   - Sequence extraction : 00:00:15 (hh:mm:ss) Elapsed Time
 -- Running RepeatScout on the sequences...
   - RepeatScout: Running build_lmer_table ( l = 14 )..

build_lmer_table failed. Exit code 256

Do you have any idea what is going on, please?

jebrosen commented 3 years ago

Dependencies: TRF , RECON , RepeatScout , RepeatMasker LTR Structural Analysis: Enabled ( GenomeTools , LTR_Retriever v2.9.0, Ninja , MAFFT 7.471, CD-HIT 4.8.1 )

More version numbers are missing here than expected. How was RepeatModeler installed and configured?

build_lmer_table failed. Exit code 256

Can you post the contents of RM_<date>/round-1/repeatscout.log? That file might have enough details to troubleshoot the problem further.
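
A minimal sketch of how that log could be located and printed, assuming the run directory follows the `RM_*` naming seen in the output above. The function name `show_repeatscout_log` is hypothetical and not part of RepeatModeler.

```shell
# Hypothetical helper (not part of RepeatModeler): print round-1/repeatscout.log
# from the most recently modified RM_* directory under a working directory.
show_repeatscout_log() {
    workdir="${1:-.}"
    # Newest RM_* run directory first; empty if the glob matches nothing.
    latest=$(ls -1dt "$workdir"/RM_*/ 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "no RM_* directory found in $workdir" >&2
        return 1
    fi
    # $latest already ends with a slash because of the RM_*/ glob.
    log="${latest}round-1/repeatscout.log"
    if [ ! -s "$log" ]; then
        # -s is false for a missing file and for a zero-length file.
        echo "log is missing or empty: $log" >&2
        return 2
    fi
    cat "$log"
}
```

It would be called from the directory RepeatModeler was started in, for example `show_repeatscout_log /beegfs/data/these/Genomes/Tryphoninae_B/run_reapeat`. A return status of 2 distinguishes an empty log from a missing run directory.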

z4668640 commented 11 months ago

Dependencies: TRF , RECON , RepeatScout , RepeatMasker LTR Structural Analysis: Enabled ( GenomeTools , LTR_Retriever v2.9.0, Ninja , MAFFT 7.471, CD-HIT 4.8.1 )

This is more missing version numbers than expected. How was RepeatModeler installed and configured?

build_lmer_table failed. Exit code 256

Can you post the contents of RM_<date>/round-1/repeatscout.log? That file might have enough details to troubleshoot the problem further.

Hello, I have the same problem. Log as follows

Building database Danio_rerio:
  Reading Danio_rerio.chr1.fa...
Number of sequences (bp) added to database: 1 ( 59578282 bp )
RepeatModeler Version 2.0.5
===========================
Using output directory = /home/TestRepeatModeler/RM_7410.TueDec191805342023
Search Engine = rmblast 2.14.1+
Threads = 4
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.5
LTR Structural Analysis: Enabled ( GenomeTools 1.6.5, LTR_Retriever v2.9.5,
                                   Ninja 0.97-cluster_only, MAFFT 7.505,
                                   CD-HIT 4.8.1 )
Random Number Seed: 1702980330
Database = /home/TestRepeatModeler/Danio_rerio
  - Sequences = 1
  - Bases = 59578282
Storage Throughput = good ( 876.79 MB/s )

Ready to start the sampling process.
INFO: The runtime of RepeatModeler heavily depends on the quality of the assembly
      and the repetitive content of the sequences.  It is not imperative
      that RepeatModeler completes all rounds in order to obtain useful
      results.  At the completion of each round, the files ( consensi.fa, and
      families.stk ) found in:
      /home/TestRepeatModeler/RM_7410.TueDec191805342023/
      will contain all results produced thus far. These files may be
      manually copied and run through RepeatClassifier should the program
      be terminated early.

RepeatModeler Round # 1
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 40000000 bp
   - Final Sample Size = 40052372 bp ( 40010603 non ambiguous )
   - Num Contigs Represented = 1
   - Sequence extraction : 00:02:04 (hh:mm:ss) Elapsed Time
 -- Running RepeatScout on the sequences...
   - RepeatScout: Running build_lmer_table ( l = 14 )..
build_lmer_table failed. Exit code 256

I looked at the repeatscout.log file, but it was empty. Is there any solution?
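
With an empty log, the number in the message is the only clue left. A process exit code cannot exceed 255, so 256 is most likely the raw wait status returned by Perl's `system()` (RepeatModeler is a Perl script), where the child's exit code sits in the high byte. Under that assumption, the value can be decoded as sketched below; `decode_wait_status` is a hypothetical helper name.

```shell
# Hypothetical helper: split a 16-bit wait status (as stored in Perl's $?)
# into the child's exit code and the number of the signal that ended it.
decode_wait_status() {
    status=$1
    exit_code=$(( status >> 8 ))   # high byte: value the child passed to exit()
    signal=$(( status & 127 ))     # low 7 bits: terminating signal, 0 if none
    echo "exit_code=$exit_code signal=$signal"
}

decode_wait_status 256   # prints: exit_code=1 signal=0
```

Read this way, build_lmer_table exited on its own with status 1 (a generic failure) and was not killed by a signal.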

z4668640 commented 11 months ago

I think I found a solution. Please see the patch in this RepeatScout pull request: https://github.com/mmcco/RepeatScout/pull/6/commits/c5193bbb0882525a00eaf92dcff2120b0997a1a5 Thanks to EricDeveaud for the fix.