Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

RepeatScout did not return any models. #176

Closed IanDMedeiros closed 1 year ago

IanDMedeiros commented 1 year ago

Describe the issue

I am getting a message that "RepeatScout did not return any models." with my own data (de novo contig-level assemblies of fungal genomes) as well as test data. Tools other than RepeatScout appear to be working perfectly fine.

Reproduction steps

BuildDatabase -name rice12 rice12.fasta
RepeatModeler -pa 3 -database rice12

GenBank accession for rice chromosome 12 test data: https://www.ncbi.nlm.nih.gov/nuccore/CM020887.1.

Log output

repeatscout.log is empty. The log file for RepeatModeler is below:

RepeatModeler Version 2.0.3

Using output directory = /hpc/group/bio1/ian/repeat-test/RM_1618251.MonAug80948072022
Search Engine = rmblast 2.11.0+
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.2
LTR Structural Analysis: Disabled [use -LTRStruct to enable]
Random Number Seed: 1659966484
Database = rice12
  • Sequences = 1
  • Bases = 25539636
Storage Throughput = poor ( 159.73 MB/s )
  • NOTE: Poor storage througput will have a large impact on RepeatModeler performance. The low throughput observed above may be due to transient usage patterns on the system and may not reflect the actual system performance. Whenever possible run RepeatModeler in a directory stored on a fast local disk and not over a network filesytem.

RepeatModeler Round # 1

Searching for Repeats -- Sampling from the database...

  • Gathering up to 40000000 bp
  • Final Sample Size = 25539570 bp ( 25539270 non ambiguous )
  • Num Contigs Represented = 1
  • Sequence extraction : 00:00:17 (hh:mm:ss) Elapsed Time
-- Running RepeatScout on the sequences...
  • RepeatScout: 00:04:51 (hh:mm:ss) Elapsed Time
NOTE: RepeatScout did not return any models.

RepeatModeler Round # 2

Searching for Repeats -- Sampling from the database...

  • Gathering up to 3000000 bp
  • Sequence extraction : 00:00:02 (hh:mm:ss) Elapsed Time
-- Running TRFMask on the sequence...
  • TRFMask time 00:00:08 (hh:mm:ss) Elapsed Time
-- Sample Stats:
Sample Size 3002220 bp
Num Contigs Represented = 1
Non ambiguous bp:
  Initial: 3002120 bp
  After Masking: 2962233 bp
  Masked: 1.33 %
-- Input Database Coverage: 3002220 bp out of 25539636 bp ( 11.76 % )
Sampling Time: 00:00:11 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
Comparison Time: 00:00:55 (hh:mm:ss) Elapsed Time, 32578 HSPs Collected
Round Time: 00:09:13 (hh:mm:ss) Elapsed Time

RepeatModeler Round # 3

Searching for Repeats -- Sampling from the database...

  • Gathering up to 9000000 bp
  • Sequence extraction : 00:00:06 (hh:mm:ss) Elapsed Time
-- Running TRFMask on the sequence...
  • TRFMask time 00:00:21 (hh:mm:ss) Elapsed Time
-- Masking repeats from the previous rounds...
  • TE Masking time 00:00:16 (hh:mm:ss) Elapsed Time
-- Sample Stats:
Sample Size 9007180 bp
Num Contigs Represented = 1
Non ambiguous bp:
  Initial: 9007180 bp
  After Masking: 7677273 bp
  Masked: 14.76 %
-- Input Database Coverage: 12009400 bp out of 25539636 bp ( 47.02 % )
Sampling Time: 00:00:44 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
Comparison Time: 00:04:29 (hh:mm:ss) Elapsed Time, 120472 HSPs Collected
Round Time: 00:38:36 (hh:mm:ss) Elapsed Time

RepeatModeler Round # 4

Searching for Repeats -- Sampling from the database...

  • Gathering up to 27000000 bp
  • Sequence extraction : 00:00:08 (hh:mm:ss) Elapsed Time
-- Running TRFMask on the sequence...
  • TRFMask time 00:00:36 (hh:mm:ss) Elapsed Time
-- Masking repeats from the previous rounds...
  • TE Masking time 00:01:27 (hh:mm:ss) Elapsed Time
-- Sample Stats:
Sample Size 13530110 bp
Num Contigs Represented = 1
Non ambiguous bp:
  Initial: 13529910 bp
  After Masking: 10046423 bp
  Masked: 25.75 %
-- Input Database Coverage: 25539510 bp out of 25539636 bp ( 100.00 % )
Sampling Time: 00:02:13 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
Comparison Time: 00:07:44 (hh:mm:ss) Elapsed Time, 100817 HSPs Collected
Round Time: 00:47:39 (hh:mm:ss) Elapsed Time

RepeatScout/RECON discovery complete: 162 families found

Classification Time: 00:09:25 (hh:mm:ss) Elapsed Time

Program Time: 01:50:01 (hh:mm:ss) Elapsed Time
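
For anyone triaging a similar log, here is a minimal Python sketch that lists the rounds carrying the "did not return any models" note. It is not part of RepeatModeler: the function name `rounds_without_models` and both regular expressions are assumptions based only on the wording of the log above, and other RepeatModeler versions may phrase these lines differently.

```python
import re

# Patterns inferred from the log text in this report (assumed, not an official format).
ROUND_RE = re.compile(r"RepeatModeler Round # (\d+)")
NO_MODELS_RE = re.compile(r"NOTE: (\w+) did not return any models")


def rounds_without_models(log_text):
    """Return (round_number, tool_name) pairs for each 'no models' note found."""
    flagged = []
    current_round = None  # stays None if a note appears before any round header
    for line in log_text.splitlines():
        round_match = ROUND_RE.search(line)
        if round_match:
            current_round = int(round_match.group(1))
        note_match = NO_MODELS_RE.search(line)
        if note_match:
            flagged.append((current_round, note_match.group(1)))
    return flagged


# A few lines copied from the log above, used as sample input.
sample = """RepeatModeler Round # 1
  • Sequence extraction : 00:00:17 (hh:mm:ss) Elapsed Time -- Running RepeatScout on the sequences...
  • RepeatScout: 00:04:51 (hh:mm:ss) Elapsed Time NOTE: RepeatScout did not return any models.
RepeatModeler Round # 2
  • TRFMask time 00:00:08 (hh:mm:ss) Elapsed Time -- Sample Stats: Sample Size 3002220 bp
"""

print(rounds_without_models(sample))  # prints [(1, 'RepeatScout')]
```

In this log only round 1 is flagged; rounds 2 through 4 ran their comparisons and the run still finished with 162 families.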

Environment (please include as much of the following information as you can find out):