Describe the issue
I am getting a message that "RepeatScout did not return any models." with my own data (de novo contig-level assemblies of fungal genomes) as well as test data. Tools other than RepeatScout appear to be working perfectly fine.
NOTE: Poor storage througput will have a large impact on RepeatModeler
performance. The low throughput observed above may be due to
transient usage patterns on the system and may not reflect the
actual system performance. Whenever possible run RepeatModeler
in a directory stored on a fast local disk and not over a
network filesytem.
RepeatModeler Round # 1
Searching for Repeats
-- Sampling from the database...
Gathering up to 40000000 bp
Final Sample Size = 25539570 bp ( 25539270 non ambiguous )
Num Contigs Represented = 1
Sequence extraction : 00:00:17 (hh:mm:ss) Elapsed Time
-- Running RepeatScout on the sequences...
RepeatScout: 00:04:51 (hh:mm:ss) Elapsed Time
NOTE: RepeatScout did not return any models.
RepeatModeler Round # 2
Searching for Repeats
-- Sampling from the database...
Gathering up to 3000000 bp
Sequence extraction : 00:00:02 (hh:mm:ss) Elapsed Time
-- Running TRFMask on the sequence...
TRFMask time 00:00:08 (hh:mm:ss) Elapsed Time
-- Sample Stats:
Sample Size 3002220 bp
Num Contigs Represented = 1
Non ambiguous bp:
Initial: 3002120 bp
After Masking: 2962233 bp
Masked: 1.33 %
-- Input Database Coverage: 3002220 bp out of 25539636 bp ( 11.76 % )
Sampling Time: 00:00:11 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
Comparison Time: 00:00:55 (hh:mm:ss) Elapsed Time, 32578 HSPs Collected
Round Time: 00:09:13 (hh:mm:ss) Elapsed Time
RepeatModeler Round # 3
Searching for Repeats
-- Sampling from the database...
Gathering up to 9000000 bp
Sequence extraction : 00:00:06 (hh:mm:ss) Elapsed Time
-- Running TRFMask on the sequence...
TRFMask time 00:00:21 (hh:mm:ss) Elapsed Time
-- Masking repeats from the previous rounds...
TE Masking time 00:00:16 (hh:mm:ss) Elapsed Time
-- Sample Stats:
Sample Size 9007180 bp
Num Contigs Represented = 1
Non ambiguous bp:
Initial: 9007180 bp
After Masking: 7677273 bp
Masked: 14.76 %
-- Input Database Coverage: 12009400 bp out of 25539636 bp ( 47.02 % )
Sampling Time: 00:00:44 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
Comparison Time: 00:04:29 (hh:mm:ss) Elapsed Time, 120472 HSPs Collected
Round Time: 00:38:36 (hh:mm:ss) Elapsed Time
RepeatModeler Round # 4
Searching for Repeats
-- Sampling from the database...
Gathering up to 27000000 bp
Sequence extraction : 00:00:08 (hh:mm:ss) Elapsed Time
-- Running TRFMask on the sequence...
TRFMask time 00:00:36 (hh:mm:ss) Elapsed Time
-- Masking repeats from the previous rounds...
TE Masking time 00:01:27 (hh:mm:ss) Elapsed Time
-- Sample Stats:
Sample Size 13530110 bp
Num Contigs Represented = 1
Non ambiguous bp:
Initial: 13529910 bp
After Masking: 10046423 bp
Masked: 25.75 %
-- Input Database Coverage: 25539510 bp out of 25539636 bp ( 100.00 % )
Sampling Time: 00:02:13 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
Comparison Time: 00:07:44 (hh:mm:ss) Elapsed Time, 100817 HSPs Collected
Round Time: 00:47:39 (hh:mm:ss) Elapsed Time
RepeatScout/RECON discovery complete: 162 families found
Classification Time: 00:09:25 (hh:mm:ss) Elapsed Time
Program Time: 01:50:01 (hh:mm:ss) Elapsed Time
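For triage, a minimal sketch of how a log like the one above could be summarized per round, so that the round containing the "RepeatScout did not return any models" note is easy to spot. This is not part of RepeatModeler; the function name `summarize_log` and the dictionary keys are hypothetical, and the patterns assume the log line formats shown in this report.

```python
import re

def summarize_log(text):
    """Collect per-round sample sizes and NOTE lines from a RepeatModeler log.

    Hypothetical helper: assumes the line formats seen in this report.
    """
    rounds = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"RepeatModeler Round # (\d+)", line)
        if m:
            # Start a new record each time a round header appears.
            current = {"round": int(m.group(1)), "sample_bp": None, "notes": []}
            rounds.append(current)
            continue
        if current is None:
            # Skip anything printed before the first round header.
            continue
        m = re.match(r"(?:Final Sample Size =|Sample Size) (\d+) bp", line)
        if m:
            current["sample_bp"] = int(m.group(1))
        elif line.startswith("NOTE:"):
            current["notes"].append(line[len("NOTE:"):].strip())
    return rounds

# Short excerpt copied from the log in this report.
example = """RepeatModeler Round # 1
Final Sample Size = 25539570 bp ( 25539270 non ambiguous )
NOTE: RepeatScout did not return any models.
RepeatModeler Round # 2
Sample Size 3002220 bp
"""

for record in summarize_log(example):
    print(record)
```

Run on the full log, this would show the note attached to round 1 only, with rounds 2 to 4 carrying sample sizes and no notes.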
Environment (please include as much of the following information as you can find out):
manual installation from repeatmasker.org
RepeatModeler Version 2.0.3
RepeatMasker version 4.1.2-p1 using full Dfam database
Reproduction steps
BuildDatabase -name rice12 rice12.fasta
RepeatModeler -pa 3 -database rice12
GenBank accession for rice chromosome 12 test data: https://www.ncbi.nlm.nih.gov/nuccore/CM020887.1.
Log output
repeatscout.log is empty. The RepeatModeler log is reproduced in full above.