Dfam-consortium / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

FATAL ERROR: RepeatMasker giving up. One or more batches failed! #61

Closed michieitel closed 3 weeks ago

michieitel commented 4 years ago

Hello!

I was running RepeatMasker on a de novo long-read-based assembly, using a library that combines taxon-specific repeats with a RepeatModeler library (based on the transcriptome of that species), and got the following error message:

FATAL ERROR: RepeatMasker giving up. One or more batches failed! Unfortunately this type of error cannot be recovered from. Please submit the following details to the feedback page at the repeatmasker website:

   http://www.repeatmasker.org

RepeatMasker Version: 4.1.0
Library Version: CONS-Dfam_3.0
Search Engine: ncbi [ 2.9.0+ ]
Command Line: /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker -s -pa 20 -norna -dir ./hard -lib aphrocallistes_vastus-families_hypo_corrected.filtered_for_CDS_repeats.fa -a -inv -lcambig -gff Aphrocallistes_WTDBG2_assembly-1_5kb_corrected_reads.hypo_L_RNA_scaffolded_ONT_TR_PE_sorted.Racon-PE.fasta
Batch Number: 821
Disk Space:
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/vda1     1016067204 584074052 431976768  58% /

System Memory:
MemTotal:     379753484 kB
MemFree:      157828172 kB
MemAvailable: 347635684 kB
Cached:       182987280 kB
SwapCached:           0 kB
SwapTotal:            0 kB
SwapFree:             0 kB
Further details about this problem may be found in the directory:
/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_21858.FriMay220801282020

identifying Simple Repeats in batch 1285 of 1426

I submitted this as requested at http://www.repeatmasker.org, but never got a reply so I am posting this here as well.

When I go to the directory with more details about this issue I can't find a log/err file that would indicate the issue.

Strangely, this is the only one of 21 assemblies that throws this error. The others finished successfully.

Any help is appreciated.

Michael

michieitel commented 4 years ago

Hi!

I just ran again and printed the error log.

It says:

main::main::postProcessSearch: FastaDB::substr - Error index out of bounds!

Full log:

Warning...unknown stuff <
>
main::main::postProcessSearch: FastaDB::substr - Error index out of bounds!
(SeqID=, offset=44846, length=122 actualSeqLen=0)
 at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 6069.

Attempting to mask  from 44846 to 44968 ( len = 122 )
 at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 6070.
        main::postProcessSearch(HASH(0x557528712030), SearchResultCollection=HASH(0x55754f5c5100), HASH(0x55754f5bd040), 0, 1, FastaDB=HASH(0x55754f5bd928), 1, "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., ...) called at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 2811
        main::runTRFStage(HASH(0x557528712030), "identifying Simple Repeats", "batch 821 of 1426", "DIVERGED", "", "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., NCBIBlastSearchEngine=HASH(0x557528770cb0), ...) called at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 4626
        main::runSearchStages(HASH(0x5575285c8520), "/home/ubuntu/tools/anaconda3/envs/masking/bin", 36, "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., "/home/ubuntu/tools/anaconda3/envs/masking/share/RepeatMasker/"..., "", 1426, ...) called at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 1143
main::main::postProcessSearch: FastaDB::substr - Error index out of bounds!
(SeqID=, offset=44846, length=122 actualSeqLen=0)
 at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 6069.

Attempting to mask  from 44846 to 44968 ( len = 122 )
 at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 6070.
        main::postProcessSearch(HASH(0x557528712030), SearchResultCollection=HASH(0x55754f5d49d8), HASH(0x55754f5d0658), 0, 1, FastaDB=HASH(0x55754f5d0e80), 1, "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., ...) called at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 2811
        main::runTRFStage(HASH(0x557528712030), "identifying Simple Repeats", "batch 821 of 1426", "DIVERGED", "", "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., NCBIBlastSearchEngine=HASH(0x557528770cb0), ...) called at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 4626
        main::runSearchStages(HASH(0x5575285c8520), "/home/ubuntu/tools/anaconda3/envs/masking/bin", 36, "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., "/home/ubuntu/tools/anaconda3/envs/masking/share/RepeatMasker/"..., "", 1426, ...) called at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 1143
main::main::postProcessSearch: FastaDB::substr - Error index out of bounds!
(SeqID=, offset=44846, length=122 actualSeqLen=0)
 at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 6069.

Attempting to mask  from 44846 to 44968 ( len = 122 )
 at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 6070.
        main::postProcessSearch(HASH(0x557528712030), SearchResultCollection=HASH(0x55754f5d76b8), HASH(0x55754f59d2d8), 0, 1, FastaDB=HASH(0x55754f5eb860), 1, "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., ...) called at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 2811
        main::runTRFStage(HASH(0x557528712030), "identifying Simple Repeats", "batch 821 of 1426", "DIVERGED", "", "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., NCBIBlastSearchEngine=HASH(0x557528770cb0), ...) called at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 4626
        main::runSearchStages(HASH(0x5575285c8520), "/home/ubuntu/tools/anaconda3/envs/masking/bin", 36, "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., "/home/ubuntu/avas/repeat_masking/HYPO/corrected_reads/RM_1965"..., "/home/ubuntu/tools/anaconda3/envs/masking/share/RepeatMasker/"..., "", 1426, ...) called at /home/ubuntu/tools/anaconda3/envs/masking/bin/RepeatMasker line 1143

jebrosen commented 4 years ago

> I submitted this as requested at http://www.repeatmasker.org, but never got a reply so I am posting this here as well.

Apologies for the delayed response.

> main::main::postProcessSearch: FastaDB::substr - Error index out of bounds!
> (SeqID=, offset=44846, length=122 actualSeqLen=0)

This is a strange error condition. Judging by the empty SeqID and actualSeqLen=0, either TRF misreported a result or RepeatMasker misinterpreted TRF's output, so that the hit appears to have a missing or blank sequence name. Which version of TRF are you using, and did you have any issues configuring it? What do the sequence names in the FASTA file you are masking look like, and are they all consistent?
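
One quick way to answer the sequence-name question is to scan the FASTA headers for blank or repeated identifiers. The following is only a minimal sketch, not part of RepeatMasker: the function name `check_fasta_headers` is hypothetical, and it assumes the identifier is the text between `>` and the first whitespace. Whether a blank or duplicated name is actually what triggers this error is not confirmed.

```python
from collections import Counter


def check_fasta_headers(lines):
    """Scan FASTA text lines and report suspicious headers.

    Returns (blank, duplicates):
      blank      - 1-based line numbers of headers whose ID is empty
                   (">" alone, or ">" followed directly by whitespace)
      duplicates - sorted list of IDs that occur more than once
    Assumption: the ID is everything between ">" and the first whitespace.
    """
    blank = []
    id_counts = Counter()
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        rest = line[1:].rstrip("\r\n")
        if not rest or rest[0].isspace():
            blank.append(lineno)
        else:
            id_counts[rest.split()[0]] += 1
    duplicates = sorted(name for name, n in id_counts.items() if n > 1)
    return blank, duplicates
```

To run it on a real assembly, pass an open file handle, for example `check_fasta_headers(open("assembly.fasta"))` (the filename here is a placeholder), and inspect the two returned lists.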

> When I go to the directory with more details about this issue I can't find a log/err file that would indicate the issue.

The sheer amount of temporary output from TRF and the other intermediate tools is too much to keep around forever, so those files are deleted if the run appears to be successful.