Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

FastaDB::_getFastaRecords: Could not locate lastSeqID=gi|5Query_574! at /my_path/RepeatModeler-2.0.2a/RepeatUtil.pm line 309 #140

Open Lin-Yuying opened 3 years ago

Lin-Yuying commented 3 years ago

Hello,

During round 2, I got the following error: FastaDB::_getFastaRecords: Could not locate lastSeqID=gi|5Query_574! at /my_path/RepeatModeler-2.0.2a/RepeatUtil.pm line 309. This happened when I ran RepeatModeler with the command line: RepeatModeler -database my_db_name -pa 4 -LTRStruct > run.out

Before that, I generated the database using: BuildDatabase -name my_db_name my_genome.fa

The following is the content of the run.out file:

RepeatModeler Round # 2
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 3000000 bp
   - Sequence extraction : 00:00:02 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
       305 Tandem Repeats Masked
   - TRFMask time 00:00:04 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
     - Masking 1 - 5 of 98

I configured all the dependencies following the instructions, but it is possible that something went wrong during configuration. Do you have any idea why this error happened?

Thanks in advance.

Lin

jebrosen commented 3 years ago

That sounds like a bug in RepeatModeler, but it could also be caused by a temporary or environment failure (such as out of memory / out of disk space).
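One quick way to rule out the disk-space cause is to check how much room is left on the filesystem holding the run directory. The following is only a minimal sketch: the function name `free_space_gib` and the `"."` path are placeholders chosen for illustration, not part of RepeatModeler.

```python
import shutil


def free_space_gib(path="."):
    """Return the free disk space, in GiB, on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 2**30


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory where RepeatModeler
    # writes its RM_<...> output.
    print(f"Free space: {free_space_gib('.'):.1f} GiB")
```

Memory exhaustion is harder to confirm after the fact; the system log is usually the place to look for out-of-memory kills.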

Is the genome file you ran RepeatModeler on publicly available, and if so can you provide a link to it? We would also need the seed number (near the beginning of run.out) to try to reproduce this issue to find out what went wrong.
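For anyone who needs to pull the seed out of a long run.out, a small sketch like the one below may help. It assumes the seed appears on a line starting with "Random Number Seed:", as in the header posted later in this thread; the helper name `extract_seed` is made up for this example.

```python
import re

# Matches a header line such as "Random Number Seed: 1621062192".
SEED_PATTERN = re.compile(r"^Random Number Seed:\s*(\d+)\s*$", re.MULTILINE)


def extract_seed(log_text):
    """Return the seed as an int, or None if no seed line is present."""
    match = SEED_PATTERN.search(log_text)
    return int(match.group(1)) if match else None


# Excerpt of the run.out header quoted in this thread.
sample = """RepeatModeler Version 2.0.2
Search Engine = rmblast 2.10.0+
Random Number Seed: 1621062192
Database = guppy .
"""

print(extract_seed(sample))
```

In practice the text would come from reading the real file, for example `extract_seed(open("run.out").read())`.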

Another option you can try right away, if you haven't already, is to try to run RepeatModeler again on the same data. RepeatModeler uses a sampling approach, so a second run would process different portions of the genome in a different order - potentially avoiding the problem.

Lin-Yuying commented 3 years ago

Hi Jeb,

Thank you so much for your reply. As far as I can tell, my computer did not run out of memory or disk space.

This is the header information from run.out.

RepeatModeler Version 2.0.2
===========================
Search Engine = rmblast 2.10.0+
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.2
LTR Structural Analysis: Enabled ( GenomeTools 1.6.1, LTR_Retriever ,
                                   Ninja , MAFFT ,
                                   CD-HIT 4.8.1 )
Random Number Seed: 1621062192
Database = guppy .
  - Sequences = 2768
  - Bases = 731622281
  - N50 = 31497199
  - Contig Histogram:
 Size(bp)                                                        Count
  -----------------------------------------------------------------------
  43200835-46286544 |                                                   [ 1 ]
  40115126-43200834 |                                                   [  ]
  37029418-40115126 |                                                   [  ]
  33943709-37029417 |                                                   [ 3 ]
  30858001-33943709 |                                                   [ 7 ]
  27772292-30858000 |                                                   [ 6 ]
  24686584-27772292 |                                                   [ 4 ]
  21600875-24686583 |                                                   [ 1 ]
  18515167-21600875 |                                                   [  ]
  15429458-18515166 |                                                   [ 1 ]
  12343750-15429458 |                                                   [  ]
  9258041-12343749  |                                                   [  ]
  6172333-9258041   |                                                   [  ]
  3086624-6172332   |                                                   [  ]
  916-3086624       |************************************************** [ 2745 ]       

The genome that I used for building the database can be found at http://ftp.ensembl.org/pub/release-104/fasta/poecilia_reticulata/dna/Poecilia_reticulata.Guppy_female_1.0_MT.dna.toplevel.fa.gz

New update: I installed RepeatMasker myself instead of using bioconda, and round 2 then ran fine. However, after round 6, I got another error: eleredef failed. Exit code 11, which seems to be raised when running RECON 1.08. Any suggestions or ideas why this happened?

Thanks!

Lin

jebrosen commented 3 years ago

Exit code 11 corresponds to SIGSEGV (a segmentation fault), which is likely due to a bug in the eleredef program. As before, it's possible that running RepeatModeler again (with yet another seed number) will produce different results.
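To see how a number like 11 maps to a signal name, Python's standard `signal` module can be used. This is a sketch that assumes the reported code is the raw signal number, as described above (shells often report 128 plus the signal number instead); the helper name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code):
    """Return the signal name for `code`, or None if it is not a signal number here."""
    try:
        return signal.Signals(code).name
    except ValueError:
        # Not a valid signal number on this platform.
        return None


# On Linux and macOS, signal 11 is SIGSEGV.
print(describe_exit_code(11))
```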

What was the seed number for that run? Alternatively, if you can compress and upload or attach the contents of the RM_<...>/round-6 directory, that may also help us to troubleshoot the cause.