Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

can't find species-families.fa file #127

Open jebrosen opened 3 years ago

jebrosen commented 3 years ago

Transferring from rmhubley/RepeatMasker#99:

Hi,

I used the following commands:

step1: BuildDatabase -name bara Bracemosa_genome.fa
step2: nohup RepeatModeler -database bara -pa 4 -LTRStruct 1>step2.log 2>&1 &

I can't find bara-families.fa and bara-families.stk files, but the log file doesn't have any error, like this:

$ tail step2.log
The results have been saved to:
  bara-families.fa  - Consensus sequences for each family identified.
  bara-families.stk - Seed alignments for each family identified.

The RepeatModeler stockholm file is formatted so that it can easily be submitted to the Dfam database. Please consider contributing curated families to this open database and be a part of this growing community resource. For more information contact help@dfam.org.

I don't know what happened.

jebrosen commented 3 years ago

I can't find bara-families.fa and bara-families.stk files, but the log file doesn't have any error, like this:

Can you post more of the log, especially starting after round-6 (i.e. the LTRPipeline and RepeatClassifier steps)?

If RepeatClassifier failed, there may still be files named consensi.fa and families.stk in the RM_* directory which represent the final results before classification. These files can be manually re-run through RepeatClassifier if needed.
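A quick way to check for those leftover files is sketched below. The function name `find_rm_results` is made up for illustration, and the sketch assumes the RM_* working directories sit directly under the directory where RepeatModeler was launched.

```shell
# List RM_* working directories that still contain non-empty
# pre-classification results (consensi.fa and families.stk).
# Argument: directory to search (defaults to the current directory).
find_rm_results() {
    base="${1:-.}"
    for dir in "$base"/RM_*/; do
        # Skip the literal pattern when no RM_* directory exists.
        [ -d "$dir" ] || continue
        dir="${dir%/}"
        if [ -s "$dir/consensi.fa" ] && [ -s "$dir/families.stk" ]; then
            echo "$dir"
        fi
    done
}

# Usage: find_rm_results /path/to/run_directory
```

Any directory it prints is a candidate for a manual RepeatClassifier run.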

xie186 commented 2 years ago

I encountered the same problem (v2.0.2a). Here is the last part of the log:

Program Time: 19:32:09 (hh:mm:ss) Elapsed Time
Working directory:  /mypath/RM_62149.ThuSep300929402021
may be deleted unless there were problems with the run.

The results have been saved to:
  Pvulgaris_442_v2.0_new-families.fa  - Consensus sequences for each family identified.
  Pvulgaris_442_v2.0_new-families.stk - Seed alignments for each family identified.

The RepeatModeler stockholm file is formatted so that it can
easily be submitted to the Dfam database.  Please consider contributing
curated families to this open database and be a part of this growing
community resource.  For more information contact help@dfam.org.

Here are the command lines I used:

BuildDatabase -name Pvulgaris_442_v2.0_new -engine ncbi Pvulgaris_442_v2.0.fa
RepeatModeler -pa 30  -database  Pvulgaris_442_v2.0_new

I did see files named consensi.fa and families.stk in the RM_* directory.

Here is the full log:

repeatmodeler.log

Could you please help me when you get a chance? Thanks.

jebrosen commented 2 years ago

@xie186 Hello, and sorry you are having this problem.

From the log there are no error messages from RepeatClassifier, so it looks like RepeatClassifier ran successfully. Do you by chance have the files consensi.fa.classified and families-classified.stk in the RM_* directory?

You can also run RepeatClassifier separately on the output. For example:

RepeatClassifier -consensi RM_dir/consensi.fa -stockholm RM_dir/families.stk

This command re-runs the classifier and creates the files consensi.fa.classified and families-classified.stk.
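Once the classified files exist, the missing top-level outputs can be recreated by hand. The sketch below assumes that `<database>-families.fa` and `<database>-families.stk` are plain copies of the two classified files, which matches the names printed in the log but is not confirmed from the RepeatModeler source. The function name `publish_families` and both arguments are hypothetical.

```shell
# Copy the classified RepeatClassifier output to the file names that the
# RepeatModeler log announces. Assumes the final files are plain copies.
# Arguments: RM_* working directory, database name given to BuildDatabase.
publish_families() {
    rm_dir="$1"
    db_name="$2"
    fa="$rm_dir/consensi.fa.classified"
    stk="$rm_dir/families-classified.stk"
    if [ ! -s "$fa" ] || [ ! -s "$stk" ]; then
        echo "classified files missing in $rm_dir; run RepeatClassifier first" >&2
        return 1
    fi
    # Written to the current directory, next to the BuildDatabase files.
    cp "$fa" "${db_name}-families.fa"
    cp "$stk" "${db_name}-families.stk"
}

# Usage: publish_families RM_dir Pvulgaris_442_v2.0_new
```

The function refuses to copy when either classified file is absent or empty, so a failed classifier run is not mistaken for a finished result.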

hirnc commented 2 years ago

I had the same problem and the solution jebrosen posted worked for me. $genomeDB-families.fa and $genomeDB-families.stk are aliases of consensi.fa.classified and families-classified.stk, respectively.

The problem seems to be caused by an incomplete RepeatMasker setup (I used version 4.1.2-p1): the configure script skips running makeblastdb on Libraries/RepeatMasker.lib and Libraries/RepeatPeps.lib.
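A rough way to check for this is sketched below. It assumes the standard BLAST+ index suffixes (`.nin` for a nucleotide database, `.pin` for a protein database) and that RepeatMasker.lib is nucleotide while RepeatPeps.lib is protein. The function name `missing_blast_indexes` is hypothetical. It only prints the makeblastdb commands that appear to be needed and does not run them.

```shell
# Print a makeblastdb command for each RepeatMasker library whose BLAST
# index file is absent. Nothing is executed; review the output first.
# Argument: path to the RepeatMasker Libraries directory.
missing_blast_indexes() {
    lib_dir="$1"
    # Nucleotide library: index files end in .nhr/.nin/.nsq.
    if [ ! -e "$lib_dir/RepeatMasker.lib.nin" ]; then
        echo "makeblastdb -dbtype nucl -in $lib_dir/RepeatMasker.lib"
    fi
    # Protein library: index files end in .phr/.pin/.psq.
    if [ ! -e "$lib_dir/RepeatPeps.lib.pin" ]; then
        echo "makeblastdb -dbtype prot -in $lib_dir/RepeatPeps.lib"
    fi
}

# Usage: missing_blast_indexes /path/to/RepeatMasker/Libraries
```

If it prints nothing, both indexes are present and the cause is likely elsewhere.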