Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

LTRPipeline: Error - could not produce a multiple alignment from the denovo LTR results (mafft error) #123

Closed ksielemann closed 3 years ago

ksielemann commented 3 years ago

Dear RepeatModeler Developer-Team,

I have been running RepeatModeler (v2.0) and got the following error:

...
RepeatScout/RECON discovery complete: 3080 families found

LTR Structural Analysis
=======================
Running LtrHarvest...     : 02:54:48 (hh:mm:ss) Elapsed Time
Running Ltr_retriever...  : 02:38:01 (hh:mm:ss) Elapsed Time
Aligning instances...
v0.000 != v7.450 (2019/Aug/23)

There is a problem in the configuration of your shell.
Check the MAFFT_BINARIES environmental variable by
$ echo $MAFFT_BINARIES

This variable must be *unset*, unless you have installed MAFFT
with a special configuration.  To unset this variable, type
$ unset MAFFT_BINARIES
or
% unsetenv MAFFT_BINARIES
Then retry
$ mafft input > output

To keep this change permanently, edit setting files
(.bash_profile, .profile, .cshrc, etc) in your home directory
to delete the MAFFT_BINARIES line.
On MacOSX, also edit or remove the .MacOSX/environment.plist file
and then re-login (MacOSX 10.6) or reboot (MacOSX 10.7).

Please send a problem report to katoh@ifrec.osaka-u.ac.jp,
if this problem remains.

LTRPipeline: Error - could not produce a multiple alignment from the denovo LTR
results.
LTRPipeline Time: 05:39:57 (hh:mm:ss) Elapsed Time

RepeatClassifier Version 2.0
======================================
Search Engine = rmblast
  - Looking for Simple and Low Complexity sequences..
  - Looking for similarity to known repeat proteins..
  - Looking for similarity to known repeat consensi..
Classification Time: 06:26:06 (hh:mm:ss) Elapsed Time

Program Time: 522:39:30 (hh:mm:ss) Elapsed Time
Working directory:  ~/RM_XXXX
may be deleted unless there were problems with the run.

The results have been saved to:
  XXX-families.fa  - Consensus sequences for each family identified.
  XXX-families.stk - Seed alignments for each family identified.

The RepeatModeler stockholm file is formatted so that it can
easily be submitted to the Dfam database.  Please consider contributing
curated families to this open database and be a part of this growing
community resource.  For more information contact help@dfam.org.

I tried to use -recoverDir after updating mafft to the latest version and got the message that my directory 'appears to contain a successful run of RepeatModeler'.

There are two files in the LTR output folder: LtrRetriever-redundant-results.fa and raw-struct-results.txt. Further, I got the expected output files families.fa and families.stk.

My question is now: Am I missing some important results or was the run successful? Is there only an alignment file missing, which mafft didn't produce in my case?

Thanks in advance and best regards, Katharina

jebrosen commented 3 years ago

Did you check your MAFFT_BINARIES variable as the error message suggests? The main reasons I can think of for this message are that the variable is set unnecessarily or to a wrong value, or that the mafft program files were moved somewhere else after make install was run.
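The checks suggested here can be collected in one place. The following is a minimal sketch, not part of RepeatModeler or MAFFT: the function name `mafft_report` is hypothetical, and it assumes that `mafft --version` prints a version string (stdout and stderr are both captured, since the stream used is not confirmed here).

```python
import os
import shutil
import subprocess


def mafft_report(env=None, which=shutil.which):
    """Collect the facts relevant to the 'v0.000 != v7.450' message.

    `env` and `which` are injectable so the function can be exercised
    without a real MAFFT installation (hypothetical helper, for illustration).
    """
    env = os.environ if env is None else env
    report = {
        "MAFFT_BINARIES": env.get("MAFFT_BINARIES"),  # None means unset
        "mafft_on_path": which("mafft"),              # first mafft found on PATH
        "version": None,
    }
    if report["mafft_on_path"]:
        try:
            # Assumption: `mafft --version` prints the version and exits.
            proc = subprocess.run(
                [report["mafft_on_path"], "--version"],
                stdin=subprocess.DEVNULL,
                capture_output=True,
                text=True,
                timeout=30,
            )
            report["version"] = (proc.stdout + proc.stderr).strip() or None
        except (OSError, subprocess.TimeoutExpired):
            pass  # leave version as None if mafft cannot be run
    return report
```

Printing `mafft_report()` from the same shell or job script that launches RepeatModeler shows whether `MAFFT_BINARIES` is set and which `mafft` comes first on `PATH`. RepeatModeler may have been configured with its own MAFFT location, in which case that path is the one to inspect.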

> I tried to use -recoverDir after updating mafft to the latest version and got the message that my directory 'appears to contain a successful run of RepeatModeler'.

Yes, this is issue #65. One possible workaround is to rename the file $recoverDir/round-6/consensi.fa to something else, so that it looks like round-6 failed. RepeatModeler should then resume from round-6 and then continue with the LTR structural search pipeline.
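The rename step of this workaround could be scripted roughly as follows. This is only a sketch: the function name `hide_round_consensi` and the `.bak` suffix are assumptions made for illustration, and the file is renamed rather than deleted so the change can be undone.

```python
from pathlib import Path


def hide_round_consensi(recover_dir, round_name="round-6", suffix=".bak"):
    """Rename <recover_dir>/<round_name>/consensi.fa so the round looks unfinished.

    Hypothetical helper; the ".bak" suffix is an arbitrary choice.
    Returns the new path of the renamed file.
    """
    src = Path(recover_dir) / round_name / "consensi.fa"
    if not src.is_file():
        raise FileNotFoundError(src)
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(dst)
    src.rename(dst)
    return dst
```

After the rename, RepeatModeler would be started again with -recoverDir pointing at the same directory, as described in the workaround.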

> There are two files in the LTR output folder: LtrRetriever-redundant-results.fa and raw-struct-results.txt. Further, I got the expected output files families.fa and families.stk.
>
> My question is now: Am I missing some important results or was the run successful? Is there only an alignment file missing, which mafft didn't produce in my case?

MAFFT and the rest of the LTRPipeline (-LTRStruct) did not run, so the resulting families.fa and families.stk are based only on the RepeatScout+RECON analyses, as in previous versions of RepeatModeler that did not have -LTRStruct.
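Because the run still finishes and writes families.fa and families.stk, the failure is easy to overlook. One way to notice it is to scan the saved RepeatModeler output for the error line shown in the report. The sketch below assumes the output was captured to a text file; the function name `ltr_pipeline_errors` is hypothetical, and the prefix it searches for is taken from the log quoted in this thread.

```python
def ltr_pipeline_errors(log_text):
    """Return the log lines that report an LTRPipeline error.

    Hypothetical helper: matches lines starting with the
    'LTRPipeline: Error' prefix seen in the quoted log.
    """
    return [
        line.strip()
        for line in log_text.splitlines()
        if line.lstrip().startswith("LTRPipeline: Error")
    ]
```

An empty list means no such line was found in the text; it does not by itself prove that the LTR structural step ran.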

ksielemann commented 3 years ago

Thank you very much for your reply and sorry for my late answer!

The MAFFT_BINARIES variable was set correctly. The problem was that RepeatModeler used an old version of MAFFT on our system, which led to the error reported above. Also, the runtime was very long (more than 83 hours with 50 parallel jobs), but everything works now!