Dfam-consortium / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Taxonomy.pm syntax error - "Bareword found where operator expected" near "s/'/'"'"'/r " #93

Closed jebrosen closed 3 years ago

jebrosen commented 3 years ago

Copied from https://github.com/rmhubley/RepeatMasker/issues/50#issuecomment-766803634 by @Tiramisu023:

I also had another problem when I finished the installation successfully and tried to start RepeatMasker.

(base) [liyulong@node1 ~]$ RepeatMasker -h
Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/Taxonomy.pm line 376, near "s/'/'"'"'/r"
syntax error at /public2/users/liyulong/software/RepeatMasker-4.1.1/Taxonomy.pm line 376, near "s/'/'"'"'/r "
Compilation failed in require at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 333.
BEGIN failed--compilation aborted at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 333.

The error message indicated that something was wrong with the Taxonomy.pm file in the folder. However, when I replaced this file with the old version (4.1.0), it works! Does using an older version of the Taxonomy.pm file in the new RepeatMasker version affect the final results?

What version of perl do you have (perl --version) so that we can try and reproduce and fix the error? I have not seen this issue before, and I want to be able to test any possible fixes to that.

I think your best option will be to use RepeatMasker 4.1.0 unless we can quickly identify a workaround for this issue. Taxonomy.pm was one of the main changes in 4.1.1 and mixing versions might work fine for your particular run or it might also silently do something wrong.

Tiramisu023 commented 3 years ago

Thanks for your reply!

(base) [liyulong@node1 software]$ which perl
~/software/anaconda3/envs/circos/bin/perl

(base) [liyulong@node1 software]$ perl --version
This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-linux-thread-multi

Copyright 1987-2018, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
Tiramisu023 commented 3 years ago

However, when I replaced this file with the old version (4.1.0), it works!

Actually, it only works with "RepeatMasker -h". When I use "RepeatMasker -pa 8 -e ncbi -species "arabidopsis" -poly -html -gff -dir Repeat_results Athaliana_167_TAIR9.fa", it reports an error.

(base) [liyulong@node1 01.test]$ RepeatMasker -pa 8 -e ncbi -species "arabidopsis" -poly -html -gff -dir Repeat_results Athaliana_167_TAIR9.fa
RepeatMasker version 4.1.1
Search Engine: NCBI/RMBLAST [ 2.10.0+ ]
Taxonomy::new() Could not locate the taxonomy data file! at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 635.

Thus, the old Taxonomy.pm file doesn't work with RepeatMasker-4.1.1.

I will use the 4.1.0 version instead, thanks so much.

Tiramisu023 commented 3 years ago

I'm so happy to finally be able to use ABBLAST in RepeatMasker 4.1.0 by referring to issue #94.

It seems that the Dfam.h5 file (about 84 GB) can't be merged into 4.1.0's libraries.

When I installed RepeatMasker with conda before, the RepeatMasker version was 4.0.9, and Dfam.h5 could be merged into RepeatMaskerLib.h5 (together with the RepBase RepeatMasker Edition library), just like 4.1.1 does.

jebrosen commented 3 years ago
(base) [liyulong@node1 software]$ which perl
~/software/anaconda3/envs/circos/bin/perl

(base) [liyulong@node1 software]$ perl --version
This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-linux-thread-multi

Thank you! I will try to reproduce the error with this version of perl and/or conda specifically. If it does turn out to be a bug in specific perl versions, we may be able to work around it.


It seems that the Dfam.h5 file (about 84 GB) can't be merged into 4.1.0's libraries.

When I installed RepeatMasker with conda before, the RepeatMasker version was 4.0.9, and Dfam.h5 could be merged into RepeatMaskerLib.h5 (together with the RepBase RepeatMasker Edition library), just like 4.1.1 does.

This sounds very messy and confusing. Hopefully I can clear some things up:

Both RepeatMasker 4.0.9 and RepeatMasker 4.1.0 only support a Dfam.hmm and Dfam.embl file. Dfam.hmm is used for only the nhmmer engine (which uses profile Hidden Markov Models, pHMMs), and Dfam.embl (optionally combined with RepBase RepeatMasker Edition) is used for the other search engines (which use consensus sequences).

Starting in 4.1.1, RepeatMasker only supports reading from a Dfam.h5 file, optionally combined with RepBase RepeatMasker Edition. This file format includes both pHMM and consensus models in one file, has better scalability than the .hmm and .embl formats, and is used with all search engines.

Bioconda is already up to RepeatMasker 4.1.1, so I am not sure how you got version 4.0.9 (was it an installation you still had from before?). I also don't know how you used Dfam.h5 with RepeatMasker 4.0.9; I would expect it to ignore that file completely and use the preinstalled (older) Dfam library.

All recent versions of RepeatMasker come with the "curated only" subset of Dfam preinstalled. If you wish you can install newer versions of the _curatedonly library, or the full 84GB library, and we provide both .h5 and .hmm/.embl versions of these for compatibility with most versions of RepeatMasker.


I think we can explain more of this up front than we do, at least for the latest versions of our tools. Maybe in our READMEs or help files? Are there any particular web pages or files where you looked for this information and did not find it?

jebrosen commented 3 years ago

Thank you! I will try to reproduce the error with this version of perl and/or conda specifically.

I installed perl=5.26.2 in a conda environment and did not have the error. I also saw that perl has been updated to 5.32.0.1 in conda-forge; maybe updating your perl will solve it?

If you do get a chance to try RepeatMasker 4.1.1 with the latest version of perl and still have the issue, please let us know.

Tiramisu023 commented 3 years ago

Thanks for your reply.

In my earlier response, I tested the version of Perl used in my local account.

(base) [liyulong@node1 RepeatMasker-4.1.1]$ which perl
~/software/anaconda3/envs/circos/bin/perl

(base) [liyulong@node1 RepeatMasker-4.1.1]$ perl --version
This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-linux-thread-multi

However, this may not be appropriate, because the RepeatMasker file is calling the Perl of the root account by default (/usr/bin/perl).

(base) [liyulong@node1 RepeatMasker-4.1.1]$ head ./RepeatMasker
#!/usr/bin/perl
##---------------------------------------------------------------------------##
##  File:
##      @(#) RepeatMasker
##  Author:
##      Arian Smit <asmit@systemsbiology.org>
##      Robert Hubley <rhubley@systemsbiology.org>

This is the perl information from the root account.

(base) [liyulong@node1 RepeatMasker-4.1.1]$ /usr/bin/perl -v

This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi

Copyright 1987-2009, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

I installed the version 5.32.1 of Perl in my account, and replaced the first line of RepeatMasker file.

(base) [liyulong@node1 RepeatMasker-4.1.1]$ which perl
~/software/perl-5.32.1/perl

(base) [liyulong@node1 RepeatMasker-4.1.1]$ head ./RepeatMasker
#!/public2/users/liyulong/software/perl-5.32.1/perl
##---------------------------------------------------------------------------##
##  File:
##      @(#) RepeatMasker
##  Author:
##      Arian Smit <asmit@systemsbiology.org>
##      Robert Hubley <rhubley@systemsbiology.org>
##  Description:
##      Takes one or more DNA sequence files, in fasta format, and returns
##      masked sequence file(s) (repetitive DNA is masked) for database

However, it still reported an error.

(base) [liyulong@node1 RepeatMasker-4.1.1]$ ./RepeatMasker -h
Can't locate Text/Soundex.pm in @INC (you may need to install the Text::Soundex module) (@INC contains: ~/software/RepeatMasker-4.1.1 ~/software/perl-5.32.1/lib/site_perl/5.32.1/x86_64-linux ~/software/perl-5.32.1/lib/site_perl/5.32.1 ~/software/perl-5.32.1/lib/5.32.1/x86_64-linux ~/software/perl-5.32.1/lib/5.32.1) at ~/software/RepeatMasker-4.1.1/Taxonomy.pm line 72.
BEGIN failed--compilation aborted at ~/software/RepeatMasker-4.1.1/Taxonomy.pm line 72.
Compilation failed in require at ./RepeatMasker line 333.
BEGIN failed--compilation aborted at ./RepeatMasker line 333.

And I installed Text::Soundex module through CPAN shell:

(base) [liyulong@node1 RepeatMasker-4.1.1]$ perl -MCPAN -e shell
cpan shell -- CPAN exploration and modules installation (v2.27)
Enter 'h' for help.

Can't ioctl TIOCGETP: Invalid argument
Consider installing Term::ReadKey from CPAN site nearby
        at http://www.perl.com/CPAN
Or use
        perl -MCPAN -e shell
to reach CPAN. Falling back to 'stty'.
        If you do not want to see this warning, set PERL_READLINE_NOWARN
in your environment.
cpan[1]> install Text::Soundex
Reading '/public2/users/liyulong/.cpan/Metadata'
  Database was generated on Thu, 28 Jan 2021 10:55:47 GMT
Text::Soundex is up to date (3.05).

cpan[1]>

After I successfully installed the Text::Soundex, however, I got a different error message.

(base) [liyulong@node1 RepeatMasker-4.1.1]$ ./RepeatMasker -h
Unmatched right curly bracket at ./RepeatMasker line 1434, at end of line
syntax error at ./RepeatMasker line 1434, near "}"
Can't redeclare "my" in "my" at ./RepeatMasker line 1476, near "my"
Unmatched right curly bracket at ./RepeatMasker line 1861, at end of line
syntax error at ./RepeatMasker line 1861, near "}"
Can't redeclare "my" in "my" at ./RepeatMasker line 1893, near "my"
syntax error at ./RepeatMasker line 2056, near "}"
syntax error at ./RepeatMasker line 2430, near "}"
Can't redeclare "my" in "my" at ./RepeatMasker line 2442, near "my"
syntax error at ./RepeatMasker line 2461, near "}"
./RepeatMasker has too many errors.

So I tried to unpack the RepeatMasker-4.1.1.tar.gz file again, and run:

(base) [liyulong@node1 RepeatMasker-4.1.1]$ ./RepeatMasker -h
Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/Taxonomy.pm line 376, near "s/'/'"'"'/r"
syntax error at /public2/users/liyulong/software/RepeatMasker-4.1.1/Taxonomy.pm line 376, near "s/'/'"'"'/r"
Compilation failed in require at ./RepeatMasker line 333.

And when I replaced the first line of RepeatMasker file:

#!/public2/users/liyulong/software/perl-5.32.1/bin/perl
##---------------------------------------------------------------------------##
##  File:
##      @(#) RepeatMasker
##  Author:
##      Arian Smit <asmit@systemsbiology.org>
##      Robert Hubley <rhubley@systemsbiology.org>
##  Description:
##      Takes one or more DNA sequence files, in fasta format, and returns
##      masked sequence file(s) (repetitive DNA is masked) for database
##      searches and a file with detailed annotation of repeat locations.The
##      sequence data are screened against a library of repetitive sequences
##      using the program cross_match (Phil Green, unpublished) or
##      ABBlast ( Gish et al ), RMBlast ( NCBI, Hubley et al ), or
##      nhmmer ( Wheeler et al ).
##
## NOTE: See RepeatMaskerConfig.pm for necessary installation
##       customization.
##
#******************************************************************************

then, I got the same error message:

(base) [liyulong@node1 RepeatMasker-4.1.1]$ ./RepeatMasker -h
Unmatched right curly bracket at ./RepeatMasker line 1434, at end of line
syntax error at ./RepeatMasker line 1434, near "}"
Can't redeclare "my" in "my" at ./RepeatMasker line 1476, near "my"
Unmatched right curly bracket at ./RepeatMasker line 1861, at end of line
syntax error at ./RepeatMasker line 1861, near "}"
Can't redeclare "my" in "my" at ./RepeatMasker line 1893, near "my"
syntax error at ./RepeatMasker line 2056, near "}"
syntax error at ./RepeatMasker line 2430, near "}"
Can't redeclare "my" in "my" at ./RepeatMasker line 2442, near "my"
syntax error at ./RepeatMasker line 2461, near "}"
./RepeatMasker has too many errors.
jebrosen commented 3 years ago

Thanks for your reply. In my earlier response, I tested the version of Perl used in my local account. (...) However, this may not be appropriate, because the RepeatMasker file is calling the Perl of the root account by default (/usr/bin/perl).

The path in RepeatMasker's #! line is set when you run the configure program, so if configure was never run or was run outside of your perl 5.26.2 conda environment that would explain the discrepancy.
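A quick way to confirm which interpreter a script will actually use is to read its #! line and compare it with the perl found on PATH. This is only a minimal sketch: the helper name shebang_interpreter and the ./RepeatMasker path are assumptions for illustration, not part of RepeatMasker.

```shell
# Print the interpreter path named on the first ("#!") line of a script.
# Prints nothing if the file does not start with "#!".
# Note: for "#!/usr/bin/env perl" this prints /usr/bin/env.
shebang_interpreter() {
    head -n 1 "$1" | sed -n 's/^#![[:space:]]*\([^[:space:]]*\).*$/\1/p'
}

script=./RepeatMasker    # assumed location; adjust to your install
if [ -f "$script" ]; then
    echo "shebang perl: $(shebang_interpreter "$script")"
    echo "PATH perl:    $(command -v perl)"
fi
```

If the two paths differ, "perl --version" describes a different interpreter than the one that runs RepeatMasker, which is the situation reported above.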

It turns out that the /r regex flag was added in perl 5.14.0, so this can be reproduced quite easily:

$ docker run -it --rm perl:5.10.1 perl -e 'my $a = "a"; print($a =~ s/a/b/r);'
Bareword found where operator expected at -e line 1, near "s/a/b/r"

This was straightforward to work around, and to my knowledge this error has now been fixed for the next RepeatMasker release. I sincerely apologize for this inconvenience, since we do claim compatibility with perl 5.8.0 and I didn't even consider that s///r was a newer feature.
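For anyone stuck on an older perl in the meantime, the usual pre-5.14 idiom for a non-destructive substitution is to copy the string first and substitute on the copy. The sketch below illustrates that idiom only; it is not the actual patch applied to Taxonomy.pm, and the variable names and sample value are made up.

```shell
# Pre-5.14 equivalent of: my $copy = $name =~ s/a/o/gr;
# Copy first, then substitute on the copy; the original stays untouched.
result=$(perl -e '
    my $name = "banana";                  # made-up sample value
    ( my $copy = $name ) =~ s/a/o/g;      # works on perl 5.8 and later
    print "$name -> $copy";
')
echo "$result"
```

The parentheses matter: they make the assignment happen first, so the substitution is bound to $copy rather than to $name.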


I installed the version 5.32.1 of Perl in my account, and replaced the first line of RepeatMasker file. (...) However, it still reported an error. (...) And I installed Text::Soundex module through CPAN shell: (...)

Sorry also about this confusion. This is issue #81, which has also been fixed for the next release. We recently stopped using Text::Soundex and removed it from the installation requirements list, but the use Text::Soundex line was left in by mistake.

Incidentally, we recommend using the configure program if possible instead of editing files by hand. There are other script files that need to be edited, and configure does all of them at once in addition to setting up paths to dependencies such as search engines.


After I successfully installed the Text::Soundex, however, I got a different error message. (...) So I tried to unpack the RepeatMasker-4.1.1.tar.gz file again, and run: (...) And when I replaced the first line of RepeatMasker file (...) then, I got the same error message:

(base) [liyulong@node1 RepeatMasker-4.1.1]$ ./RepeatMasker -h
Unmatched right curly bracket at ./RepeatMasker line 1434, at end of line
syntax error at ./RepeatMasker line 1434, near "}"
Can't redeclare "my" in "my" at ./RepeatMasker line 1476, near "my"
Unmatched right curly bracket at ./RepeatMasker line 1861, at end of line
syntax error at ./RepeatMasker line 1861, near "}"
Can't redeclare "my" in "my" at ./RepeatMasker line 1893, near "my"
syntax error at ./RepeatMasker line 2056, near "}"
syntax error at ./RepeatMasker line 2430, near "}"
Can't redeclare "my" in "my" at ./RepeatMasker line 2442, near "my"
syntax error at ./RepeatMasker line 2461, near "}"
./RepeatMasker has too many errors.

This problem is very perplexing and I don't remember ever seeing anything like it before. The error says Unmatched right curly bracket at ./RepeatMasker line 1434, at end of line, but the } is on line 1435, not 1434. The line numbers in the other error messages are also off by one. I myself do not see this same error on several machines, with several different perl versions.

However I do notice one strange thing: on several lines, including 1366 (the line before the { that matches the } on 1434), there is a middle-dot (·) at the end of the line. I do not know exactly why they would cause issues only on some machines, but it could explain several of the errors you encountered.
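One way to locate such characters is to search for bytes that are neither printable ASCII nor whitespace. This is a minimal sketch assuming a reasonably recent grep; the function name find_non_ascii and the sample file contents are hypothetical.

```shell
# List lines containing bytes outside printable ASCII and whitespace.
# LC_ALL=C makes grep treat the input as raw bytes, so each byte of a
# multi-byte character such as the UTF-8 middle dot (0xC2 0xB7) is flagged.
find_non_ascii() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$@"
}

# Example on a throwaway file whose second line ends in a middle dot.
tmp=$(mktemp)
printf 'my $x = 1;\nif ( $x ) {\302\267\n}\n' > "$tmp"
find_non_ascii "$tmp"    # should report line 2 only
rm -f "$tmp"
```

Running find_non_ascii on the RepeatMasker script and its .pm files would show whether the same character appears elsewhere.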

Would you be willing to try the attached copy of the RepeatMasker program, in which I have removed those characters? You will have to rename it to remove the .txt extension. The overall program still might not work if the same character is also in other files, but knowing whether or not this fixes any errors will help us immensely in fixing it more completely.

RepeatMasker.txt


Thank you so much for your detailed reports and for your patience through troubleshooting these issues.

Tiramisu023 commented 3 years ago

Would you be willing to try the attached copy of the RepeatMasker program, in which I have removed those characters? You will have to rename it to remove the .txt extension. The overall program still might not work if the same character is also in other files, but knowing whether or not this fixes any errors will help us immensely in fixing it more completely.

RepeatMasker.txt

After I replaced the RepeatMasker file in 4.1.1 with the renamed RepeatMasker.txt file, I could start RepeatMasker and use the ncbi engine.

(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ RepeatMasker -pa 8 -e ncbi -species "arabidopsis" -poly -html -gff -dir Repeat_results Athaliana_167_TAIR9.fa
RepeatMasker version 4.1.1
Search Engine: NCBI/RMBLAST [ 2.10.0+ ]

Using Master RepeatMasker Database: /public2/users/liyulong/software/RepeatMasker-4.1.1/Libraries/RepeatMaskerLib.h5
  Title    : Dfam
  Version  : 3.2
  Date     : 2020-07-02
  Families : 6,953

Species/Taxa Search:
  Arabidopsis [NCBI Taxonomy ID: 3701]
  Lineage: root;cellular organisms;Eukaryota;Viridiplantae;
           Streptophyta;Streptophytina;Embryophyta;Tracheophyta;
           Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;
           eudicotyledons;Gunneridae;Pentapetalae;rosids;malvids
  9 families in ancestor taxa; 0 lineage-specific families

Building general libraries in: /public2/users/liyulong/software/RepeatMasker-4.1.1/Libraries/CONS-Dfam_3.2/general
Building species libraries in: /public2/users/liyulong/software/RepeatMasker-4.1.1/Libraries/CONS-Dfam_3.2/arabidopsis

analyzing file Athaliana_167_TAIR9.fa

However, when I set the abblast engine, it couldn't determine engine variant and version.

(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ RepeatMasker -pa 8 -e abblast -species "arabidopsis" -poly -html -gff -dir Repeat_results Athaliana_167_TAIR9.fa
RepeatMasker version 4.1.1
WUBlastSearchEngine::setPathToEngine( /public2/users/liyulong/software/ab-blast-20200317-linux-x64/blastp ): Cannot determine engine variant and version!
 at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 518.

So I modified the WUBlastSearchEngine.pm file (refer to issue #94).

Change line 287 of WUBlastSearchEngine.pm by adding an m:
Old line: if ( $result =~ /^(BLAST[PN]) (\S+ [.])/ ) {
New line: if ( $result =~ /^(BLAST[PN]) (\S+ [.])/m ) {
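The effect of the m flag can be seen in isolation. This is a minimal sketch, assuming the version banner does not start on the first line of the engine's output; the sample text is invented for illustration and is not real ABBlast output.

```shell
# Without /m, ^ anchors only at the very start of the string;
# with /m, ^ also matches right after each newline.
result=$(perl -e '
    my $out = "some notice line\nBLASTP 1.0 [example]\n";   # made-up text
    my $plain = ( $out =~ /^BLAST[PN]/  ) ? "match" : "no match";
    my $multi = ( $out =~ /^BLAST[PN]/m ) ? "match" : "no match";
    print "without m: $plain; with m: $multi";
')
echo "$result"
```

After hand-editing a module like this, running "perl -c" on the file checks that it still compiles before RepeatMasker is started.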

And I got this error information:

(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ RepeatMasker -pa 8 -e abblast -species "arabidopsis" -poly -html -gff -dir Repeat_results Athaliana_167_TAIR9.fa
Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near ". "engine"
  (Might be a runaway multi-line "" string starting on line 287)
        (Missing operator before engine?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near "if ( !defined $engine || !-f ""
        (Missing semicolon on previous line?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near ". ""
        (Missing semicolon on previous line?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near ". ""
        (Missing semicolon on previous line?)
Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289, near ". "is"
  (Might be a runaway multi-line "" string starting on line 288)
        (Missing operator before is?)
Backslash found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289, near "$engine\"
        (Missing operator before \?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289, near "my $parameters    = ""
        (Missing semicolon on previous line?)
syntax error at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 287, near ". "::setPathToEngine( $value )"
Global symbol "$engine" requires explicit package name (did you forget to declare "my $engine"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288.
Global symbol "$engine" requires explicit package name (did you forget to declare "my $engine"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289.
Global symbol "$spanParameter" requires explicit package name (did you forget to declare "my $spanParameter"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 302.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 306.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 306.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 306.
Global symbol "$parameters" requires explicit package name (did you forget to declare "my $parameters"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 307.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 307.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 312.
syntax error at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 314, near "}"
/public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm has too many errors.
Compilation failed in require at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 335.
BEGIN failed--compilation aborted at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 335.

And then, to check if it was a configuration issue, I deleted the RepeatMasker-4.1.1 folder, unpacked the original gz file, and configured the software.

The path in RepeatMasker's #! line is set when you run the configure program, so if configure was never run or was run outside of your perl 5.26.2 conda environment that would explain the discrepancy.

I tried to run configure, however it didn't let me set the Perl or Python installation path. Maybe it automatically checked and skipped these two steps? Or is there something I missed?

Here is the configuration flow.

$ ./configure

Checking for libaraies...
Rebuilding RepeatMaskerLib.h5 master library
     - Read in 49011 sequences from .../RMBSeqs.embl
     - Read in 49011 annotations from .../RMRBMeta.embl
     Merging Dfam + RepBase into RepeatMaskerLib.h5 library ...........................................................................................
..........................................................

the full path including the name for the TRF program.
TRF_PRGM: /public2/users/liyulong/software/trf-4.0.9/trf

Add a Search Engine:
   1. Crossmatch: [ Un-configured ]
   2. RMBlast: [ Un-configured ]
   3. HMMER3.1 & DFAM: [ Un-configured]
   4. ABBlast: [ Un-configured ]

   5. Done

Enter Selection: 2

The path to the installation of the RMBLAST sequence alignment program.
RMBLAST_DIR: /public2/users/liyulong/software/rmblast-2.10.0/bin

Enter Selection: 3

The path to the HMMER profile HMM search software.
HMMER_DIR: /public2/users/liyulong/software/hmmer-3.3.2/bin

Enter Selection: 4

The path to the installationg of the ABBLAST sequence alignment program.
ABBLAST_DIR: /public2/users/liyulong/software/ab-blast-20200317-linux-x64

Enter Selection: 5

Building FASTA version of RepeatMasker.lib ................................
Building RMBlast frozen libraries..
Building WUBlast/ABBlast frozen libraries..
The program is installed with a the following repeat libraries:
Database: Dfam withRBRM
Version: 3.2
Date: 2020-07-02

Dfam - A database of transposable element (TE) sequence alignments and HMMs.
RBRM - RepBase RepeatMasker Edition - version 20181026

Total consensus sequences: 318520
Total HMMs: 273655

Further documentation on the program may be found here:
  /public2/users/liyulong/software/RepeatMasker/repeatmasker.help

After configuration finished, I tried to start RepeatMasker with the original RepeatMasker file. It worked!!! So amazing, I didn't even replace it with the new RepeatMasker.txt file. The previous failure to start was not repeated:

(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ RepeatMasker -pa 8 -e ncbi -species "arabidopsis" -poly -html -gff -dir Repeat_results Athaliana_167_TAIR9.fa
RepeatMasker version 4.1.1
Search Engine: NCBI/RMBLAST [ 2.10.0+ ]

Using Master RepeatMasker Database: /public2/users/liyulong/software/RepeatMasker-4.1.1/Libraries/RepeatMaskerLib.h5
  Title    : Dfam withRBRM
  Version  : 3.2
  Date     : 2020-07-02
  Families : 318,520

Species/Taxa Search:
  Arabidopsis [NCBI Taxonomy ID: 3701]
  Lineage: root;cellular organisms;Eukaryota;Viridiplantae;
           Streptophyta;Streptophytina;Embryophyta;Tracheophyta;
           Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;
           eudicotyledons;Gunneridae;Pentapetalae;rosids;malvids
  36 families in ancestor taxa; 958 lineage-specific families

Building general libraries in: /public2/users/liyulong/software/RepeatMasker-4.1.1/Libraries/CONS-Dfam_withRBRM_3.2/general
Building species libraries in: /public2/users/liyulong/software/RepeatMasker-4.1.1/Libraries/CONS-Dfam_withRBRM_3.2/arabidopsis

analyzing file Athaliana_167_TAIR9.fa

And then, I tried to use the abblast engine (still using the original RepeatMasker file).

(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ RepeatMasker -pa 8 -e abblast -species "arabidopsis" -poly -html -gff -dir Repeat_results Athaliana_167_TAIR9.fa
RepeatMasker version 4.1.1
WUBlastSearchEngine::setPathToEngine( /public2/users/liyulong/software/ab-blast-20200317-linux-x64/blastp ): Cannot determine engine variant and version!
 at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 518.

Then I modified the WUBlastSearchEngine.pm file (refer to issue #94).

Change line 287 of WUBlastSearchEngine.pm by adding an m:
Old line: if ( $result =~ /^(BLAST[PN]) (\S+ [.])/ ) {
New line: if ( $result =~ /^(BLAST[PN]) (\S+ [.])/m ) {

And I got the same error again.

RepeatMasker -pa 8 -e abblast -species "arabidopsis" -poly -html -gff -dir Repeat_results Athaliana_167_TAIR9.fa
Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near ". "engine"
  (Might be a runaway multi-line "" string starting on line 287)
        (Missing operator before engine?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near "if ( !defined $engine || !-f ""
        (Missing semicolon on previous line?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near ". ""
        (Missing semicolon on previous line?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near ". ""
        (Missing semicolon on previous line?)
Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289, near ". "is"
  (Might be a runaway multi-line "" string starting on line 288)
        (Missing operator before is?)
Backslash found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289, near "$engine\"
        (Missing operator before \?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289, near "my $parameters    = ""
        (Missing semicolon on previous line?)
syntax error at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 287, near ". "::setPathToEngine( $value )"
Global symbol "$engine" requires explicit package name (did you forget to declare "my $engine"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288.
Global symbol "$engine" requires explicit package name (did you forget to declare "my $engine"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289.
Global symbol "$spanParameter" requires explicit package name (did you forget to declare "my $spanParameter"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 302.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 306.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 306.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 306.
Global symbol "$parameters" requires explicit package name (did you forget to declare "my $parameters"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 307.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 307.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 312.
syntax error at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 314, near "}"
/public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm has too many errors.
Compilation failed in require at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 335.
BEGIN failed--compilation aborted at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 335.

I replaced the old RepeatMasker file with the new one. I tested the new RepeatMasker file with the modified WUBlastSearchEngine.pm file, and got the following (this error information is different from before, because I re-unpacked the gz file):

(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ RepeatMasker -pa 8 -e abblast -species "arabidopsis" -poly -html -gff -dir Repeat_results Athaliana_167_TAIR9.fa
Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/Taxonomy.pm line 376, near "s/'/'"'"'/r"
syntax error at /public2/users/liyulong/software/RepeatMasker-4.1.1/Taxonomy.pm line 376, near "s/'/'"'"'/r "
Compilation failed in require at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 333.
BEGIN failed--compilation aborted at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 333.

I tested the new RepeatMasker file with the original WUBlastSearchEngine.pm file and got the same error. Does this mean that with the new RepeatMasker file, the "Cannot determine engine variant and version!" error no longer occurs?

(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ RepeatMasker -pa 8 -e abblast -species "arabidopsis" -poly -html -gff -dir Repeat_results Athaliana_167_TAIR9.fa
Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/Taxonomy.pm line 376, near "s/'/'"'"'/r"
syntax error at /public2/users/liyulong/software/RepeatMasker-4.1.1/Taxonomy.pm line 376, near "s/'/'"'"'/r "
Compilation failed in require at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 333.
BEGIN failed--compilation aborted at /public2/users/liyulong/software/RepeatMasker-4.1.1/RepeatMasker line 333.

To summarize, the tests above were performed in the following environment:

(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ which perl
~/software/perl-5.32.1/perl
(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ perl --version

This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux

Copyright 1987-2021, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ which python
~/software/anaconda3/envs/circos/bin/python
(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$ python --version
Python 3.8.6
(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$

I am currently in the base environment of Anaconda:

(base) [liyulong@node1 00.RepeatMasker4.1.1_bug]$

Do I need to get out of the Anaconda base environment ( [liyulong@node1 00.RepeatMasker4.1.1_bug]$ ), and use a separate python (~/software/Python3/python) ?

But it seems that with Python installed manually, there are a large number of dependent packages that need to be installed manually. On the other hand, conda's virtual environment doesn't seem to matter; it's just a matter of environment variables.

jebrosen commented 3 years ago

At this point I think there are too many different environments involved and we should "start over" from your latest comment. I have given several independent solutions to different issues affecting different versions of perl and RepeatMasker, but I think that is now confusing both of us.


Let's try to troubleshoot the environment you have the most control over: your base conda env with perl 5.32.1 and python 3.8.6. Here are two answers to your questions that may help:

I tried to run configure, however it didn't let me set the Perl or Python installation path. Maybe it automatically checked and skipped these two steps? Or is there something I missed?

Do I need to get out of the Anaconda base environment ( [liyulong@node1 00.RepeatMasker4.1.1_bug]$ ), and use a separate python (~/software/Python3/python) ? But it seems that with Python installed manually, there are a large number of dependent packages that need to be installed manually. On the other hand, conda's virtual environment doesn't seem to matter; it's just a matter of environment variables.

The only python dependencies for RepeatMasker 4.1.1 are h5py and its own dependencies; not "a large number" as far as I am aware. These can all be installed at once through conda install h5py (or, for example, through pip install --user h5py if you are not using conda).
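
One way to confirm that the interpreter on your PATH can actually see h5py is to try importing it. This is only a sketch: the helper name python_can_import is made up for illustration, and it assumes the interpreter is reachable as python3 (pass python or a full path as the second argument if it is not).

```shell
# Succeeds (exit status 0) if the given Python interpreter can import the module.
# $1 = module name, $2 = interpreter (optional, defaults to python3).
# The function name is hypothetical; it is not part of RepeatMasker.
python_can_import() {
  "${2:-python3}" -c "import $1" 2>/dev/null
}

if python_can_import h5py; then
  echo "h5py is importable"
else
  echo "h5py not found; try: conda install h5py (or: pip install --user h5py)"
fi
```

Running this inside the conda environment you plan to use shows whether that environment, rather than some other Python installation, has h5py.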


The steps in your previous comment include a lot of switching between "old" and "new" copies of files and it's not clear when you did or didn't re-run configure. Can you try the steps again in this order without making any other changes, to best determine which error(s) still need to be resolved?

  1. Unpack the original RepeatMasker 4.1.1 files.
  2. Apply the fix for #94 by adding the m.
  3. Run configure and then try running RepeatMasker.
  4. You should not see the "s/'/'"'"'/r " error at this point because of the recent perl version. If you do see this error, triple-check that configure was run after activating your desired conda environment.
  5. If you see the Unmatched right curly bracket at ./RepeatMasker line 1434, at end of line error, replace RepeatMasker with the RepeatMasker.txt file I provided earlier, re-run configure, and try again.
  6. At this point, both ncbi and abblast engines should work without error.
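
For step 4, the perl that is first on PATH in the active environment can be checked directly. The s///r modifier (non-destructive substitution) was introduced in Perl 5.14, so an older interpreter rejects the line in Taxonomy.pm at compile time. The following is a minimal sketch; the helper name perl_supports_r_flag is made up for illustration.

```shell
# Succeeds (exit status 0) only if the given perl accepts the s///r modifier.
# $1 = perl executable (optional, defaults to perl on PATH).
# The function name is hypothetical; it is not part of RepeatMasker.
perl_supports_r_flag() {
  "${1:-perl}" -e 'my $s = "a"; my $t = ($s =~ s/a/b/r); exit(($t eq "b" && $s eq "a") ? 0 : 1);' 2>/dev/null
}

# Report which perl the current environment resolves to, and whether it is new enough.
if command -v perl >/dev/null 2>&1; then
  echo "perl on PATH: $(command -v perl)"
  perl -e 'printf "version: %vd\n", $^V'
  if perl_supports_r_flag perl; then
    echo "s///r is supported"
  else
    echo "s///r is NOT supported (perl older than 5.14?)"
  fi
fi
```

If the reported path is not the perl you expect, activate the intended conda environment first and re-run configure.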

You have also mentioned Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near ". "engine". That line number is 6 lines earlier than the line that has ". "engine", and the only reason I can think of so far is a mistake in editing that file.
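
An editing mistake of this kind can usually be caught before running configure by asking perl to compile the file without executing it (the -c switch). The sketch below is an illustration only; the helper name check_perl_syntax is made up, and -I is added on the assumption that the module's own directory holds the other modules it loads.

```shell
# Compile-check a Perl file without running it.
# Exit status is 0 if it compiles; perl prints its diagnostics otherwise.
# $1 = path to the file. The function name is hypothetical.
check_perl_syntax() {
  perl -I "$(dirname "$1")" -c "$1"
}

# Example (path as used in this thread, adjust to your installation):
#   check_perl_syntax /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm
```

On success perl reports "syntax OK" for the file; otherwise the first reported line number is normally close to the edit that broke it.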

Let me know if this helps!

Tiramisu023 commented 3 years ago

Can you try the steps again in this order without making any other changes, to best determine which error(s) still need to be resolved?

  • Unpack the original RepeatMasker 4.1.1 files.
  • Apply the fix for #94 by adding the m.

I unpacked the original RepeatMasker 4.1.1 files and added the 'm' in WUBlastSearchEngine.pm. But when I ran ./configure, I got an error message:

RepeatMasker Configuration Program
Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near ". "engine"
  (Might be a runaway multi-line "" string starting on line 287)
        (Missing operator before engine?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near "if ( !defined $engine || !-f ""
        (Missing semicolon on previous line?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near ". ""
        (Missing semicolon on previous line?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288, near ". ""
        (Missing semicolon on previous line?)
Bareword found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289, near ". "is"
  (Might be a runaway multi-line "" string starting on line 288)
        (Missing operator before is?)
Backslash found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289, near "$engine\"
        (Missing operator before \?)
String found where operator expected at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289, near "my $parameters    = ""
        (Missing semicolon on previous line?)
syntax error at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 287, near ". "::setPathToEngine( $value )"
Global symbol "$engine" requires explicit package name (did you forget to declare "my $engine"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 288.
Global symbol "$engine" requires explicit package name (did you forget to declare "my $engine"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 289.
Global symbol "$spanParameter" requires explicit package name (did you forget to declare "my $spanParameter"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 302.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 306.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 306.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 306.
Global symbol "$parameters" requires explicit package name (did you forget to declare "my $parameters"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 307.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 307.
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?) at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 312.
syntax error at /public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm line 314, near "}"
/public2/users/liyulong/software/RepeatMasker-4.1.1/WUBlastSearchEngine.pm has too many errors.
Compilation failed in require at ./configure line 274.

jebrosen commented 3 years ago

Can you upload the WUBlastSearchEngine.pm file here? And what program are you using to edit it?

It's very strange that adding that m worked for you in #94 and on a few computers I tried, but is causing errors now.

Tiramisu023 commented 3 years ago

I'm so sorry, I just misunderstood #94, and modified WUBlastSearchEngine.pm file incorrectly.

Old line: if ( $result =~ /^(BLAST[PN]) (\S+ \[.*\])/ ) {

New line: if ( $result =~ /^(BLAST[PN]) (\S+ \[.*\])/m ) {

In my previously modified WUBlastSearchEngine.pm file, there was a gap between "/" and "m": if ( $result =~ /^(BLAST[PN]) (\S+ \[.*\])/ m) {
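
The effect of the m modifier can be reproduced outside RepeatMasker. Without it, ^ matches only at the very start of the string; with it, ^ also matches at the start of every line, so the pattern still matches when something else is printed before the banner. The sketch below uses the pattern quoted above; the sample text and the helper name match_banner are invented for illustration and are not real ABBlast output.

```shell
# Hypothetical two-line output: the banner is NOT on the first line.
sample='warning: something printed first
BLASTN 3.0 [made-up build]'

# $1 = regex modifiers ("" or "m"), $2 = text to search.
# Prints "match: ..." with the two captured groups, or "no match".
match_banner() {
  perl -e '
    my ($mods, $text) = @ARGV;
    my $re = $mods eq "m"
      ? qr/^(BLAST[PN]) (\S+ \[.*\])/m
      : qr/^(BLAST[PN]) (\S+ \[.*\])/;
    print(($text =~ $re) ? "match: $1 $2\n" : "no match\n");
  ' "$1" "$2"
}

match_banner ""  "$sample"   # prints: no match
match_banner "m" "$sample"   # prints: match: BLASTN 3.0 [made-up build]
```

Note that the modifier has to follow the closing "/" immediately; with a space in between, perl no longer reads the m as a modifier of that pattern, which is consistent with the cascade of syntax errors shown earlier.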

Please let me restart the test with the correctly modified WUBlastSearchEngine.pm file.

  1. Unpack the original RepeatMasker 4.1.1 files.

Done.

  2. Apply the fix for #94 by adding the m.

Correctly modified.

  3. Run configure and then try running RepeatMasker. You should not see the "s/'/'"'"'/r " error at this point because of the recent perl version. If you do see this error, triple-check that configure was run after activating your desired conda environment.

I ran configure with the original RepeatMasker file, and I can now run my data successfully. I'm using the python and perl from the conda base environment.

Every time I connect to a terminal, I always automatically enter the conda base environment. (base) [liyulong@node1 ~]$

I compared $PATH in the conda base environment with $PATH after deactivating it ( [liyulong@node1 ~]$ ); they are exactly the same. Therefore, there is no need to deactivate the conda environment.
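
A comparison like this can also be made per program, by asking which executable a given PATH value would resolve to. The sketch below is illustrative only; the helper name resolve_on_path is made up, and it uses the shell's standard command -v lookup inside a subshell so that the caller's PATH is left untouched.

```shell
# Print the executable that a given PATH value resolves a program name to.
# $1 = program name, $2 = PATH value to search.
# Returns non-zero if nothing is found. The function name is hypothetical.
resolve_on_path() {
  ( PATH="$2"; command -v "$1" )
}

# Show what the current environment would run for each interpreter.
for tool in perl python; do
  echo "$tool -> $(resolve_on_path "$tool" "$PATH" || echo "not found")"
done
```

Running the loop once with the conda environment active and once after deactivating it shows directly whether perl and python resolve to the same files in both cases.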

I just misunderstood your words, very sorry for the trouble caused to you.

May I delete my previous replies? They resulted from my own mistake and might confuse other people.

Many thanks for your help.

jebrosen commented 3 years ago

I'm so sorry, I just misunderstood #94, and modified WUBlastSearchEngine.pm file incorrectly. (...) I ran configure with the original RepeatMasker file, and I can now run my data successfully. I'm using the python and perl from the conda base environment. I just misunderstood your words, very sorry for the trouble caused to you. Many thanks for your help.

That is a relief! Thank you again for reporting these issues to us and for testing the proposed fixes, and I am happy to hear that you were able to get it working.

May I delete my previous replies? They resulted from my own mistake and might confuse other people.

Please feel free to edit or delete those replies however you see fit.


To my knowledge, all the bugs that have been found in this process have been fixed in our development branch and the fixes will be incorporated in the next RepeatMasker release.

Tiramisu023 commented 3 years ago

Thanks again.

Many of my previous replies are tied to yours. If I delete them, the thread may no longer make sense.

So let's keep them. ^_^

jebrosen commented 3 years ago

RepeatMasker 4.1.2 has been released, fixing several errors referenced in this thread.

Thanks again for calling our attention to this problem!

Blosberg commented 2 years ago

Thanks for providing this software. I'm having the same issue as OP and ended up here:

RepeatMasker 4.1.2 has been released

I'd like to switch to this version, but the sidebar here at github shows "No releases published", and at the following site: http://www.repeatmasker.org/RMDownload.html , the link to RepeatMasker-4.1.1.tar.gz is still working, but the link to RepeatMasker-4.1.2-p1.tar.gz leads to

404 Not found: 
The requested URL /RepeatMasker-4.1.2-p1.tar.gz was not found on this server.

Can you provide more info on how to download version 4.1.2 to eliminate the "Bareword found where operator expected" error? Thanks for patching this.

jebrosen commented 2 years ago

@Blosberg http://www.repeatmasker.org/RMDownload.html is a previous location for the download page; the current download page http://www.repeatmasker.org/RepeatMasker/ has the correct download links. I will look into why the old page RMDownload.html has new versions of RepeatMasker with broken links.

Blosberg commented 2 years ago

Hi @jebrosen , thanks for the new link, that's helpful. I'm still encountering issues elsewhere in installation though (unless I'm going to the wrong places) --e.g. for cross_match http://www.phrap.org --> http://www.phrap.org/phredphrapconsed.html --> http://www.phrap.org/consed/consed.html#howToGet --> http://www.phrap.org/consed/distributions/29.0/consed_linux.tar.gz --> "Forbidden; You don't have permission to access this resource."

jebrosen commented 2 years ago

"Forbidden; You don't have permission to access this resource."

cross_match requires acceptance of the academic user agreement or a commercial license (described at http://www.phrap.org/consed/consed.html#howToGet ) before download. As an alternative to cross_match, RepeatMasker also supports two open-source search engines (RMBlast and HMMER).

Blosberg commented 2 years ago

Understood. I tracked down our version and the rest of the installation proceeded fine. The original error cited at the top of the thread is gone now. Thanks for your help!

jebrosen commented 2 years ago

@Blosberg That's great! I am glad to hear that things are working now and that I was able to help.