bioconda / bioconda-recipes

Conda recipes for the bioconda channel.
https://bioconda.github.io
MIT License

ERROR: Could not determine if RepBase is installed #16501

Open mictadlo opened 5 years ago

mictadlo commented 5 years ago

Hi @abretaud, @nathanweeks, @johanneskoester, @kastman, @pvanheus, @jerowe, @bgruening and @ArneKr,

I ran Maker but I got the following error:

> qpeek 4702790.pbs
Possible precedence issue with control flow operator at /lustre/work-lustre/waterhouse_team/miniconda2/envs/maker/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
STATUS: Parsing control files...
ERROR: Could not determine if RepBase is installed
--> rank=NA, hostname=cl4n008

Where do I install RepBase with this package?

Thank you in advance,

Michal

xonq commented 5 years ago

I'm getting the same error - does this have to do with RepBase making their software proprietary?

abretaud commented 5 years ago

Hi, the error comes from Maker trying to use the RepBase version bundled with RepeatMasker. However, this RepBase version is not packaged in bioconda (see https://github.com/bioconda/bioconda-recipes/blob/master/recipes/repeatmasker/build.sh#L16): the one in the default repeatmasker distribution was too old, and we couldn't ship a newer one due to the RepBase license.

The solution is to download RepBase manually, and set the REPEATMASKER_LIB_DIR and REPEATMASKER_MATRICES_DIR environment variables.
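
A minimal sketch of the suggestion above, assuming the manually downloaded RepBase files were unpacked under a hypothetical /path/to/my/repeatmasker directory with Libraries and Matrices subdirectories. Both the location and the subdirectory names are assumptions; adjust them to your own layout.

```shell
# Hypothetical root of a manually installed RepeatMasker/RepBase tree.
REPEAT_ROOT="${REPEAT_ROOT:-/path/to/my/repeatmasker}"

# The two variables named in the comment above.
export REPEATMASKER_LIB_DIR="$REPEAT_ROOT/Libraries"
export REPEATMASKER_MATRICES_DIR="$REPEAT_ROOT/Matrices"

# Warn early if either directory is missing, before a long MAKER run starts.
for dir in "$REPEATMASKER_LIB_DIR" "$REPEATMASKER_MATRICES_DIR"; do
    if [ ! -d "$dir" ]; then
        echo "warning: $dir does not exist" >&2
    fi
done
```

Putting these lines in the job script that launches maker keeps the settings next to the run they apply to.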

xonq commented 5 years ago

in what file do we set the variables?

nathanweeks commented 5 years ago

@xonq : you can set the environment variables in your shell (script) before invoking maker; e.g.:

export REPEATMASKER_LIB_DIR=/path/to/my/repeatmasker/lib

xonq commented 5 years ago

can this be resolved by installing a different repeatmasker version? i.e. conda install maker repeatmasker=4.0.7

edit: this does not work

phhsieh1329 commented 5 years ago

Hi, Does anyone know where I can download or retrieve the MATRICES? Thanks!

xonq commented 5 years ago

You have to get a license for the program and install it.

phhsieh1329 commented 5 years ago

Hi @abretaud I tried to follow the solutions you provided but I still encounter a similar issue.

Here is what I have tried:

However, when I tried Maker, it showed:

maker -h
Possible precedence issue with control flow operator at /sd/MAKER_py2/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
MAKER version 2.31.9

Could you help see which step might go wrong? Thank you so much.

gbdias commented 5 years ago

I'm having similar issues. This post by Maker's author suggests that the conda version of Maker might not be properly working. http://gmod.827538.n3.nabble.com/Does-Conda-Maker-actually-work-td4060214.html#a4060215

nathanweeks commented 5 years ago

@gbdias : that post was a while ago; the inline C issue should have been resolved: https://github.com/bioconda/bioconda-recipes/pull/15001

@phhsieh1329: the warning is harmless, and is fixed in newer versions of bioperl: https://github.com/bioperl/bioperl-live/pull/251 (MAKER is pinned to bioperl 1.7.2, since bioperl 1.7.3 removed many modules, which were separated into different distributions)

phhsieh1329 commented 5 years ago

@nathanweeks Thanks for the information. In this case, do you happen to know whether I should download the latest version from RepBase and update the local repeat database with it? Thanks.

nathanweeks commented 5 years ago

@phhsieh1329 : I guess it depends on whether or not you need RepBase (and have the $$ to pay for a version). RepeatMasker is bundled with Dfam.

mrmrwinter commented 4 years ago

Hi, I avoided this error by running "$ RepeatMasker ./configure" from within the conda environment that maker is installed in.

ToddUgine commented 4 years ago

Please excuse my lack of knowledge. I'm a total newb. I'm about to run MAKER on a de novo assembly. My institution doesn't have a REPBASE license. Does MAKER call on the REPBASE website, or does the MAKER install include a REPBASE database that it uses to mask repeats? I want to know if I'm doomed to fail without a license before I spend the time and money to engage the supercomputer that will do the processing.

Thanks, and please ask followup questions. I'll muddle my way through them.

pvanheus commented 4 years ago

MAKER2 requires that you have a license for and install RepBase. It does not install RepBase for you.

BTW at the risk of false positives, the NCBI Eukaryotic Genome Pipeline uses Windowmasker (installed alongside BLAST) as an alternative to RepeatMasker: https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/
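
For reference, a sketch of the usual two-pass WindowMasker invocation, assuming a hypothetical input file genome.fa. The run helper and the output file names are illustrative; the helper only prints the commands unless DRY_RUN=0 is set, so nothing is executed by accident. Whether and how MAKER can consume the result is not settled in this thread.

```shell
# Illustrative helper: print commands by default, run them when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

genome="${GENOME:-genome.fa}"   # hypothetical input assembly

# Pass 1: collect word-frequency counts for the genome.
run windowmasker -mk_counts -in "$genome" -out "$genome.counts"

# Pass 2: use the counts to write a masked FASTA file.
run windowmasker -ustat "$genome.counts" -in "$genome" -out "$genome.wm.fa" -outfmt fasta
```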

ToddUgine commented 4 years ago

Ok, thanks. Now, can I have MAKER call WindowMasker, or do I run Windowmasker on the genome first and then feed it into MAKER? If the former, how do I have MAKER do that?

Yorks0n commented 3 years ago

According to this reply, MAKER checks xxx/RepeatMasker/Libraries/RepeatMaskerLib.embl for Repbase.

However, RepeatMasker no longer creates RepeatMaskerLib.embl; instead it uses Dfam.h5 to create RepeatMaskerLib.h5. Even if you download an old version of Repbase somewhere (e.g. RMRBSeqs.embl), RM will only add it to RepeatMaskerLib.h5, so setting REPEATMASKER_LIB_DIR won't help. BTW, RM uses LIBDIR instead of REPEATMASKER_LIB_DIR in newer versions.

So there are two ways to solve this:

  1. Just set model_org= to empty in maker_opts.ctl, and it won't check whether Repbase is installed.
  2. If you have an older version of Repbase such as RMRBSeqs.embl, create a symlink in xxx/RepeatMasker/Libraries/. For example, ln -s RMRBSeqs.embl RepeatMaskerLib.embl. In that case, model_org= should be set to an organism that exists in the database, instead of all. CAUTION: I don't know whether the results are reliable this way.

Last but not least, since Repbase now provides repeat_db in fasta format, if you have a newer version of the db, just provide it by setting rmlib=xxx.fa in the MAKER config.
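
The workarounds above can be sketched as follows. To stay safe to run anywhere, the sketch works in a throwaway directory with stand-in files; the file contents and the my_repeats.fa name are illustrative assumptions. On a real install you would apply the same steps to your own maker_opts.ctl and RepeatMasker/Libraries directory, and pick one workaround rather than all of them.

```shell
workdir=$(mktemp -d)
ctl="$workdir/maker_opts.ctl"
libdir="$workdir/RepeatMasker/Libraries"
mkdir -p "$libdir"

# Stand-in control file; only the option names come from the comment above.
cat > "$ctl" <<'EOF'
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
EOF

# Way 1: empty model_org so the RepBase check is not triggered.
sed 's/^model_org=[^ #]*/model_org=/' "$ctl" > "$ctl.tmp" && mv "$ctl.tmp" "$ctl"

# Way 2: expose an old RepBase file under the name MAKER looks for.
: > "$libdir/RMRBSeqs.embl"   # empty stand-in for a real RMRBSeqs.embl
(cd "$libdir" && ln -s RMRBSeqs.embl RepeatMaskerLib.embl)

# Alternative: point rmlib at a FASTA repeat library (hypothetical file name).
sed "s|^rmlib=[^ #]*|rmlib=$workdir/my_repeats.fa|" "$ctl" > "$ctl.tmp" && mv "$ctl.tmp" "$ctl"

cat "$ctl"
```

Writing to a temporary file and moving it into place avoids relying on sed -i, whose syntax differs between GNU and BSD sed.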

marshals1999 commented 3 years ago

I also ran into this problem, and it can be resolved. First, run which -a RepeatMasker; if the output shows ~/anaconda3/bin/RepeatMasker, that may be the source of the problem. You'd better install RepeatMasker manually. You also need to download the Repbase database and decompress it in the RepeatMasker working directory in order to update the library files. Finally, change the RepeatMasker path in maker_exe.ctl. Then maker will work correctly.
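
A small sketch of the first check described above: it reports whether the RepeatMasker found on PATH appears to come from a conda installation. The is_conda_path helper and its path patterns are a hypothetical heuristic, not an exhaustive test.

```shell
# Report whether a path looks like it lives inside a conda installation.
# The patterns below are a heuristic and may need extending for your site.
is_conda_path() {
    case "$1" in
        */anaconda*/*|*/miniconda*/*|*/conda/*|*/envs/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Empty if RepeatMasker is not on PATH.
rm_path=$(command -v RepeatMasker || true)

if [ -z "$rm_path" ]; then
    echo "RepeatMasker is not on PATH"
elif is_conda_path "$rm_path"; then
    echo "RepeatMasker comes from conda: $rm_path"
else
    echo "RepeatMasker comes from a manual install: $rm_path"
fi
```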

CrawlingSponge commented 3 years ago

Replacing the 2018 RepeatMasker release with the 2017 one, along with the matching RepBase release, solved the problem. By the way, maker3 seems to be more accurate than maker2.

kongshuang-cn commented 2 years ago

This error comes from line 4363 of GI.pm in the maker library, where maker tries to derive the library path from the absolute path of the RepeatMasker executable. Here is the code in GI.pm:

my $exe = Cwd::abs_path($CTL_OPT{RepeatMasker});
my ($lib) = $exe =~ /(.*\/)RepeatMasker$/;
$lib .= "Libraries/RepeatMaskerLib.embl";

Maker will stop and print the error if $lib is empty. I found that RepeatMaskerLib.embl is missing from my repeatmasker library directory. So, as @Yorks0n said, you can set model_org to empty or create a symlink, and I think it should also work if you change the source code so that maker looks for RMRBSeqs.embl instead of RepeatMaskerLib.embl:

$lib .= "Libraries/RMRBSeqs.embl";
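
To check the same path without reading Perl, the logic quoted above can be approximated in shell. This is a sketch only: repbase_lib_for is a hypothetical helper name, and readlink -f is assumed to be available for symlink resolution (the path is used as given when it is not).

```shell
# Print the library path MAKER would derive from a RepeatMasker executable.
repbase_lib_for() {
    exe=$1
    # Resolve symlinks where readlink -f works; otherwise keep the path as given.
    resolved=$(readlink -f "$exe" 2>/dev/null || printf '%s' "$exe")
    printf '%s/Libraries/RepeatMaskerLib.embl\n' "$(dirname "$resolved")"
}

# Apply it to whatever RepeatMasker is on PATH, if any.
rm_exe=$(command -v RepeatMasker || true)
if [ -n "$rm_exe" ]; then
    lib=$(repbase_lib_for "$rm_exe")
    if [ -f "$lib" ]; then
        echo "found $lib"
    else
        echo "missing $lib"
    fi
fi
```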

Jokendo-collab commented 1 year ago

The workarounds from @Yorks0n's comment above worked for me.

slsy9965 commented 1 year ago

Hi, I'm fairly new to bioinformatics and am currently trying to use MAKER to annotate my assembly.

I've currently installed MAKER v3.01.03 using bioconda and so far everything runs smoothly following this tutorial with model_org= set to empty.

Please correct me if I'm wrong but setting model_org= to empty would mean that the entire step of repeat masking would be skipped, yes? That is what is written within the maker_opts.ctl file, but I would like to run repeat masking.

I don't have an older version of RepBase either so the symlink method doesn't seem to apply to me. I don't have a subscription for it either.

I've seen in #26529 that Dfam can be used instead, but I haven't been able to find a way to instruct MAKER to use Dfam.

Similar to https://github.com/bioconda/bioconda-recipes/issues/16501#issuecomment-1308307968, I do not have RepeatMaskerLib.embl in my Libraries folder either.

It seems that manually tweaking the source code, as described in https://github.com/bioconda/bioconda-recipes/issues/25559#issuecomment-738756514, is required for MAKER to recognize RepeatMaskerLib.h5, which, as I understand it, is created when RepeatMasker ./configure is run?

Is there any other workaround to resolve this error?

abretaud commented 1 year ago

Hi! I think Dfam is the way to go now rather than the non-free RepBase. You can also try running RepeatModeler to create a repeat library specific to the genome you want to annotate (then give that library to RepeatMasker). Not sure if the code tweaking still works, but it seems like a good option. You can also run RepeatMasker on your own and give the output to maker as a pre-masked genome sequence.
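
A rough sketch of the RepeatModeler route described above, assuming a hypothetical assembly genome.fa and database name mygenome. The run helper is illustrative and only prints each command unless DRY_RUN=0 is set; check the options against the versions installed in your environment before running for real.

```shell
# Illustrative helper: print commands by default, run them when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

genome="${GENOME:-genome.fa}"   # hypothetical input assembly
db="${DB_NAME:-mygenome}"       # hypothetical RepeatModeler database name

# Build the sequence database that RepeatModeler works from.
run BuildDatabase -name "$db" "$genome"

# Discover repeat families de novo; this writes <db>-families.fa.
run RepeatModeler -database "$db"

# Soft-mask the genome with the new library; RepeatMasker writes <genome>.masked.
run RepeatMasker -lib "$db-families.fa" -xsmall "$genome"
```

The resulting families file is also a candidate for the rmlib= option in maker_opts.ctl if you would rather let MAKER run RepeatMasker itself.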

slsy9965 commented 1 year ago

Hi not sure if this helps,

Just wanted to update that I've managed to work around ERROR: Could not determine if RepBase is installed. I installed h5py with conda in the same environment as MAKER (I think the Python version in the env has to be >3.8), then ran export LIBDIR=/path/to/conda/environment/share/RepeatMasker/Libraries, queried famdb.py with /path/to/the/maker/environment/share/RepeatMasker/famdb.py lineage -d all > ATextFile.txt, and then specified what I needed for model_org= in the maker_opts.ctl file.

So far, trying it out with the example dataset from MAKER seems to work with model_org=Alca. I haven't tried it on my own files though.
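
The steps in the comment above, collected into one sketch. It assumes the MAKER conda environment is active so that CONDA_PREFIX points at it (the fallback path is a placeholder), and that h5py has already been installed there with conda install h5py. The lineage.txt output name is arbitrary.

```shell
# CONDA_PREFIX is set by conda for the active environment; the fallback is a placeholder.
env_prefix="${CONDA_PREFIX:-/path/to/conda/environment}"

# Newer RepeatMasker versions read LIBDIR, as noted earlier in the thread.
export LIBDIR="$env_prefix/share/RepeatMasker/Libraries"
famdb="$env_prefix/share/RepeatMasker/famdb.py"

if [ -f "$famdb" ]; then
    # Same query as in the comment above; browse the output for a model_org= value.
    python3 "$famdb" lineage -d all > lineage.txt
else
    echo "famdb.py not found at $famdb (is the MAKER environment active?)" >&2
fi
```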