Dfam-consortium / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Error Write `RepeatMaskerLib.embl` file #19

Closed. Nilad closed this issue 3 years ago.

Nilad commented 5 years ago

Hi,

First, thanks for maintaining this tool.

I am trying to run RepeatMasker from a Singularity image.

I ran RepeatScout before RepeatMasker without any issues.

But I get this error:

Issue

RepeatMasker version open-4.0.8
Search Engine: NCBI/RMBLAST [ 2.2.27+ ]
Rebuilding RepeatMaskerLib.embl library
  - Read in 312 sequences from /usr/local/RepeatMasker/Libraries/DfamConsensus.embl
  Saving RepeatMaskerLib.embl library...() Unable to open file /usr/local/RepeatMasker/Libraries/RepeatMaskerLib.embl for writing: Read-only file system

Command line

singularity exec RepeatMasker.simg RepeatMasker -lib repeatscout.filtered myFasta.fasta

rmhubley commented 5 years ago

Looks like you are using a non-standard installation of RepeatMasker (image/container/wrapper). It's really difficult for us to manage other developers' installations of, or modifications to, our software. I usually recommend directing these kinds of requests to the individual(s) who packaged and distributed the software you are running.

The error indicates that the installation of RepeatMasker is not complete. The configure script was never run, so the first invocation of RepeatMasker itself is trying to set up the RepeatMasker/Libraries directory, and in this case you don't have write privileges to that directory. Once the installation is configured (usually by an administrator for a system-wide installation), this directory no longer needs to be written to. RepeatMasker does create cached library files for each species at runtime; however, once configured, if it can't write to the RepeatMasker/Libraries directory it saves its cached files to the user's ~/.RepeatMaskerCache directory. I would contact the author of the Singularity package and ask them to update the image by running configure before building it.

A current downside to building images that include RepeatMasker is that, for the time being, the largest database of TEs, RepBase, is closed (license restricted). If you package RepeatMasker without this database, as you must, users will have to finish the installation later: they will need to download the library from GIRI and re-run the configure script. Both actions modify the installation directory and will not work with static (read-only) installations.
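
A minimal diagnostic sketch of the situation described above, assuming a POSIX shell: it reports whether the current user appears to be able to write to a given Libraries directory. The function name `rm_libdir_status` is hypothetical; it only inspects permissions with `test -w` and does not run or configure RepeatMasker.

```shell
#!/bin/sh
# rm_libdir_status: hypothetical diagnostic helper, not part of RepeatMasker.
# Prints "writable" if the given Libraries directory exists and passes
# test -w for the current user; otherwise prints a short explanation.
rm_libdir_status() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "writable"
    else
        # Per the explanation above: an unconfigured install fails here,
        # while a configured one caches libraries under the user's home.
        echo "not writable; if configured, caches go to $HOME/.RepeatMaskerCache"
    fi
}
```

For the failing image, `rm_libdir_status /usr/local/RepeatMasker/Libraries` run inside the container would be expected to report "not writable", which is why configure has to run while the image is being built rather than at first use.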

Nilad commented 5 years ago

I built the Singularity image myself, based on this Docker Hub image (which is widely reused by other users): https://hub.docker.com/r/robsyme/repeatmasker-onbuild

Running the configuration with perl ./configure is problematic because it requires user interaction. If this step cannot be skipped, how can I run the configuration non-interactively?

I just want to use RepeatMasker with RepeatScout on my own data, without any other reference libraries.

rmhubley commented 5 years ago

In the new version of RepeatMasker, 4.0.9, the configure script supports command-line parameters for all options. It still needs to be run to set up the Libraries/ directory, even if you only plan to use custom libraries. That might work for you. We are starting to evaluate containerization/packaging technologies to support this type of installation method.

As for RepeatScout, just a word of caution. RepeatScout is tuned in such a way that it excels at finding young (less diverged) repeat families. I would recommend using it in tandem with RECON ( as we do in RepeatModeler ) to round out the range of families identified. Also, we have a new version of RepeatScout in development which can process genome-size samples, supports an affine gap model and custom scoring matrices for improved sensitivity. We hope to get that out this year.

nathanweeks commented 5 years ago

With the Biocontainers RepeatMasker image, you can create a subdirectory on the host, at a path that is bind-mounted in the container (e.g., the current working directory), containing symbolic links to the files in the container's RepeatMasker/Libraries directory. Then set the REPEATMASKER_LIB_DIR environment variable (specific to the Bioconda/Biocontainers RepeatMasker package) to this directory.

e.g.:

$ mkdir repeatmasker-libraries
$ singularity pull repeatmasker:4.0.9_p2--pl526_0.sif docker://quay.io/biocontainers/repeatmasker:4.0.9_p2--pl526_0
...
$ singularity exec repeatmasker:4.0.9_p2--pl526_0.sif sh -c 'ln -s /usr/local/share/RepeatMasker/Libraries/* repeatmasker-libraries/'
$ ls -l repeatmasker-libraries/
total 0
lrwxrwxrwx 1 user group 54 May 10 11:51 Artefacts.embl -> /usr/local/share/RepeatMasker/Libraries/Artefacts.embl
lrwxrwxrwx 1 user group 49 May 10 11:51 Dfam.embl -> /usr/local/share/RepeatMasker/Libraries/Dfam.embl
lrwxrwxrwx 1 user group 48 May 10 11:51 Dfam.hmm -> /usr/local/share/RepeatMasker/Libraries/Dfam.hmm
...
$ REPEATMASKER_LIB_DIR=$PWD/repeatmasker-libraries singularity exec repeatmasker:4.0.9_p2--pl526_0.sif RepeatMasker -species human hsap_contig.fasta
RepeatMasker version open-4.0.9
Search Engine: NCBI/RMBLAST [ 2.6.0+ ]
Rebuilding RepeatMaskerLib.embl master library
  - Read in 9 sequences from /scratch/nweeks/maker/data/repeatmasker-libraries/Artefacts.embl
  - Read in 6235 sequences from /scratch/nweeks/maker/data/repeatmasker-libraries/Dfam.embl
RepeatMaskerLib.embl: 6244 total sequences.
Building FASTA version...Master RepeatMasker Database: /scratch/nweeks/maker/data/repeatmasker-libraries/RepeatMaskerLib.embl ( Complete Database: CONS-Dfam_3.0 )
...
Generating output...
masking
done
$ ls -l repeatmasker-libraries/RepeatMaskerLib.embl
-rw-r--r-- 1 user group 20552410 May 10 11:54 repeatmasker-libraries/RepeatMaskerLib.embl
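
The symlink step in the session above can be wrapped in a small reusable function. This is a sketch assuming a POSIX shell; `make_writable_libdir` is a hypothetical name, and the function only reproduces the `ln -s` technique shown, it does not configure RepeatMasker.

```shell
#!/bin/sh
# make_writable_libdir: hypothetical helper, not part of RepeatMasker or
# Biocontainers. Usage: make_writable_libdir SRC DEST
# Creates DEST if needed and adds a symbolic link in DEST for every entry in
# SRC, so DEST is writable while the library files stay in read-only SRC.
# Entries already present in DEST are left untouched, so it can be re-run.
make_writable_libdir() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$src"/*; do
        [ -e "$f" ] || continue   # glob matched nothing: SRC is empty
        name=$(basename "$f")
        # -L also catches links that dangle when viewed from the host
        if [ ! -e "$dest/$name" ] && [ ! -L "$dest/$name" ]; then
            ln -s "$f" "$dest/$name" || return 1
        fi
    done
    return 0
}
```

With the Biocontainers image, the source path (/usr/local/share/RepeatMasker/Libraries) exists only inside the container, so the function has to run there, for example through `singularity exec ... sh -c`, with DEST on a bind-mounted path such as the current working directory. REPEATMASKER_LIB_DIR is then pointed at DEST as in the session above.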

jebrosen commented 3 years ago

It looks like this issue has been fixed in a more recent version of RepeatMasker and can be closed:

If you encounter problems with these options, please file a new issue.

aditi17142 commented 3 years ago

Hi! I am using RepeatMasker 4.1.1. While running the tool, I came across an error:

Command: ./RepeatMasker -species fungi /home/guest1/assembly.fasta -dir /home/guest1/maker/RM_output

Output: The assumed RepeatMasker installation directory /home/guest1/maker/RepeatMasker does not appear to be correct. E.g it does not contain a 'Libraries' or 'Matrices' subdirectory. This can occur if hard links are used to invoke this script.

Kindly help. I shall be highly obliged.
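
The error message names the condition it checks. A minimal sketch of the same check, assuming a POSIX shell, lets you test a candidate directory before invoking RepeatMasker. `rm_check_install_dir` is a hypothetical name, and it only mirrors the condition stated in the message, not RepeatMasker's actual detection logic.

```shell
#!/bin/sh
# rm_check_install_dir: hypothetical helper, not part of RepeatMasker.
# Succeeds (and prints "ok: DIR") only if DIR contains both a 'Libraries'
# and a 'Matrices' subdirectory, the two named in the error message above.
# Otherwise it prints the first missing path to stderr and returns 1.
rm_check_install_dir() {
    dir=$1
    for sub in Libraries Matrices; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $dir/$sub" >&2
            return 1
        fi
    done
    echo "ok: $dir"
}
```

For the command above, `rm_check_install_dir /home/guest1/maker/RepeatMasker` would show which subdirectory is absent. If both are present, the message's other hint applies: it says the error can occur when the script is invoked through a hard link, so calling RepeatMasker by its real path inside the installation directory is worth trying.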