gem-pasteur / macsyfinder

MacSyFinder - Detection of macromolecular systems in protein datasets using systems modelling and similarity search.
GNU General Public License v3.0

MacSyFinder

Citations

MacSyFinder v2: Néron, Bertrand; Denise, Rémi; Coluzzi, Charles; Touchon, Marie; Rocha, Eduardo P.C.; Abby, Sophie S. MacSyFinder v2: Improved modelling and search engine to identify molecular systems in genomes. Peer Community Journal, Volume 3 (2023), article no. e28. doi:10.24072/pcjournal.250 https://peercommunityjournal.org/articles/10.24072/pcjournal.250/

MacSyFinder v1: Abby SS, Néron B, Ménager H, Touchon M, Rocha EPC (2014). MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems. PLoS ONE 9(10): e110726. doi:10.1371/journal.pone.0110726 http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0110726

What's new in MacSyFinder V2.x

https://macsyfinder.readthedocs.io/en/latest/user_guide/new_v2.html

Installation

[!IMPORTANT] MacSyFinder requires hmmer >= 3.1 (http://hmmer.org/). You need to install hmmer yourself (unless you install macsyfinder via conda/mamba). If you are a modeler, you will also need git. The other dependencies are managed by the Python package manager pip.
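
The hmmer requirement can be checked before installing. The sketch below assumes that the `hmmsearch -h` banner contains a line of the form `# HMMER <version> ...` and that `sort -V` (GNU coreutils) is available; `version_ge` is a helper introduced here for illustration, it is not part of MacSyFinder.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort from GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v hmmsearch >/dev/null 2>&1; then
    # Assumed banner line: "# HMMER 3.3.2 (Nov 2020); http://hmmer.org/"
    hmmer_version=$(hmmsearch -h | sed -n 's/^# HMMER \([0-9][^ ]*\).*/\1/p' | head -n 1)
    if [ -z "$hmmer_version" ]; then
        echo "hmmsearch found, but its version could not be parsed"
    elif version_ge "$hmmer_version" "3.1"; then
        echo "HMMER $hmmer_version found: OK"
    else
        echo "HMMER $hmmer_version is older than 3.1"
    fi
else
    echo "hmmsearch not found on PATH: install HMMER >= 3.1 first"
fi
```

If the check reports a missing or outdated hmmer, install it from http://hmmer.org/ or let conda/mamba pull it in with macsyfinder.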

MacSyFinder is available on PyPI.

Installation from distribution

We encourage you to install macsyfinder in a virtualenv.

After creating a virtualenv dedicated to macsyfinder and activating it

python3 -m venv my_project
cd my_project
source bin/activate

you can install macsyfinder as described below:

from pypi

python3 -m pip install macsyfinder==x.x

where x.x is the version number

from conda/mamba

mamba install -c bioconda macsyfinder=x.x

where x.x is the version number

from git repository

git clone https://github.com/gem-pasteur/macsyfinder.git
cd macsyfinder
python3 -m pip install .

for modelers

https://macsyfinder.readthedocs.io/en/latest/modeler_guide/installation.html

for developers

https://macsyfinder.readthedocs.io/en/latest/developer_guide/installation.html

Unit tests

python3 setup.py test

or

python3 tests/run_tests.py -vv

or to run a specific test

python3 tests/run_tests.py -vv tests/test_xxx.py

The test suite is also run with GitHub Actions, with coverage reported to Codecov.

Models installation

Models are no longer shipped with the macsyfinder package. To install models, you can use macsydata, which manages models stored in macsy-models. Below are some of the most useful commands.
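
A few of these commands are sketched here. The sketch assumes that macsydata is on your PATH and uses TFF-SF, a package name taken from the Docker example of this README, purely as an illustration. The `command -v` guard is only there so that the snippet stays harmless on a machine where macsydata is not installed.

```shell
# Package used for illustration only; replace it with the one you need.
pack="TFF-SF"

if command -v macsydata >/dev/null 2>&1; then
    macsydata available       # list the packages published in macsy-models
    macsydata search "$pack"  # search macsy-models for a package by name
    macsydata install "$pack" # download and install the package
    macsydata list            # list the packages installed locally
else
    echo "macsydata not found on PATH: install macsyfinder first"
fi
```

Run `macsydata --help` to see the subcommands and options supported by your installed version.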

For complete documentation, see the macsydata section on readthedocs.

For models not stored in macsy-models, the commands available, search, installation from remote and upgrade from remote are NOT available.

You have to manage such models semi-manually: download the archive (do not unarchive it), then use macsydata for the installation.
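
As a minimal sketch of that semi-manual route, assuming `macsydata install` accepts the path to a downloaded archive: the file name `my_models-1.0.tar.gz` is hypothetical and stands for whatever archive you downloaded.

```shell
# Hypothetical archive name; use the file you actually downloaded.
archive="my_models-1.0.tar.gz"

if command -v macsydata >/dev/null 2>&1 && [ -f "$archive" ]; then
    # Pass the archive as is, without unpacking it first.
    macsydata install "$archive"
else
    echo "skipping: macsydata or $archive is not available"
fi
```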

Documentation

You will find complete documentation for setting up your project on readthedocs

Example data sets

Two example datasets with command lines and expected output files are available here and here (for a more thorough one). The 1st dataset is also described in the Documentation.

Docker

MacSyFinder is also available as a Docker container.

How to use macsyfinder container with docker

The computations are performed under the msf user in /home/msf inside the container. You therefore have to mount a directory from the host in the container to exchange data (input data and results) between the host and the container. The shared directory must be writable by the msf user; alternatively, override the user in the container with your own id (see example below).

Furthermore, the models are no longer packaged with macsyfinder, so you have to install them yourself. For that, we provide a command line tool, macsydata, which is inspired by pip.

macsydata search PACKNAME
# a version specifier such as ==VERSION or >=VERSION may follow the package name
macsydata install PACKNAME==VERSION

To work with Docker, you have to install the models in a directory that will be mounted in the container at run time.

mkdir shared_dir
cd shared_dir
# install desired models in my_models
docker run -v ${PWD}/:/home/msf -u $(id -u ${USER}):$(id -g ${USER})  gempasteur/macsyfinder:<tag> macsydata install --target /home/msf/my_models MODELS
# run msf with these models
docker run -v ${PWD}/:/home/msf -u $(id -u ${USER}):$(id -g ${USER})  gempasteur/macsyfinder:<tag> --db-type gembase --models-dir=/home/msf/my_models/ --models  TFF-SF Archaeal-T4P ComM MSH T2SS T4bP T4P Tad --sequence-db my_genome.fasta -w 12

How to use with apptainer (formerly Singularity)

As the docker image is registered on Docker Hub, you can also use it directly with apptainer. Unlike with docker, you do not have to worry about shared directories: your home and /tmp are automatically shared.

apptainer run -H ${HOME} docker://gempasteur/macsyfinder:<tag> macsydata install --target my_models MODELS
apptainer run -H ${HOME} docker://gempasteur/macsyfinder:<tag> macsyfinder --db-type gembase --models-dir=my_models --models TFF-SF Archaeal-T4P ComM MSH T2SS T4bP T4P Tad --sequence-db my_genome.fasta -w 12

Licence:

MacSyFinder is developed and released under the GNU General Public License v3.0 (GPL v3).

Contributing

We encourage contributions, bug reports, enhancements ...

Before contributing, please read the contributing guide.

Contributors

List of all people who participated in the macsyfinder project.

Note

The setsid binary in the utils directory is used only for functional tests on macOS. The binary has been built using the setsid-macosx project.