
Apache License 2.0

slivka-bio

Slivka-bio is a pre-configured instance of a slivka project targeted at bioinformatics. It contains configurations for tools such as Clustal Omega, ClustalW2, Muscle, Mafft and more. The goal is to provide an (almost) ready-to-use package that bundles bioinformatic software in one tool. The applications whose configurations are currently available include:

Downloading slivka-bio

To download slivka-bio, you have three options:

  1. Download the zip file: you can download and extract the project as a zip archive from the following link: slivka-bio v0.8.3.
  2. Clone the git repository:
    git clone --branch v0.8.3 --single-branch https://github.com/bartongroup/slivka-bio.git
  3. Install with conda: slivka-bio and all its dependencies can be installed automatically by the conda package manager:
    conda install -c slivka -c bartongroup -c bioconda -c conda-forge slivka-bio=0.8.3

This will download the v0.8.3 version of slivka-bio into a directory named slivka-bio or into your conda environment files. The project, however, requires slivka and the bioinformatic tools to work. If you choose to download the project sources, the dependencies need to be installed manually. The conda installation installs all tools and dependencies automatically.

Conda Installation

The recommended way to manage the slivka installation and its dependencies is through the conda package manager. If you don't have conda or mamba installed, follow the miniconda installation instructions from the conda user guide.

Once the conda installation completes, create a new conda environment that will contain slivka and most of the bioinformatic tools used by slivka-bio.

Installing from environment file

The slivka-bio repository contains an exported conda environment file environment.yaml. Importing it is the easiest way to get started with slivka. It automatically takes care of tool versions and dependencies. You can create a new environment named slivka-bio and install the packages from the environment file using the following command. You can choose a different name for the environment if you prefer.

conda env create --name slivka-bio --file environment.yaml

If you chose this installation option, you can skip the sections on installing slivka and on installing the bioinformatic tools with conda.
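The new environment has to be activated before any of its commands are available. The following is a minimal sketch, assuming the environment is named slivka-bio as in the command above; has_env is a hypothetical helper introduced here for illustration, not part of conda or slivka.

```shell
# Assumed environment name; must match the --name passed to `conda env create`.
ENV_NAME="slivka-bio"

# Hypothetical helper: succeeds if the named environment appears in
# `conda env list` output read from standard input.
has_env() {
    grep -q "^$1[[:space:]]"
}

if command -v conda >/dev/null 2>&1; then
    if conda env list | has_env "$ENV_NAME"; then
        # Make `conda activate` usable in a non-interactive shell.
        eval "$(conda shell.bash hook)"
        conda activate "$ENV_NAME"
        echo "active environment: $CONDA_DEFAULT_ENV"
    else
        echo "environment $ENV_NAME not found; create it first" >&2
    fi
else
    echo "conda not found on PATH" >&2
fi
```

In an interactive shell that has already been initialised by conda, running conda activate slivka-bio on its own is enough.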

Installing slivka

If you prefer not to import the existing environment and would rather manage software versions yourself, you can install slivka from our conda channel or from sources. Our conda channel, slivka, contains the more stable versions of slivka. At the time of writing, the current version is 0.8.3. It is recommended to use Python 3.10, which should be compatible with all bioinformatic tools used by slivka.

conda install python=3.10 slivka::slivka=0.8.3

Alternatively, if you like living on the edge, you can install slivka directly from the sources on GitHub. You can choose the branch to fetch the sources from by specifying the --branch option.

git clone --branch master --single-branch https://github.com/bartongroup/slivka.git
(cd slivka && python setup.py install)

Keep in mind that slivka-bio does not include binaries for bioinformatic tools except for JRONN and AACon. You need to install them manually from conda or from sources.

Installing bioinformatic tools

If you chose to install the slivka-bio dependencies using the environment file, then all of the tools are already installed in your environment.

However, if you decided to install or update them manually, they are available as packages from the bioconda or bartongroup channels.

bioconda channel

The following tools are available from bioconda:

You can install them with the command below (remember to activate the conda environment first). However, I advise against installing t-coffee from bioconda, as that package pins the versions of other bioinformatic tools, which causes version conflicts. An alternative, dependency-free version is provided by the bartongroup channel. If you want to install t-coffee from bioconda anyway, add t-coffee=13.46 to the command.

conda install -c bioconda -c conda-forge \
    aacon=1.1 \
    clustalo=1.2.4 \
    clustalw=2.1 \
    jronn=7.1 \
    mafft=7.458 \
    msaprobs=0.9.7 \
    muscle=5.1 \
    probcons=1.12 \
    viennarna=2.6.4

bartongroup channel

For a long time, the tools DisEMBL and GlobPlot depended on a closed-source Tisean package and could not be added to bioconda. However, after the license changes, they are now openly available from our bartongroup channel.

conda install -c bartongroup -c conda-forge disembl=1.4 globplot=2.3

If you haven't installed t-coffee from bioconda, you can do it now with the following command:

conda install -c bartongroup -c conda-forge t_coffee=13.46

Building from sources

It is highly recommended to install the bioinformatic tools using the package managers. However, if you prefer to build them from sources and keep full control over the build, you are free to do so. After compilation, make sure that the binary location is included in the PATH variable, or set the absolute path to the binary in the service configuration file.
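A quick way to confirm that manually built tools are reachable is to look each one up on the PATH. The sketch below assumes binary names that mirror the package names used earlier; the names produced by your build may differ, and missing_tools is a hypothetical helper introduced here.

```shell
# Assumed binary names; adjust them to whatever your builds actually produce.
TOOLS="clustalo clustalw mafft muscle probcons"

# Print every argument that cannot be resolved as a command on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# $TOOLS is left unquoted on purpose so that it splits into separate names.
missing=$(missing_tools $TOOLS)
if [ -n "$missing" ]; then
    echo "not found on PATH:" $missing >&2
fi
```

Any tool reported as missing needs either its directory added to PATH or its absolute path set in the corresponding service configuration file.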

WSGI server

Web Server Gateway Interface (WSGI) is a convention for web servers to forward HTTP requests to Python applications. The recommended WSGI servers supported by slivka are Gunicorn and uWSGI. You need to install one of them (both are available as conda packages) to use the slivka server. If you want to use other software, the WSGI application is located in the wsgi.py module file and is named application.
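WSGI servers usually take the application as a module:callable pair, which here would be wsgi:application. As an illustration of that convention only, the sketch below wraps a direct Gunicorn invocation in a hypothetical helper, serve_with_gunicorn; the bind address and worker count are assumptions, and any environment that wsgi.py itself expects is not covered.

```shell
# Hypothetical helper: serve the `application` object from wsgi.py with Gunicorn.
# Run it from the directory that contains wsgi.py.
serve_with_gunicorn() {
    bind_addr="${1:-127.0.0.1:8000}"   # assumed address, not taken from slivka
    workers="${2:-2}"                  # assumed worker count
    if [ ! -f wsgi.py ]; then
        echo "wsgi.py not found in $PWD" >&2
        return 1
    fi
    gunicorn --bind "$bind_addr" --workers "$workers" wsgi:application
}

# Usage (runs in the foreground until interrupted):
#   serve_with_gunicorn 127.0.0.1:8000 4
```

Other WSGI servers accept the same wsgi:application reference through their own options.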

Configuration

Slivka-bio configuration is organised into multiple files. Basic configuration is located in the settings.yaml file in the repository root directory. The configuration for each service is located in its respective file in the services folder. If slivka-bio was installed with the conda package manager, the configuration files are located at $CONDA_PREFIX/var/slivka-bio.

For in-depth service configuration instructions, refer to the slivka documentation.
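Because the configuration lives in a different place depending on how slivka-bio was installed, it can help to locate it programmatically. A minimal sketch, assuming only the two locations mentioned above; find_config_dir is a hypothetical helper introduced for illustration.

```shell
# Print the first candidate directory that contains settings.yaml.
find_config_dir() {
    for dir in "$@"; do
        if [ -f "$dir/settings.yaml" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidates: the current directory (source checkout) and the conda location.
if config_dir=$(find_config_dir "$PWD" "${CONDA_PREFIX:-}/var/slivka-bio"); then
    echo "settings: $config_dir/settings.yaml"
    # List the per-service configuration files if the services folder exists.
    if [ -d "$config_dir/services" ]; then
        ls "$config_dir/services"
    fi
else
    echo "no settings.yaml found in the candidate directories" >&2
fi
```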

Launching

MongoDB

Slivka depends on MongoDB for exchanging and storing data. Ask your system administrator for installation of, and access to, a MongoDB database on your system or, if you only need a user-level installation, install mongodb through conda from the anaconda channel. Once installed, the MongoDB process can be started using the mongod command. More information on the available command line parameters and configuration can be found in the mongod documentation.
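For a user-level installation, mongod needs a writable data directory. The sketch below shows one way to start it in the background; the directory locations, bind address and port are assumptions chosen for illustration, and the address has to agree with whatever your slivka settings point at. start_mongod is a hypothetical helper.

```shell
# Assumed locations; any writable directories will do.
MONGO_DATA="${MONGO_DATA:-$HOME/slivka-mongo/data}"
MONGO_LOG="${MONGO_LOG:-$HOME/slivka-mongo/mongod.log}"

# Hypothetical helper: create the directories and start mongod as a daemon.
start_mongod() {
    mkdir -p "$MONGO_DATA" "$(dirname "$MONGO_LOG")" || return 1
    # --fork detaches the process and requires --logpath.
    mongod --dbpath "$MONGO_DATA" --logpath "$MONGO_LOG" \
           --bind_ip 127.0.0.1 --port 27017 --fork
}

# Usage:
#   start_mongod
```

Binding to 127.0.0.1 keeps the database reachable from the local machine only, which is usually what a single-host setup needs.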

Slivka

Once you have finished the configuration step, you can deploy your own slivka server. First, navigate to the slivka configuration directory (the one containing settings.yaml). Alternatively, you can set the SLIVKA_HOME environment variable to point to that directory. For slivka to operate properly, you need to start its three processes: the HTTP server, which manages incoming connections; the scheduler, which collects and dispatches incoming job requests; and the local queue, which stacks and runs incoming jobs on the local machine.

The three processes are launched using the slivka command created during slivka installation. Alternatively, you can use the manage.py script located in the project directory, which automatically sets the SLIVKA_HOME variable when started. If you have slivka-bio installed as a conda package, use the slivka-bio command instead. All other command line parameters remain the same.

Server

slivka start server [-t TYPE] [-d -p PID_FILE] [-w WORKERS]

Starts the HTTP server using the WSGI application specified by TYPE. Allowed values are devel, uwsgi or gunicorn. The specified application must be installed and available in the PATH. The development server is always available, but it can't serve more than one client at a time, so it's not recommended for production.

If you want to make your server publicly accessible, we recommend running it behind a reverse proxy server. Refer to your wsgi application documentation for more details.

Providing the -d flag along with -p PID_FILE starts the process in the background as a daemon and writes its pid to the file.

You can also specify the number of worker processes explicitly with -w WORKERS. It defaults to twice the CPU count.
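If you would rather set the worker count explicitly, for example to scale it up or down from the default, the same figure can be computed in the shell. This is a sketch of the arithmetic only; how slivka determines the CPU count internally is not shown here and may differ.

```shell
# Number of online CPUs; fall back to 1 if getconf cannot report it.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cpus=1
case "$cpus" in
    ''|*[!0-9]*) cpus=1 ;;   # guard against empty or non-numeric output
esac

# Twice the CPU count, matching the documented default.
workers=$((cpus * 2))

# Print the resulting command instead of running it.
echo "slivka start server -t gunicorn -w $workers"
```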

Scheduler

slivka start scheduler [-d -p PID_FILE]

Starts the scheduler, which collects and dispatches new jobs and monitors their states. Providing the -d flag starts the scheduler as a daemon, and -p PID_FILE specifies the pid file location.

Local queue

slivka start local-queue [-d -p PID_FILE] [-w WORKERS]

This is the default job runner, which spawns new jobs as subprocesses on the local machine. If you specify -d and -p PID_FILE, the process will run as a daemon and write its pid to the specified file.

Additionally, you may specify the number of workers, i.e. the number of jobs which can be run simultaneously. It defaults to 2.
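Putting the three commands together, a small wrapper can start all processes as daemons and stop them again through their pid files. This is a sketch under stated assumptions: RUN_DIR, start_all and stop_all are hypothetical names, the server type gunicorn is one of the allowed values chosen for illustration, and stopping a process by sending the default termination signal to the recorded pid is assumed to be acceptable for slivka.

```shell
# Directory containing settings.yaml; defaults to the current directory.
SLIVKA_HOME="${SLIVKA_HOME:-$PWD}"
export SLIVKA_HOME

# Use slivka-bio instead if the project was installed as a conda package.
SLIVKA_CMD="${SLIVKA_CMD:-slivka}"

# Hypothetical location for the pid files.
RUN_DIR="${RUN_DIR:-$SLIVKA_HOME/run}"

# Start the server, scheduler and local queue as daemons.
start_all() {
    mkdir -p "$RUN_DIR" || return 1
    "$SLIVKA_CMD" start server -t gunicorn -d -p "$RUN_DIR/server.pid" &&
    "$SLIVKA_CMD" start scheduler -d -p "$RUN_DIR/scheduler.pid" &&
    "$SLIVKA_CMD" start local-queue -d -p "$RUN_DIR/local-queue.pid" -w 2
}

# Send the default termination signal to every recorded pid and
# remove the pid files afterwards.
stop_all() {
    for name in server scheduler local-queue; do
        pid_file="$RUN_DIR/$name.pid"
        [ -f "$pid_file" ] || continue
        kill "$(cat "$pid_file")" 2>/dev/null
        rm -f "$pid_file"
    done
}

# Usage:
#   start_all
#   stop_all
```

Keeping the pid files in one directory makes it easy to check which processes are expected to be running.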