dictybase-docker / wheel-jbrowse

Source repository for docker image to run JBrowse at dictyBase
http://dictybase.org/tools/jbrowse

JBrowse for dictyBase

This repository contains the Dockerfiles for the various containers needed to orchestrate and run the JBrowse application.

Features Installation Quickstart guide Detail guide

Features

Installation

Quickstart

git clone https://github.com/dictybase-docker/wheel-jbrowse.git
cd wheel-jbrowse
mkdir -p $PWD/data && curl -L -o data/jbrowse-data.tar.gz https://northwestern.box.com/shared/static/xw7sfh72mnreb4nwwkjbj8lvzo3il9dq.gz
docker-compose -f jbrowse_full.yml up -d

Detail guide

To understand the containerization strategy for JBrowse, it is important to first understand the structure and concepts of the JBrowse application.

JBrowse concepts

The gory details of most of the concepts are described in the JBrowse guide.

Strategy for containerization

Data containers

The data container concept from radial topology is borrowed to manage the various parts of the JBrowse application. Here is the list of data container volumes and their applications.

Flat file backend

This JBrowse backend needs a number of perl scripts to prepare JBrowse-compatible JSON files from various biological data sources (GFF3, Fasta). The backend container handles this transformation. The transformation needs a database backend, which in this case is served by a custom postgresql container.

JBrowse application

The application is handled by the frontend container. The image contains the application in the /usr/src/jbrowse folder. The data folder, /usr/src/jbrowse/data, is symlinked to /data/jbrowse, where all JSON-formatted files are kept. The frontend container runs a static file server (port 9595) to serve the JBrowse application.
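The symlink arrangement described above can be sketched as follows. This is a minimal illustration, not the image's actual build step: it recreates the two folders under a temporary sandbox root (an assumption, so nothing on the real filesystem is touched), and example.json is a placeholder file name.

```shell
#!/bin/bash
set -eu

# Hypothetical sandbox root; in the image these paths start at /.
ROOT="$(mktemp -d)"
APP_DIR="$ROOT/usr/src/jbrowse"    # stands in for /usr/src/jbrowse
DATA_DIR="$ROOT/data/jbrowse"      # stands in for /data/jbrowse

mkdir -p "$APP_DIR" "$DATA_DIR"

# Point the application's data folder at the data volume.
ln -s "$DATA_DIR" "$APP_DIR/data"

# A file written to the volume is visible through the application path.
echo '{}' > "$DATA_DIR/example.json"
ls "$APP_DIR/data"
```

Because the application only ever reads from its own data path, the JSON files can live in a separate volume and be regenerated or swapped without rebuilding the frontend image.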

Configuration files

JBrowse has a local (jbrowse.conf) and a genome-specific (tracks.conf) configuration file, both of which are kept in the config subfolder of this repository. Through docker volume mapping, this folder is exposed as the /config folder inside the frontend container. The config manager container copies all files from the /config folder to the JBrowse application folder. It also runs a file watcher that copies any updated configuration file from the /config folder. The jbrowse.conf file is copied to the JBrowse source folder, /usr/src/jbrowse. The track config files are copied to the data folder (/data/jbrowse) of the respective genomes. The mapping to the genome subfolders is kept in the dataset key of a yml configuration file.
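The copy step performed by the config manager might look roughly like the sketch below. It is an illustration under stated assumptions, not the container's actual script: the paths live under a temporary sandbox root, the genome subfolder name genome1 is hypothetical (the real mapping comes from the dataset key of the yml file), and the helper names copy_if_newer and sync_config are invented for this example.

```shell
#!/bin/bash
set -eu

# Hypothetical sandbox root; in the containers these paths start at /.
ROOT="$(mktemp -d)"
CONFIG_DIR="$ROOT/config"                 # stands in for /config
APP_DIR="$ROOT/usr/src/jbrowse"           # stands in for /usr/src/jbrowse
GENOME_DIR="$ROOT/data/jbrowse/genome1"   # hypothetical genome subfolder

mkdir -p "$CONFIG_DIR" "$APP_DIR" "$GENOME_DIR"
echo '[general]' > "$CONFIG_DIR/jbrowse.conf"          # placeholder content
echo '[tracks.example]' > "$CONFIG_DIR/tracks.conf"    # placeholder content

# Copy a file only when the destination is missing or older than the source.
copy_if_newer() {
    src="$1"
    dest="$2"
    if [ ! -e "$dest" ] || [ "$src" -nt "$dest" ]; then
        cp "$src" "$dest"
        echo "copied $(basename "$src")"
    fi
}

# jbrowse.conf goes to the application folder, tracks.conf to the genome folder.
sync_config() {
    copy_if_newer "$CONFIG_DIR/jbrowse.conf" "$APP_DIR/jbrowse.conf"
    copy_if_newer "$CONFIG_DIR/tracks.conf" "$GENOME_DIR/tracks.conf"
}

sync_config   # first pass copies both files
sync_config   # second pass finds nothing newer and copies nothing
```

A watcher could call sync_config repeatedly, for example from a polling loop. How the real file watcher detects changes is not described here, so polling is only one possible approach.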

NGS data

The NGS (next-generation sequencing) data is expected to be present in the /mnt/ngs folder of the host OS. The /mnt/ngs folder is made available as the /ngs volume through a docker data container. The NGS data is also expected to follow a folder structure like this:

    rnaseq
    ├── PRJNA118577
    │   ├── bam
    │   └── bw
    └── PRJNA143419
        ├── bam
        └── bw

The RNA-Seq data goes inside a top-level rnaseq folder. Each dataset sits inside a folder named after its study accession number, and the BAM and bigWig files go inside their respective subfolders (bam and bw). To make the data available in the JBrowse data folder, a symlink is created linking /data/jbrowse/rnaseq to /ngs/rnaseq. In effect, the NGS data becomes available inside the JBrowse data folder.
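The expected layout and the symlink can be checked with a small script such as the one below. It is a sketch only: it builds the example tree under a temporary sandbox root instead of the real /ngs and /data/jbrowse paths, and check_layout is a hypothetical helper, not something shipped with this repository.

```shell
#!/bin/bash
set -eu

# Hypothetical sandbox root; in the containers these paths start at /.
ROOT="$(mktemp -d)"
NGS_DIR="$ROOT/ngs"              # stands in for the /ngs volume
DATA_DIR="$ROOT/data/jbrowse"    # stands in for /data/jbrowse

# Recreate the example layout: one folder per study accession.
for study in PRJNA118577 PRJNA143419; do
    mkdir -p "$NGS_DIR/rnaseq/$study/bam" "$NGS_DIR/rnaseq/$study/bw"
done
mkdir -p "$DATA_DIR"

# Report every study folder that lacks a bam or bw subfolder.
check_layout() {
    status=0
    for study_dir in "$1"/rnaseq/*/; do
        for sub in bam bw; do
            if [ ! -d "${study_dir}${sub}" ]; then
                echo "missing: ${study_dir}${sub}" >&2
                status=1
            fi
        done
    done
    return "$status"
}

check_layout "$NGS_DIR"

# Expose the RNA-Seq data inside the JBrowse data folder.
ln -s "$NGS_DIR/rnaseq" "$DATA_DIR/rnaseq"
ls "$DATA_DIR/rnaseq"
```

Running a check like this before starting the containers catches a misplaced bam or bw folder early, before a track fails to load.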

Starting containers

The JBrowse application can be started in two different flavours: end to end and data only. Each of these can in turn be run with or without NGS data.

End to End

In this setup, data is generated from GFF3 files through a temporary postgresql database. The data generation is done by a set of perl scripts that ship with JBrowse.

docker-compose -f jbrowse_full.yml up -d

Data only

Here the data generation process is skipped; instead, a copy of previously generated data is used directly to run JBrowse.

docker-compose -f jbrowse_data_only.yml up -d