Source repository for the Docker images that run JBrowse at dictyBase.
This repository contains the Dockerfiles
for the various containers needed to
orchestrate and run the JBrowse application.
git clone https://github.com/dictybase-docker/wheel-jbrowse.git
mkdir -p $PWD/data && curl -L -o data/jbrowse-data.tar.gz https://northwestern.box.com/shared/static/xw7sfh72mnreb4nwwkjbj8lvzo3il9dq.gz
docker-compose -f jbrowse_full.yml up -d
To understand the containerization strategy for JBrowse, it helps to first understand the structure and concepts of the JBrowse application.
Most of these concepts are described in detail in the JBrowse guide.
The data container concept from radial topology is borrowed to manage the various parts of the JBrowse application. Here is the list of data container volumes and their uses:
/log
: Static web server log from the frontend container.
/config
: Contains the JBrowse configuration files in the /config/jbrowse folder. It is mapped to the config subfolder of this repository on the host.
/data
: Contains the JBrowse JSON-formatted data for the flat file backend.
/ngs
: Contains the data files (BAM, bigWig, etc.) from NGS experiments. It maps to the /mnt/ngs folder of the host.
The flat file backend of JBrowse needs a set of Perl scripts to prepare JBrowse-compatible JSON files from various biological data sources (GFF3, FASTA). The backend container handles this transformation. The transformation needs a database backend, which in this case is served by a custom PostgreSQL container.
The application is handled by the frontend container. The image contains the application in the /usr/src/jbrowse folder. The data folder /usr/src/jbrowse/data is symlinked to /data/jbrowse, where all JSON-formatted files are kept. The frontend container runs a static file server (port 9595) to serve the JBrowse application.
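The symlink arrangement described above can be sketched in shell. This is a minimal illustration, not the image's actual build step: the `link_data_folder` function name and the scratch `root` prefix are assumptions introduced so the sketch can run outside a container.

```shell
#!/bin/sh
# Sketch only: reproduce the frontend data symlink under a scratch prefix.
# link_data_folder and the scratch root are hypothetical, not part of the repository.

link_data_folder() {
    root=$1
    # Folders named in the text: the application folder and the JSON data folder.
    mkdir -p "$root/usr/src/jbrowse" "$root/data/jbrowse"
    # Point <app>/data at the data volume; -n replaces an existing link
    # instead of creating a new link inside its target.
    ln -sfn "$root/data/jbrowse" "$root/usr/src/jbrowse/data"
}

# Demo under a temporary directory rather than the real container paths.
root=$(mktemp -d)
link_data_folder "$root"
readlink "$root/usr/src/jbrowse/data"
```

Because the application only ever reads from its own data folder, a link like this would let the JSON files live on a separate volume without changing any JBrowse path.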
JBrowse has a local (jbrowse.conf) and a genome-specific (tracks.conf) configuration file, both of which are kept in the config subfolder of this repository. Through Docker volume mapping, this folder is exposed as the /config folder inside the frontend container.
The config manager container copies all files from the /config folder to the JBrowse application folder. It also runs a file watcher that copies any updated configuration files from the /config folder. The jbrowse.conf file is copied to the JBrowse source folder, /usr/src/jbrowse. The track config files are copied to the data folder (/data/jbrowse) of the respective genomes. The mapping of genome subfolder locations is kept in the dataset key of a yml configuration file.
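A single pass of such a copy step might look like the sketch below. It is not the config manager's actual code. The function names (`copy_if_changed`, `sync_config`), the layout of the config folder (`jbrowse.conf` at the top, one subfolder per genome holding `tracks.conf`), and the genome name `discoideum` are all assumptions made for illustration. The genome list is passed as arguments here instead of being read from the dataset key.

```shell
#!/bin/sh
# Sketch only: one pass of a config sync, run under a scratch prefix.
# Function names, the folder layout under $cfg, and the genome name are hypothetical.

copy_if_changed() {
    # Copy $1 to $2 only when the destination is missing or its content differs.
    if ! cmp -s "$1" "$2" 2>/dev/null; then
        cp "$1" "$2"
        echo "updated $2"
    fi
}

sync_config() {
    cfg=$1; app=$2; data=$3; shift 3
    # jbrowse.conf goes to the JBrowse source folder.
    copy_if_changed "$cfg/jbrowse.conf" "$app/jbrowse.conf"
    # Remaining arguments are genome subfolders of the data folder.
    for genome in "$@"; do
        mkdir -p "$data/$genome"
        copy_if_changed "$cfg/$genome/tracks.conf" "$data/$genome/tracks.conf"
    done
}

# Demo with throwaway files in a temporary directory.
root=$(mktemp -d)
mkdir -p "$root/config/discoideum" "$root/app" "$root/data"
echo "[general]" > "$root/config/jbrowse.conf"
echo "[tracks.genes]" > "$root/config/discoideum/tracks.conf"
first_pass=$(sync_config "$root/config" "$root/app" "$root/data" discoideum)
second_pass=$(sync_config "$root/config" "$root/app" "$root/data" discoideum)
echo "$first_pass"
```

The second pass reports nothing because no file changed. A watcher could call `sync_config` repeatedly or on file events; that part is left out of the sketch.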
The NGS (next-generation sequencing) data is expected to be present in the /mnt/ngs folder of the host OS. The /mnt/ngs folder is made available as the /ngs volume through a Docker data container. The NGS data is also expected to follow a folder structure like this:
rnaseq
├── PRJNA118577
│   ├── bam
│   └── bw
└── PRJNA143419
    ├── bam
    └── bw
The RNA-Seq data goes inside a top-level rnaseq folder. Each dataset is placed inside a folder named after its study accession number. BAM and bigWig files go inside their respective folders. To make the data available in the JBrowse data folder, a symlink is created between the /data/jbrowse/rnaseq and /ngs/rnaseq subfolders. In effect, the NGS data becomes available inside the JBrowse data folder.
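The expected layout and the symlink can be tried out with a short shell sketch. It runs under a temporary directory rather than the real /ngs and /data/jbrowse paths, and the helper names `make_study` and `link_rnaseq` are hypothetical.

```shell
#!/bin/sh
# Sketch only: build the expected NGS layout and link it into the JBrowse data folder.
# Runs under a scratch prefix; make_study and link_rnaseq are hypothetical helper names.

make_study() {
    # $1 = NGS root, $2 = study accession number
    mkdir -p "$1/rnaseq/$2/bam" "$1/rnaseq/$2/bw"
}

link_rnaseq() {
    # $1 = NGS root (/ngs in the container), $2 = JBrowse data folder (/data/jbrowse)
    mkdir -p "$2"
    ln -sfn "$1/rnaseq" "$2/rnaseq"
}

root=$(mktemp -d)
make_study "$root/ngs" PRJNA118577
make_study "$root/ngs" PRJNA143419
link_rnaseq "$root/ngs" "$root/data/jbrowse"
# List the studies as seen from the JBrowse data folder.
ls "$root/data/jbrowse/rnaseq"
```

Linking the whole rnaseq folder, rather than individual studies, means a newly added study folder shows up in the data folder without any further step.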
The JBrowse application can be started in two different flavours: end to end and data only. Each of these can in turn be run with or without NGS data.
In the end-to-end setup, data is generated from GFF3 files through a temporary PostgreSQL database. The data generation is done by a set of Perl scripts that ship with JBrowse.
docker-compose -f jbrowse_full.yml up -d
In the data-only setup, the data generation process is skipped and a copy of previously generated data is used directly to run JBrowse.
docker-compose -f jbrowse_data_only.yml up -d
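For scripting, the choice between the two compose files can be wrapped in a small helper. This is only a convenience sketch: `compose_file_for` and the flavour labels `full` and `data-only` are hypothetical names, and the command is printed rather than executed so the sketch runs without Docker.

```shell
#!/bin/sh
# Sketch only: map a flavour name to the compose file named in this README.
# compose_file_for and the labels "full" / "data-only" are hypothetical.

compose_file_for() {
    case $1 in
        full)      echo "jbrowse_full.yml" ;;
        data-only) echo "jbrowse_data_only.yml" ;;
        *)         echo "unknown flavour: $1" >&2; return 1 ;;
    esac
}

# Print the command instead of running it.
file=$(compose_file_for data-only) && echo "docker-compose -f $file up -d"
```

An unknown flavour makes the helper return a non-zero status, so a calling script can stop before invoking docker-compose.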