For Citations, please use the following DOI:
See the documentation:
Or view an example notebook:
IMPORTANT: manuscript is in preparation
Jovian is a Public Health toolkit to automatically process raw NGS data from human clinical matrices (faeces, serum, etc.) into clinically relevant information. It has three main components:
Illumina based Metagenomics:
Includes (amongst other features) data quality control, assembly, taxonomic classification, viral typing, and minority variant identification (quasispecies).
:memo: Please refer to the documentation page for the Illumina Metagenomics workflow for more information.
Illumina based Reference-alignment:
Includes (amongst other features) data quality control, alignment, SNP identification, and consensus-sequence generation.
:exclamation: A reference fasta is required.
:memo: Please refer to the documentation page for the Illumina Reference based workflow for more information.
Nanopore based Reference-alignment:
Includes (amongst other features) data quality control, alignment, SNP identification, and consensus-sequence generation.
:exclamation: A reference fasta is required.
:exclamation: A fasta with primer sequences is required.
:memo: Please refer to the documentation page for the Nanopore Reference based workflow for more information.
User-friendliness:
Wetlab personnel can start, configure and interpret results via an interactive web-report. Click here for an example report.
This makes doing Public Health analyses much more accessible and user-friendly since minimal command-line skills are required.
Audit trail:
All pipeline parameters, software versions, database information and runtime statistics are logged. See details below.
Portable:
Jovian is easily installed on off-site computer systems and at back-up sister institutes. Allowing results to be generated even when the internal grid-computer is down (speaking from experience).
:memo: Please see the full Command Line Reference on the documentation site for a more detailed explanation of each command, including example commands for starting an analysis or common usage examples.
Here, we have a short list of commands and use cases that are used very frequently.
Use case 1:
Metagenomic analylsis based on Illumina data:
bash jovian illumina-metagenomics -i <INPUT DIRECTORY>
Use case 2:
Align Illumina data against a user-provided reference to generate a consensus genome:
bash jovian illumina-reference -i <INPUT DIRECTORY> -ref <REFERENCE FASTA>
Use case 3:
Align Nanopore (multiplex) PCR data against a user-provided reference, remove overrepresented primer sequences, and generate a consensus genome:
bash jovian nanopore-reference -i <INPUT DIRECTORY> -ref <REFERENCE FASTA> -pr <PRIMER FASTA>
use bash jovian -h
to see a full list of commands applicable to the Jovian version that you're using.
:memo: Please refer to our documentation for the full list of features
IGVjs
and allow manual assessment and validation of consensus genomes. All data are visualized via an interactive web-report, as shown here, which includes:
MultiQC
. Krona
interactive taxonomic piecharts are generated. IGVjs
) is included, containing:
After a Jovian analysis is finished you can perform virus-typing (i.e. sub-species level taxonomic labelling). These analyses can be started by the command bash jovian -vt [virus keyword]
, where [virus keyword]
can be:
Keyword | Taxon used for scaffold selection | Notable virus species |
---|---|---|
NoV |
Caliciviridae | Norovirus GI and GII, Sapovirus |
EV |
Picornaviridae | Enteroviruses (Coxsackie, Polio, Rhino, etc.), Parecho, Aichi, Hepatitis A |
RVA |
Rotavirus A | Rotavirus A |
HAV |
Hepatovirus A | Hepatitis A |
HEV |
Orthohepevirus A | Hepatitis E |
PV |
Papillomaviridae | Human Papillomavirus |
Flavi |
Flaviviridae | Dengue (work in progress) |
all |
All of the above | All of the above |
An audit trail, used for clinical reproducibility and logging, is generated and contains:
However, it has limitations since several things are out-of-scope for Jovian to control:
virus typing-tools
version
virus typing-tools
is being worked on, however, this is not trivial and will take some time. :memo: Please refer to our documentation for a detailed overview of the Jovian requirements here
:memo: Please refer to our documentation for detailed instructions regarding the installation of Jovian here
General usage instructions vary for each workflow that we support.
Please refer to the link below corresponding to the workflow that you wish to use
Can be found here.
Can be found here.
Name | Publication | Website |
---|---|---|
BBtools | NA | https://jgi.doe.gov/data-and-tools/bbtools/ |
BEDtools | Quinlan, A.R. and I.M.J.B. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. 2010. 26(6): p. 841-842. | https://bedtools.readthedocs.io/en/latest/ |
BLAST | Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. 25(17): p. 3389-3402. | https://www.ncbi.nlm.nih.gov/books/NBK279690/ |
BWA | Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997. | https://github.com/lh3/bwa |
BioConda | Grüning, B., et al., Bioconda: sustainable and comprehensive software distribution for the life sciences. 2018. 15(7): p. 475. | https://bioconda.github.io/ |
Biopython | Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., ... & De Hoon, M. J. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422-1423. | https://biopython.org/ |
Bokeh | Bokeh Development Team (2018). Bokeh: Python library for interactive visualization. | https://bokeh.pydata.org/en/latest/ |
Bowtie2 | Langmead, B. and S.L.J.N.m. Salzberg, Fast gapped-read alignment with Bowtie 2. 2012. 9(4): p. 357. | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml |
Conda | NA | https://conda.io/ |
DRMAA | NA | http://drmaa-python.github.io/ |
FastQC | Andrews, S., FastQC: a quality control tool for high throughput sequence data. 2010. | https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
gawk | NA | https://www.gnu.org/software/gawk/ |
GNU Parallel | O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014. | https://www.gnu.org/software/parallel/ |
Git | NA | https://git-scm.com/ |
igvtools | NA | https://software.broadinstitute.org/software/igv/igvtools |
Jupyter Notebook | Kluyver, Thomas, et al. "Jupyter Notebooks-a publishing format for reproducible computational workflows." ELPUB. 2016. | https://jupyter.org/ |
Jupyter_contrib_nbextension | NA | https://github.com/ipython-contrib/jupyter_contrib_nbextensions |
Jupyterthemes | NA | https://github.com/dunovank/jupyter-themes |
Krona | Ondov, B.D., N.H. Bergman, and A.M. Phillippy, Interactive metagenomic visualization in a Web browser. BMC Bioinformatics, 2011. 12: p. 385. | https://github.com/marbl/Krona/wiki |
Lofreq | Wilm, A., et al., LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. 2012. 40(22): p. 11189-11201. | http://csb5.github.io/lofreq/ |
MGkit | Rubino, F. and Creevey, C.J. 2014. MGkit: Metagenomic Framework For The Study Of Microbial Communities. . Available at: figshare [doi:10.6084/m9.figshare.1269288]. | https://bitbucket.org/setsuna80/mgkit/src/develop/ |
Minimap2 | Li, H., Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 2018. | https://github.com/lh3/minimap2 |
MultiQC | Ewels, P., et al., MultiQC: summarize analysis results for multiple tools and samples in a single report. 2016. 32(19): p. 3047-3048. | https://multiqc.info/ |
Nb_conda | NA | https://github.com/Anaconda-Platform/nb_conda |
Nb_conda_kernels | NA | https://github.com/Anaconda-Platform/nb_conda_kernels |
Nginx | NA | https://www.nginx.com/ |
Numpy | Walt, S. V. D., Colbert, S. C., & Varoquaux, G. (2011). The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), 22-30. | http://www.numpy.org/ |
Pandas | McKinney, W. Data structures for statistical computing in python. in Proceedings of the 9th Python in Science Conference. 2010. Austin, TX. | https://pandas.pydata.org/ |
Picard | NA | https://broadinstitute.github.io/picard/ |
Prodigal | Hyatt, D., et al., Prodigal: prokaryotic gene recognition and translation initiation site identification. 2010. 11(1): p. 119. | https://github.com/hyattpd/Prodigal/wiki/Introduction |
Python | G. van Rossum, Python tutorial, Technical Report CS-R9526, Centrum voor Wiskunde en Informatica (CWI), Amsterdam, May 1995. | https://www.python.org/ |
Qgrid | NA | https://github.com/quantopian/qgrid |
SAMtools | Li, H., et al., The sequence alignment/map format and SAMtools. 2009. 25(16): p. 2078-2079. | http://www.htslib.org/ |
SPAdes | Nurk, S., et al., metaSPAdes: a new versatile metagenomic assembler. Genome Res, 2017. 27(5): p. 824-834. | http://cab.spbu.ru/software/spades/ |
seqkit | Shen, Wei, et al. "SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation." PloS one 11.10 (2016). | https://github.com/shenwei356/seqkit |
Seqtk | NA | https://github.com/lh3/seqtk |
Snakemake | Köster, J. and S.J.B. Rahmann, Snakemake—a scalable bioinformatics workflow engine. 2012. 28(19): p. 2520-2522. | https://snakemake.readthedocs.io/en/stable/ |
Tabix | NA | www.htslib.org/doc/tabix.html |
tree | NA | http://mama.indstate.edu/users/ice/tree/ |
Trimmomatic | Bolger, A.M., M. Lohse, and B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014. 30(15): p. 2114-20. | www.usadellab.org/cms/?page=trimmomatic |
Virus-Host Database | Mihara, T., Nishimura, Y., Shimizu, Y., Nishiyama, H., Yoshikawa, G., Uehara, H., ... & Ogata, H. (2016). Linking virus genomes with host taxonomy. Viruses, 8(3), 66. | http://www.genome.jp/virushostdb/note.html |
Virus typing tools | Kroneman, A., Vennema, H., Deforche, K., Avoort, H. V. D., Penaranda, S., Oberste, M. S., ... & Koopmans, M. (2011). An automated genotyping tool for enteroviruses and noroviruses. Journal of Clinical Virology, 51(2), 121-125. | https://www.ncbi.nlm.nih.gov/pubmed/21514213 |
This project/research has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 643476. and the Dutch working group on molecular diagnostics (WMDI).