Psy-Fer / interARTIC

InterARTIC - An interactive local web application for viral whole genome sequencing utilising the ARTIC network pipelines.
https://psy-fer.github.io/interARTIC/
MIT License

InterARTIC

InterARTIC is an interactive web application designed to simplify the use of the ARTIC bioinformatics pipelines for nanopore sequencing analysis of viral genomes. InterARTIC was initially designed and tested for analysis of SARS-CoV-2, but is suitable for analysis of any virus and/or amplicon scheme, including a user's own custom amplicons. InterARTIC supports both the Nanopolish and Medaka pipeline alternatives from ARTIC, with parameter customisation enabled through a simple graphical interface.


Publication: InterARTIC: an interactive web application for whole-genome nanopore sequencing analysis of SARS-CoV-2 and other viruses

Pre-print: InterARTIC: an interactive web application for whole-genome nanopore sequencing analysis of SARS-CoV-2 and other viruses

Please cite the following when using interARTIC in your publications:

James M Ferguson, Hasindu Gamaarachchi, Thanh Nguyen, Alyne Gollon, Stephanie Tong, Chiara Aquilina-Reid, Rachel Bowen-James, Ira W Deveson, InterARTIC: an interactive web application for whole-genome nanopore sequencing analysis of SARS-CoV-2 and other viruses, Bioinformatics, Volume 38, Issue 5, March 2022, Pages 1443–1446, https://doi.org/10.1093/bioinformatics/btab846

@article{ferguson2022interartic,
  title={InterARTIC: an interactive web application for whole-genome nanopore sequencing analysis of SARS-CoV-2 and other viruses},
  author={Ferguson, James M and Gamaarachchi, Hasindu and Nguyen, Thanh and Gollon, Alyne and Tong, Stephanie and Aquilina-Reid, Chiara and Bowen-James, Rachel and Deveson, Ira W},
  journal={Bioinformatics},
  volume={38},
  number={5},
  pages={1443--1446},
  year={2022},
  publisher={Oxford University Press}
}

Quick start

A video tutorial of setting up and running InterARTIC: https://youtu.be/RCArn-xOkHg

Step 1: Streamlined installation of InterARTIC

Pre-compiled binary releases are provided for Linux and macOS for easy setup. The Linux binaries can be run on Windows using Windows Subsystem for Linux (WSL). Download the latest release for your operating system and architecture, extract the tarball and run the provided run.sh script.

IMPORTANT: Make sure the interARTIC binaries reside at a location with no white space characters.
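
The whitespace requirement can be checked before extracting the release. The following is a minimal sketch, not part of interARTIC; the function name path_has_no_whitespace is hypothetical, and the check is run here against the current working directory on the assumption that the release will be extracted there.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with interARTIC):
# returns 0 if the given path contains no whitespace characters, 1 otherwise.
path_has_no_whitespace() {
    case "$1" in
        *[[:space:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Check the current working directory before extracting the release into it.
if path_has_no_whitespace "$PWD"; then
    echo "OK: $PWD contains no whitespace"
else
    echo "WARNING: $PWD contains whitespace; choose another location" >&2
fi
```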

Running the run.sh script launches a new interactive interARTIC session. To see your session, visit http://127.0.0.1:5000 in your web browser. Here, you can configure and run your next job using the graphical interface. Keep the terminal open to keep your interARTIC session running.

Step 2: Downloading test dataset

Open a new terminal to download and extract the example test dataset. The commands below will extract the dataset to /data, assuming /data exists on the computer (sudo mkdir /data, if not) and you have write permission to /data (sudo chmod 777 /data, if not). The /data folder is the default location for sequencing outputs on an ONT GridION or PromethION device, but on your own machine you may use a custom location such as /home/username/data if you wish (hint: you may use the pwd command on your terminal to get the path of your current working directory).

cd /data
wget https://seq.bioinf.science/interartic-corona -O FLFL031920_sample_data.tar.gz
#if you do not have wget: curl -o FLFL031920_sample_data.tar.gz -L https://seq.bioinf.science/interartic-corona
tar xf FLFL031920_sample_data.tar.gz
rm FLFL031920_sample_data.tar.gz

Once extracted, you should see two directories:

  1. FLFL031920 containing data from a nanopore sequencing run of 10 multiplexed SARS-CoV-2 isolates, performed on an ONT GridION. The .fast5 files, .fastq files and the sequencing summary file are among the extracted data. This example dataset follows the same directory structure as a nanopore sequencing run with live base-calling enabled.
  2. sample-barcodes containing a .csv manifest file that matches sample names to sample barcodes.
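
To confirm that the extraction produced both directories, a small check such as the sketch below may help. The function name check_test_dataset is hypothetical; only the two directory names come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if both expected directories exist
# under the given base directory, 1 if either one is missing.
check_test_dataset() {
    local base="$1"
    local status=0
    local d
    for d in FLFL031920 sample-barcodes; do
        if [ -d "$base/$d" ]; then
            echo "found:   $base/$d"
        else
            echo "missing: $base/$d" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the default location this would be called as check_test_dataset /data; for a custom location, pass that path instead.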

For detailed information on the input data structure and .csv manifest file, please visit the InterARTIC usage guide here.

IMPORTANT: Make sure the data directory and file names do not contain white space.
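
One way to spot offending names is to scan the data directory with find. This is a sketch only; the function name list_names_with_whitespace is hypothetical. It inspects the name of each file and directory under the given location, not the parent path leading to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every file or directory under the given
# directory whose own name contains a whitespace character.
# Prints nothing if all names are clean.
list_names_with_whitespace() {
    find "$1" -name '*[[:space:]]*' -print
}
```

An empty result from list_names_with_whitespace /data means no file or directory name under /data contains whitespace.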

Step 3: Configuring interARTIC

Configuration is only required if you downloaded the dataset to a custom location instead of /data. In your interARTIC web interface, click Set locations of input data. Fill in the first two fields: (1) the location of your input data, and (2) the location of your sample-barcode .csv files. For example, if you used /home/username/data, the fields should be /home/username/data and /home/username/data/sample-barcodes, respectively. Click confirm to save the settings, which will be used for all future runs.

Step 4: Running InterARTIC on the test dataset

Click Add Job on the interARTIC web interface. Then fill in the fields as shown in the following table. Note that when you click (double click on some browsers) on the fields for input data directory and Select a CSV file, a list of files/directories should appear from which you can select.

| field | value | description |
| --- | --- | --- |
| Job name | test | any name you like for the run (only alphanumeric characters and underscores are allowed) |
| input data directory | FLFL031920 | this is the directory containing the nanopore sequencing data |
| This input contains | Multiple samples | our example test dataset contains 10 multiplexed samples |
| Select a CSV file | FLFL031920-barcodes.csv | .csv manifest file that matches sample names to sample barcodes |
| virus | SARS-CoV-2 (nCoV-2019) | |
| Select your primer scheme | Eden V1 (2500bp) | our example test dataset used Eden V1 primers |
| library preparation method | Ligation library prep (eg SQK-LSK109) | our example test dataset used ligation barcodes |
| Select a pipeline to run | Both | we will test both medaka and nanopolish pipelines, which will run one after the other |
Now click Submit job(s) and you should see the pipeline running :)


Another example dataset containing Ebola virus samples that you can use to directly test interARTIC can be downloaded from here. The relevant options for this dataset are Multiple samples, ebola-barcodes.csv, IturiEBOV, Artic V1 and Ligation library prep (eg SQK-LSK109).

Output data generated from interARTIC (version 0.2-beta) for the above two example datasets can be downloaded for your reference from here.

interARTIC usage

Before running interARTIC on your own nanopore samples, please refer to the detailed guide here. If you want to run interARTIC on custom primer schemes or viruses, refer to the instructions here.

Troubleshooting

See here for troubleshooting common issues.

Building from source

Building from source is not straightforward, due to the dependency hell of Python versions (circumventing this hell was one of the motivations for developing interARTIC). Step-by-step instructions for building from source are given here. Anyone who wants to build a Docker image for interARTIC can do so by following the instructions here, though it is highly redundant.

Snake Charming

Developers interested in learning how we create portable binary releases can read our packaging steps (aka the art of snake charming) detailed here.

Updating interARTIC

To update interARTIC to the latest version, simply delete the directory containing the old interARTIC binaries and obtain the latest version by following the installation steps under Step 1 above.

Acknowledgement

interARTIC is a layer built on top of the ARTIC pipeline. Binary releases of interARTIC contain:

  1. Python 3.7 binaries (build: cpython-3.7.7-linux64-20200409T0045) and several Python 3.7 modules available through pypi (e.g., celery, redis, flask, redis-server)
  2. ARTIC pipeline binaries available through bioconda that includes many dependencies (e.g., Python 3.6, medaka, nanopolish)