Umi-pipeline-nf

Umi-pipeline-nf creates highly accurate single-molecule consensus sequences for unique molecular identifier (UMI)-tagged amplicons from nanopore sequencing data.
The pipeline can be run on the whole fastq_pass folder of your nanopore run and, by default, outputs the aligned consensus sequences of each UMI cluster in a BAM file. The optional variant calling step creates a VCF file of all variants found in the consensus sequences. Umi-pipeline-nf originates from a Snakemake-based analysis pipeline (pipeline-umi-amplicon; originally developed by Karst et al., Nat Methods 18:165–169, 2021). We migrated the pipeline to Nextflow and added several optimizations and additional functionality.

Workflow

Input FASTQ files are merged and filtered.
Reads are aligned against a reference genome and filtered to keep only full-length on-target reads.
The flanking UMI sequences of all reads are extracted.
The extracted UMIs are used to cluster the reads.
Per cluster, highly accurate consensus sequences are created.
The consensus sequences are aligned against the reference sequence.
An optional variant calling step can be performed.
UMI extraction, clustering, consensus sequence creation, and mapping are repeated.
An optional variant calling step can be performed.
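The UMI-based clustering step can be illustrated with a minimal Python sketch. It is not the pipeline's implementation: the greedy first-match strategy, the edit-distance threshold, and the names `edit_distance` and `cluster_by_umi` are assumptions chosen for illustration, and each read's two flanking UMIs are assumed to have already been extracted and concatenated into one string.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two UMI strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if bases match)
            ))
        prev = curr
    return prev[-1]


def cluster_by_umi(umis: dict[str, str], max_dist: int = 3) -> dict[str, list[str]]:
    """Hypothetical greedy clustering (illustration only, not the pipeline's method).

    umis maps read_id -> extracted UMI (both flanking UMIs concatenated).
    Each read joins the first cluster whose centroid UMI is within max_dist
    edits; otherwise its own UMI seeds a new cluster.
    Returns centroid UMI -> list of read IDs.
    """
    clusters: dict[str, list[str]] = {}
    for read_id, umi in umis.items():
        for centroid, members in clusters.items():
            if edit_distance(umi, centroid) <= max_dist:
                members.append(read_id)
                break
        else:
            # No centroid was close enough: start a new cluster.
            clusters[umi] = [read_id]
    return clusters


if __name__ == "__main__":
    reads = {"r1": "AAACCCGGGTTT", "r2": "AAACCCGGGTTA", "r3": "TTTGGGCCCAAA"}
    print(cluster_by_umi(reads, max_dist=2))
    # {'AAACCCGGGTTT': ['r1', 'r2'], 'TTTGGGCCCAAA': ['r3']}
```

Greedy first-match assignment depends on input order and can place reads from different molecules into one cluster when their UMIs happen to be similar. This is one way admixed clusters can arise, which the pipeline's additional UMI cluster splitting step is intended to remove.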

See the output documentation for a detailed overview of the pipeline and its output files.

Main Adaptations

The pipeline comes with a Docker/Singularity container, which makes installation simple, eases use on clusters, and makes results highly reproducible.
The pipeline is optimized for parallelization.
An additional UMI cluster splitting step removes admixed UMI clusters.
The per-cluster read filtering strategy was adapted to preserve the highest-quality reads.
Three commonly used variant callers (freebayes, lofreq, and mutserve) are supported by the pipeline.
The raw reads can be optionally subsampled.
The raw reads can be filtered by read length and quality.
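Read length/quality filtering and subsampling can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's code: the function names and thresholds are hypothetical, quality strings are assumed to be Phred+33 encoded, and the mean quality follows one common convention for nanopore reads (averaging per-base error probabilities rather than raw Phred scores).

```python
import math
import random


def mean_read_quality(qual: str) -> float:
    """Mean Phred quality of one read from its FASTQ quality string (Phred+33 assumed).

    Per-base error probabilities are averaged and converted back to Phred,
    so a few low-quality bases lower the mean more than an arithmetic mean would.
    """
    if not qual:
        return 0.0
    error_probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def passes_filter(seq: str, qual: str, min_len: int, max_len: int, min_qual: float) -> bool:
    """Hypothetical filter: keep a read only if length and mean quality are within thresholds."""
    return min_len <= len(seq) <= max_len and mean_read_quality(qual) >= min_qual


def subsample(read_ids: list[str], n: int, seed: int = 42) -> list[str]:
    """Hypothetical subsampling: draw n reads without replacement with a fixed seed."""
    if len(read_ids) <= n:
        return list(read_ids)
    return random.Random(seed).sample(read_ids, n)


if __name__ == "__main__":
    # One Q10 base ('+') and one Q20 base ('5'): about 12.6, not the arithmetic 15.
    print(round(mean_read_quality("+5"), 2))
    print(passes_filter("ACGT", "5555", min_len=4, max_len=10, min_qual=15))  # True
```

Seeding the random generator keeps a subsampled run reproducible, which matches the pipeline's general goal of reproducible results.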

See the usage documentation for all of the available parameters of the pipeline.

Quick Start

1. Install Nextflow.
2. Download the pipeline and test it on a minimal dataset with a single command:

nextflow run genepi/umi-pipeline-nf -r v0.2.1 -profile test,docker

3. Start running your own analysis!
3.1 Download config/custom.config and adapt it with the paths to your data (relative and absolute paths are possible).

nextflow run genepi/umi-pipeline-nf -r v0.2.1 -c <custom.config> -profile custom,<docker,singularity>

Citation

If you use the pipeline, please cite our paper:

Amstler, S., Streiter, G., Pfurtscheller, C. et al. Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex lipoprotein(a) KIV-2 VNTR. Genome Med 16, 117 (2024). https://doi.org/10.1186/s13073-024-01391-8

Credits

The pipeline was written by @StephanAmstler.
Nextflow template pipeline: EcSeq.
Snakemake-based ONT pipeline for UMI nanopore sequencing analysis: nanoporetech/pipeline-umi-amplicon.
UMI-corrected nanopore sequencing analysis first shown by: SorenKarst/longread_umi.