asadprodhan / GPU-accelerated-guppy-basecalling

GPU-accelerated guppy basecalling and demultiplexing on Linux

GPU-accelerated guppy basecalling




Introduction

The use of Graphics Processing Units (GPUs) in high-performance computing (HPC) has brought a paradigm shift in HPC capabilities. At the heart of this performance is the parallel data processing architecture of GPUs. In brief, a traditional Central Processing Unit (CPU) works through a task largely sequentially on a small number of cores (typically 4-8). A GPU, by contrast, breaks a task into smaller components and executes them in parallel on thousands of GPU cores. As a result, GPU-powered compute nodes can take on highly data-intensive workloads and complete them at unprecedented speed.

GPU-accelerated guppy basecalling is one example of GPU use in data-intensive computing. Oxford Nanopore Technologies (ONT) sequencing platforms generate electrical signals as the raw sequencing data. These signals are converted to the actual DNA/RNA sequences through a process called ‘basecalling’, and guppy is a widely used basecaller for ONT sequencing data. However, guppy can take days to complete basecalling when it runs on CPU-only computers, whereas GPU-accelerated guppy can complete the same task in hours.

Advantages

Requirements

This tutorial has been written for the Ubuntu 18.04 Linux operating system.

Adding guppy to PATH variable so that it can be run in the terminal without specifying a path
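
A minimal sketch of this step is shown below. It assumes the Guppy archive was extracted to a hypothetical directory, `$HOME/ont-guppy`, so that the binaries sit in `$HOME/ont-guppy/bin`; adjust `GUPPY_BIN` to wherever `guppy_basecaller` actually lives on your machine.

```shell
# Hypothetical install location; change to where ont-guppy was extracted.
GUPPY_BIN="$HOME/ont-guppy/bin"

# Prepend the directory to PATH for the current shell session only,
# skipping the change if the directory is already listed.
case ":$PATH:" in
  *":$GUPPY_BIN:"*) ;;
  *) export PATH="$GUPPY_BIN:$PATH" ;;
esac

# To make the change permanent, append the same export line to ~/.bashrc:
#   echo 'export PATH="$HOME/ont-guppy/bin:$PATH"' >> ~/.bashrc

# Check whether the shell can now find the binary.
command -v guppy_basecaller || echo "guppy_basecaller not found in $GUPPY_BIN"
```

If the last line prints a path, guppy can be run from any directory in that terminal.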

GPU-accelerated guppy basecalling script

To tune the script for efficient basecalling, a few details about the machine are needed. For example:

How many GPUs are there in my Linux computer?

sudo lshw -C display

What is the name of my NVIDIA GPU device?

nvidia-smi --query-gpu=name --format=csv,noheader

What is the RAM size in my Linux computer?

grep MemTotal /proc/meminfo

How many CPUs are there in my Linux computer?

lscpu
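
The checks above can also be collected into one short summary. The following is a sketch rather than part of the original workflow: the variable names are illustrative, and the GPU count falls back to 0 when `nvidia-smi` is missing or fails.

```shell
# Count NVIDIA GPUs by counting the names nvidia-smi reports.
# Falls back to 0 if nvidia-smi is absent or exits with an error.
if names=$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null); then
  gpu_count=$(printf '%s\n' "$names" | grep -c .)
else
  gpu_count=0
fi

# Total RAM: /proc/meminfo reports kB, converted here to whole GiB.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_gib=$(( mem_kb / 1024 / 1024 ))

# Number of logical CPUs available to this process.
cpu_count=$(nproc)

printf 'GPUs: %s\nRAM: %s GiB\nCPUs: %s\n' "$gpu_count" "$mem_gib" "$cpu_count"
```

These three numbers are the ones that matter most when choosing values for the GPU and caller flags in the basecalling script.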

The basecalling script

#!/usr/bin/env bash
guppy_basecaller --disable_pings \
  -i /path_to_fast5_directory \
  -s /path_to_output_directory \
  -c dna_r9.4.1_450bps_fast.cfg \
  --min_qscore 7 \
  --recursive -x 'cuda:0' \
  --num_callers 4 \
  --gpu_runners_per_device 8 \
  --chunks_per_runner 1024 \
  --chunk_size 1000 \
  --compress_fastq

The config (model) file for a given flow cell and kit can be found by listing the supported workflows and filtering by kit, for example:

guppy_basecaller --print_workflows | grep SQK-LSK110

You can also choose the basecalling accuracy level through the model (config) name: fast, hac (high accuracy) or sup (super accuracy).

All other flags can be found by running the ‘guppy_basecaller --help’ command.
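
As an illustration of switching accuracy levels, the sketch below builds the config name from a variable. It assumes the R9.4.1 450 bps naming pattern used in the basecalling script and that your Guppy version ships all three config files; confirm the exact names with `guppy_basecaller --print_workflows` before relying on them. The `accuracy` and `config` variable names are illustrative.

```shell
# Accuracy level to use: fast, hac or sup.
accuracy="hac"

# Build the config file name; reject anything outside the three known levels.
case "$accuracy" in
  fast|hac|sup) config="dna_r9.4.1_450bps_${accuracy}.cfg" ;;
  *) echo "unknown accuracy level: $accuracy" >&2; config="" ;;
esac

echo "$config"
```

The resulting name can then be passed to the `-c` flag in place of the fast config.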

Basecalling progress

Once basecalling starts, guppy displays a progress bar until completion.

How do I check whether my GPU is running at full capacity?

You can monitor the utilisation of your GPU device(s) as follows:

watch -d -n 0.5 nvidia-smi

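Here, -n 0.5 refreshes the output every 0.5 seconds and -d highlights the values that changed between refreshes. More details can be found by running the ‘man watch’ command in the terminal.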

An explanation of the ‘nvidia-smi’ command output can be found here (Kaul, 2022).
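
To keep a record of GPU utilisation for later inspection rather than watching it live, `nvidia-smi` can also write periodic samples to a CSV file. The function below is a sketch; the function name and the default file name `gpu_usage.csv` are illustrative choices, not part of guppy or the original tutorial.

```shell
log_gpu_usage() {
  # $1 = CSV file to write (illustrative default: gpu_usage.csv)
  log_file="${1:-gpu_usage.csv}"

  # Stop early with a message if the NVIDIA tools are not available.
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
    return 1
  fi

  # Sample once per second until interrupted with Ctrl+C.
  nvidia-smi \
    --query-gpu=timestamp,name,utilization.gpu,memory.used,memory.total \
    --format=csv -l 1 > "$log_file"
}
```

Run `log_gpu_usage basecalling_run.csv` in a second terminal while basecalling is in progress, then stop it with Ctrl+C once the run has finished.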

The --num_callers, --gpu_runners_per_device, --chunks_per_runner and --chunk_size parameters can be tuned to the capacity of the GPU device (Benton, 2021).
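
One possible way to explore these settings is to time the same small set of reads under a few different values. The sketch below only prints the commands so they can be reviewed first; the sweep values (256, 512, 1024), the function name and the placeholder paths are assumptions chosen for illustration, not recommended settings.

```shell
print_sweep_commands() {
  # $1 = input directory (ideally a small subset of fast5 files)
  # $2 = parent output directory; each run gets its own subdirectory
  for chunks in 256 512 1024; do
    echo "time guppy_basecaller --disable_pings -i $1 -s $2/chunks_${chunks}" \
         "-c dna_r9.4.1_450bps_fast.cfg -x cuda:0" \
         "--gpu_runners_per_device 8 --chunk_size 1000" \
         "--chunks_per_runner ${chunks}"
  done
}

print_sweep_commands /path_to_fast5_subset /path_to_output_directory
```

Each printed line can be pasted into the terminal in turn. Comparing the reported times alongside the GPU memory use shown by nvidia-smi indicates which setting suits the device.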

GPU-accelerated guppy demultiplexing

Demultiplexing using guppy_barcoder

The demultiplexing script

guppy_barcoder --disable_pings \
  -i /path_to_basecalled_fastq_directory \
  -s /path_to_output_directory \
  --barcode_kits EXP-PBC001 \
  -x 'cuda:0' --trim_barcodes \
  --trim_adapters \
  --recursive \
  --compress_fastq

All other flags with descriptions can be found by running the ‘guppy_barcoder --help’ command.
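
After demultiplexing, a quick per-barcode read count helps confirm that the run behaved as expected. The sketch below assumes the output directory contains one subdirectory per barcode (named `barcode01`, `barcode02`, and so on) plus `unclassified`, each holding four-line FASTQ records in `.fastq.gz` files; check this layout against your own output before relying on it. The function name is illustrative.

```shell
count_reads_per_barcode() {
  # $1 = guppy_barcoder output directory
  for dir in "$1"/barcode* "$1"/unclassified; do
    [ -d "$dir" ] || continue
    # A FASTQ record is four lines, so reads = total lines / 4.
    lines=$(find "$dir" -name '*.fastq.gz' -exec gzip -dc {} + | wc -l)
    printf '%s\t%s\n' "$(basename "$dir")" "$(( lines / 4 ))"
  done
}
```

Calling `count_reads_per_barcode /path_to_output_directory` prints one tab-separated line per barcode directory.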

List supported barcoding kits

guppy_barcoder --print_kits

Potential errors

error while loading shared libraries: libcuda.so.1: cannot open shared object file

See the suggestions below from the Nanopore community:

https://help.nanoporetech.com/en/articles/6627829-what-does-the-error-while-loading-shared-libraries-libcuda-so-1-cannot-open-shared-object-file-message-mean-when-running-the

Nanopore guidelines for installing Guppy:

https://community.nanoporetech.com/docs/prepare/library_prep_protocols/Guppy-protocol/v/gpb_2003_v1_revaw_14dec2018/linux-guppy

References

Benton, M., 2021. Nanopore Guppy GPU basecalling on Windows using WSL2. URL https://hackmd.io/PrSp6UhqS2qxZ_rKOR18-g#Nanopore-Guppy-GPU-basecalling-on-Windows-using-WSL2 (accessed 3.8.22).

Kaul, S., 2022. Explained Output of nvidia-smi Utility. URL https://medium.com/analytics-vidhya/explained-output-of-nvidia-smi-utility-fc4fbee3b124 (accessed 3.8.22).