OpenCL Smith-Waterman Algorithm on Altera FPGA for Large Protein Databases
OSWALD is a software tool that accelerates Smith-Waterman protein database search on heterogeneous architectures based on Altera FPGAs. On the host it exploits OpenMP multithreading and SIMD computing through the SSE and AVX2 extensions, while on the FPGAs it takes advantage of pipeline and vector parallelism.
The Altera OpenCL SDK makes the FPGA code portable.
Queries, database, substitution matrix and gap penalty values are configurable.
OSWALD estimates the relative compute power of the host and the devices to reach a well-balanced workload distribution.
In addition, OSWALD offers two execution modes: (1) FPGA(s) only and (2) concurrent host and FPGA(s). On a heterogeneous platform based on two Xeon E5-2670 processors and a single Altera Stratix V GSD5 Half-Length PCIe board, OSWALD reaches up to 58 GCUPS in FPGA mode and 179 GCUPS in hybrid mode (host+FPGA) when searching the Environmental NR database.
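GCUPS (billions of cell updates per second) is the usual throughput metric for Smith-Waterman tools: the number of dynamic-programming cells (query length times database residues) divided by the run time. The sketch below shows that calculation and one plausible way to turn per-device rates measured on a small sample into a proportional workload split. It is only an illustration under stated assumptions: the function names and the input numbers are hypothetical, and OSWALD's actual estimation procedure may differ.

```python
def gcups(query_length, db_residues, seconds):
    """Giga cell updates per second for one query against a database."""
    cells = query_length * db_residues
    return cells / (seconds * 1e9)


def split_by_throughput(total_residues, rates):
    """Split a workload proportionally to each device's measured rate.

    rates maps a device name to its throughput on a sample (e.g. GCUPS).
    Assumption: work is divided in direct proportion to those rates.
    """
    total_rate = sum(rates.values())
    shares = {
        name: int(total_residues * rate / total_rate)
        for name, rate in rates.items()
    }
    # Hand any rounding remainder to the fastest device.
    fastest = max(rates, key=rates.get)
    shares[fastest] += total_residues - sum(shares.values())
    return shares


# Illustrative sample timings only, not OSWALD measurements.
host_rate = gcups(500, 200_000_000, 2.0)   # 50.0
fpga_rate = gcups(500, 200_000_000, 4.0)   # 25.0
print(split_by_throughput(3_000_000, {"host": host_rate, "fpga": fpga_rate}))
# {'host': 2000000, 'fpga': 1000000}
```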
A database must be preprocessed before it can be searched.
OSWALD execution
-O <string> 'preprocess' for database preprocessing, 'search' for database search, 'info' for FPGA information. [REQUIRED]
preprocess
-i, --input=<string> Input sequence filename (must be in FASTA format). [REQUIRED]
-o, --output=<string> Output filename. [REQUIRED]
-c, --cpu_threads=<integer> Number of host threads (default: 4).
search
-q, --query=<string> Input query sequence filename (must be in FASTA format). [REQUIRED]
-d, --db=<string> Preprocessed database filename (the output of 'preprocess'). [REQUIRED]
-m, --execution_mode=<integer> Execution mode: 0 for FPGA only, 1 for concurrent host and FPGA (default: 1).
-c, --host_threads=<integer> Number of host threads (default: 4).
-e, --gap_extend=<integer> Gap extend penalty (default: 2).
-f, --num_fpgas=<integer> Number of FPGAs (default: 1).
-g, --gap_open=<integer> Gap open penalty (default: 10).
-b, --cpu_block_width=<integer> Host block width (default: 256).
-k, --max_chunk_size=<integer> Maximum chunk size in bytes (default: 134217728).
-p, --db_percentage=<float> Database percentage used to estimate relative compute power (default: 0.01).
-r, --top=<integer> Number of scores to show (default: 10).
-s, --sm=<string> Substitution matrix. Supported values: blosum45, blosum50, blosum62, blosum80, blosum90, pam30, pam70, pam250 (default: blosum62).
-v, --vector_length=<integer> Vector length: 16 for host with SSE support, 32 for host with AVX2 support (default: 16).
-?, --help Give this help list
--usage Give a short usage message
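The -g, -e and -s options parameterize the Smith-Waterman scoring. As a rough illustration of what the gap penalties control, the sketch below computes a local alignment score with affine gaps using the standard Gotoh recurrences. It is plain Python, not OSWALD's vectorized kernel. It uses a simple match/mismatch score instead of a real substitution matrix such as BLOSUM62, and it assumes a gap of length k costs gap_open + k * gap_extend, which may not match OSWALD's convention. The function name and the match/mismatch values are made up for the example.

```python
def sw_score(query, subject, match=5, mismatch=-4, gap_open=10, gap_extend=2):
    """Best local alignment score with affine gaps (Gotoh recurrences).

    Assumption: a gap of length k costs gap_open + k * gap_extend.
    """
    neg = float("-inf")
    cols = len(subject) + 1
    h_prev = [0] * cols      # H values of the previous row
    f_prev = [neg] * cols    # vertical-gap state of the previous row
    best = 0
    for q in query:
        h_row = [0] * cols
        f_row = [neg] * cols
        e = neg              # horizontal-gap state within the current row
        for j in range(1, cols):
            # Extend an existing gap or open a new one.
            e = max(e - gap_extend, h_row[j - 1] - gap_open - gap_extend)
            f_row[j] = max(f_prev[j] - gap_extend,
                           h_prev[j] - gap_open - gap_extend)
            score = match if q == subject[j - 1] else mismatch
            diag = h_prev[j - 1] + score
            # Local alignment: scores never drop below zero.
            h_row[j] = max(0, diag, e, f_row[j])
            best = max(best, h_row[j])
        h_prev, f_prev = h_row, f_row
    return best


# Two 5-residue matching blocks (25 each) joined by a one-residue gap (cost 12).
print(sw_score("ACDEFGHIKL", "ACDEFWGHIKL"))  # 38
```

With the defaults shown (gap open 10, gap extend 2), raising either penalty makes gapped alignments like the one above less likely to beat the best ungapped block.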
Database preprocessing
./oswald -O preprocess -i db.fasta -o out
Preprocess the db.fasta database using 4 host threads (the default). The preprocessed database will be named out.
./oswald -O preprocess -i db.fasta -o out -c 8
Preprocess the db.fasta database using 8 host threads. The preprocessed database will be named out.
Database search
./oswald -O search -q query.fasta -d out -m 0
Search the query sequence in query.fasta against the preprocessed database out in FPGA mode, with 1 accelerator and 4 host threads using the SSE instruction set.
./oswald -O search -q query.fasta -d out -m 0 -f 2
Search the query sequence in query.fasta against the preprocessed database out in FPGA mode, with 2 accelerators and 4 host threads using the SSE instruction set.
./oswald -O search -q query.fasta -d out -m 0 -c 16
Search the query sequence in query.fasta against the preprocessed database out in FPGA mode, with 1 accelerator and 16 host threads using the SSE instruction set.
./oswald -O search -q query.fasta -d out -m 0 -c 16 -v 32
Search the query sequence in query.fasta against the preprocessed database out in FPGA mode, with 1 accelerator and 16 host threads using the AVX2 instruction set.
./oswald -O search -q query.fasta -d out -m 1
Search the query sequence in query.fasta against the preprocessed database out in concurrent host and FPGA mode, with 4 host threads (SSE) and a single accelerator.
./oswald -O search -q query.fasta -d out -m 1 -f 2
Search the query sequence in query.fasta against the preprocessed database out in concurrent host and FPGA mode, with 4 host threads (SSE) and two accelerators.
./oswald -O search -q query.fasta -d out -m 1 -v 32
Search the query sequence in query.fasta against the preprocessed database out in concurrent host and FPGA mode, with 4 host threads (AVX2) and a single accelerator.
./oswald -O search -q query.fasta -d out -m 1 -v 32 -b 128
Search the query sequence in query.fasta against the preprocessed database out in concurrent host and FPGA mode, with 4 host threads (AVX2, block width equal to 128) and a single accelerator.
./oswald -O search -q query.fasta -d out -m 1 -k 67108864
Search the query sequence in query.fasta against the preprocessed database out in concurrent host and FPGA mode, with 4 host threads and a single accelerator. The FPGA part of the database is divided into chunks of at most 67108864 bytes (64 MiB).
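The effect of -k can be pictured as cutting the FPGA share of the database into pieces that each fit within the byte limit. The sketch below packs whole sequences greedily, in order, into chunks no larger than the maximum. This is only an assumed strategy for illustration; how OSWALD actually forms its chunks is not described here, and the function name and the sizes are hypothetical.

```python
def pack_into_chunks(sequence_sizes, max_chunk_size):
    """Greedily group sequence sizes (in bytes) into chunks <= max_chunk_size.

    Assumption: sequences are kept whole and taken in their original order.
    """
    chunks, current, used = [], [], 0
    for size in sequence_sizes:
        if size > max_chunk_size:
            raise ValueError("sequence larger than max_chunk_size")
        if used + size > max_chunk_size:
            # Current chunk is full: close it and start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        chunks.append(current)
    return chunks


# Tiny illustrative sizes; a real run would use limits such as 67108864.
print(pack_into_chunks([40, 30, 50, 20, 60], max_chunk_size=100))
# [[40, 30], [50, 20], [60]]
```

A smaller limit yields more, smaller chunks, which reduces the memory needed per transfer at the cost of more transfers.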
./oswald --help
./oswald -?
Print help list.
OSWALD: OpenCL Smith-Waterman on Altera FPGA for Large Protein Databases. Enzo Rucci, Carlos García, Guillermo Botella, Armando De Giusti, Marcelo Naiouf and Manuel Prieto-Matías. International Journal of High Performance Computing Applications; 2016; DOI:10.1177/1094342016654215
Smith-Waterman Protein Search with OpenCL on an FPGA. Enzo Rucci, Carlos García, Guillermo Botella, Armando De Giusti, Marcelo Naiouf and Manuel Prieto-Matías. 2015 IEEE Trustcom/BigDataSE/ISPA; DOI:10.1109/Trustcom.2015.634
If you have any questions or suggestions, please contact Enzo Rucci (erucci [at] lidi.info.unlp.edu.ar).