HorsePower is designed to optimize database queries on modern hardware. At its core is HorseIR, an array-based intermediate representation (IR) for database queries. On top of HorseIR, sophisticated compiler optimizations can be applied to database operations. Moreover, array programming offers a promising route to performance speedups through fine-grained parallelism.
Figure 1. The workflow of the HorsePower framework.
In summer 2017, we started this project from scratch. The workflow of the HorsePower framework is shown in Figure 1. One candidate source language is our Horse language, an extension of standard SQL designed for data analytics. At the current stage, we take as input execution plans of standard SQL queries, as well as MATLAB code. A front end parses the source code and translates it into HorseIR. Static analyses and code optimizations are then performed on HorseIR before target code is generated for one of several supported back ends. Alternatively, an interpreter can run programs directly.
HorsePower focuses on the following areas:
- Design and implementation of an array-based intermediate representation (IR)
- Static analysis for an array-based IR (i.e., HorseIR)
- Query optimization via compiler optimizations
- Fine-grained primitive functions and highly tuned libraries
Download the repository:

```bash
git clone git@github.com:Sable/HorsePower.git
```

Set up the environment variables:

```bash
cd HorsePower && source ./setup_env.sh
```

Run the installation with the following command (takes about 13 minutes):

```bash
(cd ${HORSE_LIB_FOLDER} && sh deploy_linux.sh)
```
After installation, the following new folders are created:
- include
- lib
- pcre2
Note: gcc 8.1.0 or higher is recommended, and the additional library `uuid-dev` may be required during installation.
The default data path for TPC-H is `${HORSE_BASE}/data/tpch`.
To generate datasets at different scale factors, run:

```bash
cd data/tpch
./run.sh deploy   ## Read the instructions and update the Makefile
./run.sh gendb 1  ## Generate the database and save it to data/tpch/db1
```
For a specific scale factor, for example 1, the data is stored in `${HORSE_BASE}/data/tpch/db1`, which contains one `.tbl` file per table: `${HORSE_BASE}/data/tpch/db1/*.tbl`.
We recommend using the latest version, as this project is still under active development.

To see how to run the code, type:

```bash
(cd ${HORSE_SRC_CODE} && ./run.sh)  # show usage
```
Name | Notes |
---|---|
Platform | Cross-platform |
Tools | C/C++, Flex & Bison |
Parallelism | OpenMP/Pthread/CUDA/OpenCL |
Conventions | docs/conventions |
- IR design
- Database TPC-H
- Implementation
Copyright © 2017-2020, Hanfeng Chen, Laurie Hendren and McGill University.