Motivation

Purely mutation-based fuzzer is poor at generating syntax/semantics correct inputs.
Purely generation-based fuzzer is not guided to touch more code

Approach overview

Designed an IR for SQL to generate syntax correct inputs by mutating them.
Try its best to improve semantic correctness(which is NP hard) through query instantiation(what?)

UniverseFly commented 3 years ago

Ensure syntax correctness through IR

Usage:

SQL -> IR -> mutation -> SQL

SSA form:

Mutation(squirrel strips concrete data before mutation, and only mutates the skeleton, the reason for which would be explained later):

Type based insertion/replacement/deletion

UniverseFly commented 3 years ago

Improving semantic correctness

As mentioned in §5, after mutation the IR program is a syntax-correct skeleton with data stripped. Our instantiator first analyzes the dependency be- tween different data, and fill the skeleton with concrete values that satisfy all dependencies. After the instantiation, the query has a high chance to be semantically correct.

By inferring data dependencies [isA/isAnElement] between semantic-binding data from a set of rules following two principles:

Lifetime e.g. variables should be used after definition
Customization: considering semantic data type(be aware of its difference from IR node type used for mutation), scopes and operations.

Data dependency example:

Relation rule: a tuple of (target relation, source relation, relationship, scope)

For example, the relation rule (UseFromTable, UseTableColumn, isAnElement, nearest) means the data of type UseTableColumn is an element of the nearest data within the same statement that has type UseFromTable

With such predefined rules, the data dependency graph can be inferred.

Instantiation: fill those syntax correct generated skeletons and try to get a semantic correct one. (given the constructed dependency graph as input)

The algorithm (Definition means CreateXXX and Use means UseXXX; a node having a parent means it in the graph has a directed link pointing to some node, i.e. dependent on that pointed node):

UniverseFly commented 3 years ago

The above approach (filling IR skeleton) is called IR instantiation

UniverseFly commented 3 years ago

Implementation

Parser
Fuzzer: on top of AFL replacing original mutation components with its structural mutators

Evaluation

Can Sqirrel detect memory errors from real-world production- level DBMSs?

Can Sqirrel outperform state-of-the-art testing tools?

What are the contributions of language correctness and coverage-based feedback in DBMS testing?

Q1

Bugs

Q2

Unique bugs and crashes
New edge coverage
Syntax & semantics validity

Q3

Control variates method
- Squirrel vs. Squirrel[!semantic] vs. Squirrel[!feedback] vs. Squirrel[!(semantic&syntax)]

UniverseFly / Readings

CCS'20 | Squirrel: Testing Database Management Systems with Language Validity and Coverage Feedback #2

Motivation

Approach overview

Ensure syntax correctness through IR

Improving semantic correctness

Implementation

Evaluation