anse1 / sqlsmith

A random SQL query generator
GNU General Public License v3.0
724 stars 123 forks source link
fuzz-testing monetdb postgresql sqlite3

[[logo.png]]

: "I love the smell of coredumps in the morning"

** Description SQLsmith is a random SQL query generator. Its paragon is [[https://embed.cs.utah.edu/csmith/][Csmith]], which proved valuable for quality assurance in C compilers.

It currently supports generating queries for PostgreSQL, SQLite 3 and MonetDB. To add support for another RDBMS, you need to implement two classes providing schema information about and connectivity to the device under test.

Besides developers of the RDBMS products, users developing extensions might also be interested in exposing their code to SQLsmith's random workload.

Since 2015, it found 118 bugs in alphas, betas and releases in the aforementioned products, including security vulnerabilities in released versions. Additional bugs were squashed in extensions and libraries such as orafce and glibc.

https://github.com/anse1/sqlsmith/wiki#score-list

** Dependencies

optional:

** Building on Debian

: apt-get install build-essential autoconf autoconf-archive libpqxx-dev libboost-regex-dev libsqlite3-dev : cd sqlsmith : autoreconf -i # Avoid when building from a release tarball : ./configure : make

** Building on OSX

In order to build on Mac OSX, assuming you use Homebrew, run the following

: brew install libpqxx automake libtool autoconf autoconf-archive pkg-config : cd sqlsmith : autoreconf -i # Avoid when building from a release tarball : ./configure : make

** Usage

SQLsmith connects to the target database to retrieve the schema for query generation and to send the generated queries to. Currently, all generated statements are rolled back. Beware that SQLsmith does call functions that could possibly have side-effects (e.g. pg_terminate_backend). Use a suitably underprivileged user for its connection to avoid this.

Example invocations:

: # testing Postgres : sqlsmith --verbose --target="host=/tmp port=65432 dbname=regression" : # testing SQLite : sqlsmith --verbose --sqlite="file:$HOME/.mozilla/firefox/places.sqlite?mode=ro" : # testing MonetDB : sqlsmith --verbose --monetdb="mapi:monetdb://localhost:50000/smith"

The following options are currently supported:

| =--target=connstr= | target postgres database (default: libpq defaults) | | =--sqlite=URI= | target SQLite3 database | | =--monetdb=URI= | target MonetDB database | | =--log-to=connstr= | postgres db for logging errors into (default: don't log) | | =--verbose= | emit progress output | | =--version= | show version information | | =--seed=int= | seed RNG with specified integer instead of PID | | =--dry-run= | print queries instead of executing them | | =--max-queries=long= | terminate after generating this many queries | | =--exclude-catalog= | don't generate queries using catalog relations | | =--dump-all-queries= | dump queries as they are generated | | =--dump-all-graphs= | dump generated ASTs for debugging | | =--rng-state=string= | deserialize dumped rng state |

Sample output:

=--verbose= makes sqlsmith emit some progress indication to stderr. A symbol is output for each query sent to the server. Currently the following ones are generated:

| symbol | meaning | details | |--------+-------------------+-----------------------------------------------| | . | ok | Query generated and executed with ok sqlstate | | S | syntax error | These are bugs in sqlsmith - please report | | t | timeout | SQLsmith sets a statement timeout of 1s | | C | broken connection | These happen when a query crashes the server | | e | other error | |

When you test against a RDBMS that doesn't support some of SQLsmith's grammar, there will be a burst of syntax errors on startup. These should disappear after some time as SQLsmith blacklists productions that consistently lead to errors.

=--verbose= will also periodically emit error reports. In the following example, these are mostly caused by the primitive type system.

: queries: 39000 (202.399 gen/s, 298.942 exec/s) : AST stats (avg): height = 5.599 nodes = 37.8489 : 82 ERROR: invalid regular expression: quantifier operand invalid : 70 ERROR: canceling statement due to statement timeout : 44 ERROR: operator does not exist: point = point : 27 ERROR: operator does not exist: xml = xml : 22 ERROR: cannot compare arrays of different element types : 11 ERROR: could not determine which collation to use for string comparison : 5 ERROR: invalid regular expression: nfa has too many states : 4 ERROR: cache lookup failed for index 2619 : 4 ERROR: invalid regular expression: brackets [] not balanced : 3 ERROR: operator does not exist: polygon = polygon : 2 ERROR: invalid regular expression: parentheses () not balanced : 1 ERROR: invalid regular expression: invalid character range : error rate: 0.00705128

The only one that looks interesting here is the cache lookup one. Taking a closer look at it reveals that it happens when you query a certain catalog view like this:

: self=# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL; : FEHLER: cache lookup failed for index 2619

This is because the planner then puts =pg_get_indexdef(oid)= in a context where it sees non-index-oids, which causes it to croak:

: QUERY PLAN
: ------------------------------------------------------------------------------------ : Hash Join (cost=17.60..30.65 rows=9 width=4) : Hash Cond: (i.oid = x.indexrelid) : -> Seq Scan on pg_class i (cost=0.00..12.52 rows=114 width=8) : Filter: ((pg_get_indexdef(oid) IS NOT NULL) AND (relkind = 'i'::"char")) : -> Hash (cost=17.31..17.31 rows=23 width=4) : -> Hash Join (cost=12.52..17.31 rows=23 width=4) : Hash Cond: (x.indrelid = c.oid) : -> Seq Scan on pg_index x (cost=0.00..4.13 rows=113 width=8) : -> Hash (cost=11.76..11.76 rows=61 width=8) : -> Seq Scan on pg_class c (cost=0.00..11.76 rows=61 width=8) : Filter: (relkind = ANY ('{r,m}'::"char"[]))

Now this is more of a curiosity than a bug, but still illustrating how debugging with the help of SQLsmith might look like.

** Large-scale testing

=--log-to= allows logging of hundreds of sqlsmith instances into a central PostgreSQL database. [[./log.sql]] contains the schema sqlsmith expects and some additional views to generate reports on the logged contents.

It also contains a trigger to filter boring/known errors based on the contents of the tables known and known_re. I periodically COPY my filter tables for testing PostgreSQL into the files [[./known_re.txt]] and [[./known.txt]] to serve as a starting point.

** Resources

** License

SQLsmith is available under GPLv3. Use it at your own risk. It may damage your database (one of the purposes of this tool /is/ to try and break things). See the file [[COPYING]] for details.

** Authors

Andreas Seltenreich seltenreich@gmx.de

Bo Tang tangloner@gmail.com

Sjoerd Mullender sjoerd@acm.org

[[ast.png]]