AFLplusplus / qemuafl

This fork of QEMU enables fuzzing userspace ELF binaries under AFL++.
https://aflplus.plus

Add ability to generate a drcov trace #56

Closed JRomainG closed 10 months ago

JRomainG commented 10 months ago

This PR aims at addressing issue #7 using a QEMU TCG plugin for user mode.

It is a backport of the drcov TCG plugin from the latest QEMU version, with one addition: it also exports module table information. This information is read from the running program's /proc/self/maps file, an approach inspired by the Frida code in AFL++. It is particularly useful when fuzzing a shared object: without module information, Lighthouse (and similar tools) cannot extract the coverage.
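As a rough illustration of the module-table idea (this is not the plugin's actual C code), the sketch below lists the executable, file-backed mappings found in a file that uses the /proc/<pid>/maps layout. The function name list_exec_modules is hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print "start end path" for every executable,
# file-backed mapping in a maps-formatted file.
# Default input is /proc/self/maps, which describes the process reading it
# (here, awk itself); pass another file to inspect a different process.
list_exec_modules() {
  maps_file="${1:-/proc/self/maps}"
  # Fields per line: address-range perms offset dev inode pathname
  awk '$2 ~ /x/ && $6 ~ /^\// {
         split($1, range, "-")
         print range[1], range[2], $6
       }' "$maps_file"
}

# Example: only meaningful on systems that expose /proc.
if [ -r /proc/self/maps ]; then
  list_exec_modules
fi
```

A coverage tool needs these base addresses to turn absolute block addresses into per-module offsets, which is why the trace is hard to use for shared objects without them.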

The plugin can be compiled alongside QEMU by passing --enable-plugins when running ./configure, and then enabled and configured at runtime by setting the QEMU_PLUGIN environment variable (e.g. QEMU_PLUGIN="build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace").
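A minimal sketch of the runtime configuration described above, assuming the plugin was built at the path quoted in the example. The helper name drcov_plugin_spec and the target name in the final comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the value expected by QEMU_PLUGIN (or -plugin)
# from the plugin path ($1) and the trace output file ($2).
drcov_plugin_spec() {
  printf '%s,arg=filename=%s\n' "$1" "$2"
}

# Paths are the ones quoted in the description; adjust them to your build tree.
QEMU_PLUGIN=$(drcov_plugin_spec build/contrib/plugins/libdrcov.so /tmp/coverage.drcov.trace)
export QEMU_PLUGIN
echo "QEMU_PLUGIN=$QEMU_PLUGIN"

# With the variable exported, run the target under the emulator, e.g.:
#   afl-qemu-trace ./target      # ./target is a placeholder
```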

vanhauser-thc commented 10 months ago

Sure, this is useful. It needs documentation though, can you please add that? Does the --enable-plugins option impact execution speed? (If not, it could be enabled by default.)

JRomainG commented 10 months ago

Thanks for the quick feedback. I was thinking of adding documentation directly to the AFL++-QEMU documentation rather than in this repository, but would you rather have some specific information directly here somewhere?

I also ran some basic performance tests, which I'll include in a follow-up comment.

JRomainG commented 10 months ago

QEMU performance test

Summary

Here is an overview of the results:

| Benchmark | Plugins disabled | Plugins enabled | drcov plugin |
|---|---|---|---|
| sysbench cpu (1 thread) | 246.91/s | 241.90/s | 107.34/s |
| sysbench cpu (10 threads) | 325.39/s | 325.81/s | 25.87/s |
| sysbench memory | 491528.13/s | 468790.49/s | 289437.42/s |
| sysbench fileio | r=2096.09/s, w=1397.39/s, s=4481.93/s | r=2062.39/s, w=1374.86/s, s=4406.57/s | r=1974.62/s, w=1316.41/s, s=4219.21/s |
| 7zip (1 thread) | 1397 | 1438 | 382 |
| 7zip (10 threads) | 7655 | 7912 | n/a |

Enabling plugins at compile-time does not have a clear impact on performance at runtime. However, using the drcov plugin itself does have a big performance impact, especially for multithreaded targets.
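To put numbers on that statement, here is a small sketch that computes the relative slowdown from the sysbench cpu figures in the overview. The helper name slowdown_pct is hypothetical; the inputs are the events-per-second values reported above.

```shell
#!/bin/sh
# Hypothetical helper: percentage slowdown of a measurement ($2)
# relative to a baseline ($1), where higher values mean faster.
slowdown_pct() {
  awk -v base="$1" -v val="$2" 'BEGIN { printf "%.1f\n", (base - val) / base * 100 }'
}

# sysbench cpu, 1 thread: plugins compiled in, none loaded
slowdown_pct 246.91 241.90   # prints 2.0
# sysbench cpu, 1 thread: drcov plugin loaded
slowdown_pct 246.91 107.34   # prints 56.5
# sysbench cpu, 10 threads: drcov plugin loaded
slowdown_pct 325.39 25.87    # prints 92.0
```

These are single runs, so a difference of a few percent between the first two configurations is within what run-to-run noise could explain.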

A quick look at the code suggests that, apart from the plugins themselves when they are loaded at runtime, few new code paths are executed when --enable-plugins is used (which is the default in recent releases of QEMU). These code paths are in TCG, however, which is called many times during emulation.

My initial approach was to simply add a PLUGINS option in AFL++'s build_qemu_support.sh script. Though it requires recompiling QEMU to enable plugins, it could be a good option as well.
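For illustration only, an opt-in PLUGINS switch could look roughly like the sketch below. This is not the actual build_qemu_support.sh logic; the function name plugin_configure_flags and the EXTRA_FLAGS variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: map a PLUGINS setting ($1) to extra ./configure flags.
# Prints --enable-plugins when the setting is "1" and nothing otherwise.
plugin_configure_flags() {
  if [ "${1:-0}" = "1" ]; then
    printf '%s' "--enable-plugins"
  fi
}

# The build script would append the result to its ./configure invocation.
EXTRA_FLAGS=$(plugin_configure_flags "${PLUGINS:-0}")
echo "extra configure flags: ${EXTRA_FLAGS:-<none>}"
```

With this shape, `PLUGINS=1 ./build_qemu_support.sh` would opt in, while the default build stays unchanged.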

I don't have any particular opinion regarding this, and was planning to open a PR on the AFL++ repository if this one is merged, so let me know which option you prefer.

Test methodology

Plugins disabled

# Build qemuafl
$ cd AFLplusplus/qemu_mode
$ make -C qemuafl clean
$ NO_CHECKOUT=1 ./build_qemu_support.sh
# Verify plugins are actually disabled
$ ../afl-qemu-trace -h | grep plugin
# Run performance tests
$ ../afl-qemu-trace /usr/bin/sysbench cpu run --threads=1
$ ../afl-qemu-trace /usr/bin/sysbench cpu run --threads=10
$ ../afl-qemu-trace /usr/bin/sysbench memory run --threads=1
$ ../afl-qemu-trace /usr/bin/sysbench fileio prepare
$ ../afl-qemu-trace /usr/bin/sysbench fileio run --threads=1 --file-test-mode=rndrw
$ ../afl-qemu-trace /usr/lib/p7zip/7z b -mmt1
$ ../afl-qemu-trace /usr/lib/p7zip/7z b -mmt10

Plugins enabled

# Build qemuafl
$ make -C qemuafl clean
$ NO_CHECKOUT=1 PLUGINS=1 ./build_qemu_support.sh
# Verify plugins are enabled
$ ../afl-qemu-trace -h | grep plugin
-plugin              QEMU_PLUGIN       [file=]<file>[,arg=<string>]
# Run performance tests
$ ../afl-qemu-trace /usr/bin/sysbench cpu run --threads=1
$ ../afl-qemu-trace /usr/bin/sysbench cpu run --threads=10
$ ../afl-qemu-trace /usr/bin/sysbench memory run --threads=1
$ ../afl-qemu-trace /usr/bin/sysbench fileio run --threads=1 --file-test-mode=rndrw
$ ../afl-qemu-trace /usr/lib/p7zip/7z b -mmt1
$ ../afl-qemu-trace /usr/lib/p7zip/7z b -mmt10
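
The grep check used in both builds can be wrapped in a small helper so that a benchmark script fails early when run against the wrong build. This is a sketch: the function name has_plugin_support is hypothetical, and it only relies on the -plugin line that the help output shows above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given emulator command ($1)
# lists the -plugin option in its help output.
has_plugin_support() {
  "$1" -h 2>&1 | grep -q -e '-plugin'
}

# Example, run from the qemu_mode directory as in the commands above.
# A missing binary is reported as "plugins disabled" as well.
if has_plugin_support ../afl-qemu-trace; then
  echo "plugins enabled"
else
  echo "plugins disabled"
fi
```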

drcov plugin

# Compile the plugins themselves
$ make -C qemuafl plugins
# Run performance tests
$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/bin/sysbench cpu run --threads=1
$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/bin/sysbench cpu run --threads=10
$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/bin/sysbench memory run --threads=1
$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/bin/sysbench fileio run --threads=1 --file-test-mode=rndrw
$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/lib/p7zip/7z b -mmt1
$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/lib/p7zip/7z b -mmt10

Raw results

Plugins disabled

sysbench cpu (1 thread)

$ ../afl-qemu-trace /usr/bin/sysbench cpu run --threads=1
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   246.91

General statistics:
    total time:                          10.0045s
    total number of events:              2471

Latency (ms):
         min:                                    3.80
         avg:                                    4.04
         max:                                    7.21
         95th percentile:                        4.74
         sum:                                 9986.48

Threads fairness:
    events (avg/stddev):           2471.0000/0.00
    execution time (avg/stddev):   9.9865/0.00

sysbench cpu (10 threads)

$ ../afl-qemu-trace /usr/bin/sysbench cpu run --threads=10
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 10
Initializing random number generator from current time

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   325.39

General statistics:
    total time:                          10.0188s
    total number of events:              3261

Latency (ms):
         min:                                   19.47
         avg:                                   30.68
         max:                                   88.95
         95th percentile:                       33.72
         sum:                               100058.58

Threads fairness:
    events (avg/stddev):           326.1000/9.19
    execution time (avg/stddev):   10.0059/0.01

sysbench memory

$ ../afl-qemu-trace /usr/bin/sysbench memory run --threads=1
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 4917203 (491528.13 per second)

4801.96 MiB transferred (480.01 MiB/sec)

General statistics:
    total time:                          10.0009s
    total number of events:              4917203

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.25
         95th percentile:                        0.00
         sum:                                 3576.19

Threads fairness:
    events (avg/stddev):           4917203.0000/0.00
    execution time (avg/stddev):   3.5762/0.00

sysbench fileio

$ ../afl-qemu-trace /usr/bin/sysbench fileio run --threads=1 --file-test-mode=rndrw
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Extra file open flags: (none)
128 files, 16MiB each
2GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!

File operations:
    reads/s:                      2096.09
    writes/s:                     1397.39
    fsyncs/s:                     4481.93

Throughput:
    read, MiB/s:                  32.75
    written, MiB/s:               21.83

General statistics:
    total time:                          10.0153s
    total number of events:              79775

Latency (ms):
         min:                                    0.00
         avg:                                    0.12
         max:                                    5.87
         95th percentile:                        0.34
         sum:                                 9740.97

Threads fairness:
    events (avg/stddev):           79775.0000/0.00
    execution time (avg/stddev):   9.7410/0.00

7zip (1 thread)

$ ../afl-qemu-trace /usr/lib/p7zip/7z b -mmt1
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,12 CPUs QEMU Virtual CPU version 2.5+ (663),ASM)

QEMU Virtual CPU version 2.5+ (663)
CPU Freq: 2206896 64000000 32000000 21333333 128000000 85333333 128000000 341333333 292571428

RAM size:   31722 MB,  # CPU hardware threads:  12
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1308   100   1273   1273  |      17083   100   1459   1459
23:       1222   100   1246   1246  |      17611   100   1525   1524
24:       1212   100   1304   1304  |      16760   100   1471   1471
25:       1268   100   1449   1448  |      16283   100   1450   1449
----------------------------------  | ------------------------------
Avr:             100   1318   1318  |              100   1476   1476
Tot:             100   1397   1397

7zip (10 threads)

$ ../afl-qemu-trace /usr/lib/p7zip/7z b -mmt10
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,12 CPUs QEMU Virtual CPU version 2.5+ (663),ASM)

QEMU Virtual CPU version 2.5+ (663)
CPU Freq: 2206896 32000000 32000000 32000000 128000000 128000000 256000000 512000000 682666666

RAM size:   31722 MB,  # CPU hardware threads:  12
RAM usage:   2206 MB,  # Benchmark threads:     10

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       7055   731    939   6864  |     100527   947    905   8574
23:       6733   697    984   6861  |      91829   908    875   7946
24:       6272   669   1008   6745  |      93721   956    861   8225
25:       7032   746   1077   8030  |      89795   955    837   7992
----------------------------------  | ------------------------------
Avr:             711   1002   7125  |              942    869   8184
Tot:             826    936   7655

Plugins enabled

sysbench cpu (1 thread)

$ ../afl-qemu-trace /usr/bin/sysbench cpu run --threads=1
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   241.90

General statistics:
    total time:                          10.0048s
    total number of events:              2421

Latency (ms):
         min:                                    3.93
         avg:                                    4.13
         max:                                    7.46
         95th percentile:                        4.57
         sum:                                 9988.46

Threads fairness:
    events (avg/stddev):           2421.0000/0.00
    execution time (avg/stddev):   9.9885/0.00

sysbench cpu (10 threads)

$ ../afl-qemu-trace /usr/bin/sysbench cpu run --threads=10
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 10
Initializing random number generator from current time

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   325.81

General statistics:
    total time:                          10.0147s
    total number of events:              3264

Latency (ms):
         min:                                   18.38
         avg:                                   30.65
         max:                                   80.45
         95th percentile:                       34.95
         sum:                               100041.13

Threads fairness:
    events (avg/stddev):           326.4000/7.67
    execution time (avg/stddev):   10.0041/0.00

sysbench memory

$ ../afl-qemu-trace /usr/bin/sysbench memory run --threads=1
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 4689820 (468790.49 per second)

4579.90 MiB transferred (457.80 MiB/sec)

General statistics:
    total time:                          10.0009s
    total number of events:              4689820

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.26
         95th percentile:                        0.00
         sum:                                 3579.28

Threads fairness:
    events (avg/stddev):           4689820.0000/0.00
    execution time (avg/stddev):   3.5793/0.00

sysbench fileio

$ ../afl-qemu-trace /usr/bin/sysbench fileio run --threads=1 --file-test-mode=rndrw
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Extra file open flags: (none)
128 files, 16MiB each
2GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!

File operations:
    reads/s:                      2062.39
    writes/s:                     1374.86
    fsyncs/s:                     4406.57

Throughput:
    read, MiB/s:                  32.22
    written, MiB/s:               21.48

General statistics:
    total time:                          10.0180s
    total number of events:              78478

Latency (ms):
         min:                                    0.00
         avg:                                    0.12
         max:                                    9.05
         95th percentile:                        0.34
         sum:                                 9749.91

Threads fairness:
    events (avg/stddev):           78478.0000/0.00
    execution time (avg/stddev):   9.7499/0.00

7zip (1 thread)

$ ../afl-qemu-trace /usr/lib/p7zip/7z b -mmt1
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,12 CPUs QEMU Virtual CPU version 2.5+ (663),ASM)

QEMU Virtual CPU version 2.5+ (663)
CPU Freq: 2133333 32000000 32000000 32000000 64000000 64000000 256000000 341333333 341333333

RAM size:   31722 MB,  # CPU hardware threads:  12
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1367   100   1330   1330  |      17402   100   1486   1486
23:       1367   100   1393   1393  |      18012   100   1560   1559
24:       1278   100   1375   1375  |      16211   100   1423   1423
25:       1290   100   1473   1473  |      16481   100   1467   1467
----------------------------------  | ------------------------------
Avr:             100   1393   1393  |              100   1484   1484
Tot:             100   1438   1438

7zip (10 threads)

$ ../afl-qemu-trace /usr/lib/p7zip/7z b -mmt10
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,12 CPUs QEMU Virtual CPU version 2.5+ (663),ASM)

QEMU Virtual CPU version 2.5+ (663)
CPU Freq: 2206896 21333333 16000000 32000000 128000000 64000000 170666666 204800000 409600000

RAM size:   31722 MB,  # CPU hardware threads:  12
RAM usage:   2206 MB,  # Benchmark threads:     10

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       6197   649    929   6029  |     102430   955    915   8737
23:       7183   704   1040   7320  |      97895   971    872   8471
24:       7315   748   1052   7865  |      95794   963    873   8407
25:       7242   757   1093   8269  |      92135   958    856   8200
----------------------------------  | ------------------------------
Avr:             714   1028   7371  |              962    879   8454
Tot:             838    954   7912

drcov plugin

sysbench cpu (1 thread)

$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/bin/sysbench cpu run --threads=1
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   107.34

General statistics:
    total time:                          10.0105s
    total number of events:              1075

Latency (ms):
         min:                                    8.55
         avg:                                    9.29
         max:                                   13.87
         95th percentile:                       10.84
         sum:                                 9991.16

Threads fairness:
    events (avg/stddev):           1075.0000/0.00
    execution time (avg/stddev):   9.9912/0.00

sysbench cpu (10 threads)

$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/bin/sysbench cpu run --threads=10
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 10
Initializing random number generator from current time

Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:    25.87

General statistics:
    total time:                          10.0815s
    total number of events:              261

Latency (ms):
         min:                                  118.82
         avg:                                  384.91
         max:                                  449.59
         95th percentile:                      411.96
         sum:                               100460.59

Threads fairness:
    events (avg/stddev):           26.1000/0.30
    execution time (avg/stddev):   10.0461/0.02

sysbench memory

$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/bin/sysbench memory run --threads=1
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 2896084 (289437.42 per second)

2828.21 MiB transferred (282.65 MiB/sec)

General statistics:
    total time:                          10.0014s
    total number of events:              2896084

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.42
         95th percentile:                        0.00
         sum:                                 3740.00

Threads fairness:
    events (avg/stddev):           2896084.0000/0.00
    execution time (avg/stddev):   3.7400/0.00

sysbench fileio

$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/bin/sysbench fileio run --threads=1 --file-test-mode=rndrw
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Extra file open flags: (none)
128 files, 16MiB each
2GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!

File operations:
    reads/s:                      1974.62
    writes/s:                     1316.41
    fsyncs/s:                     4219.21

Throughput:
    read, MiB/s:                  30.85
    written, MiB/s:               20.57

General statistics:
    total time:                          10.0206s
    total number of events:              75179

Latency (ms):
         min:                                    0.00
         avg:                                    0.13
         max:                                   14.22
         95th percentile:                        0.34
         sum:                                 9609.92

Threads fairness:
    events (avg/stddev):           75179.0000/0.00
    execution time (avg/stddev):   9.6099/0.00

7zip (1 thread)

$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/lib/p7zip/7z b -mmt1
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,12 CPUs QEMU Virtual CPU version 2.5+ (663),ASM)

QEMU Virtual CPU version 2.5+ (663)
CPU Freq: 1523809 21333333 32000000 21333333 64000000 64000000 128000000 93090909 120470588

RAM size:   31722 MB,  # CPU hardware threads:  12
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:        351   100    342    342  |       4503   100    385    384
23:        367   100    374    374  |       4570   100    396    396
24:        351   100    378    378  |       4390   100    385    385
25:        345   100    395    394  |       4476   100    398    398
----------------------------------  | ------------------------------
Avr:             100    372    372  |              100    391    391
Tot:             100    382    382

7zip (10 threads)

$ ../afl-qemu-trace -plugin qemuafl/build/contrib/plugins/libdrcov.so,arg=filename=/tmp/coverage.drcov.trace /usr/lib/p7zip/7z b -mmt10
Canceled (too long)

vanhauser-thc commented 10 months ago

Thank you for the analysis. Then I would say let's enable plugins by default. About documentation: it depends on where the README.md will be. If it is in qemuafl/plugins then it has to be here, if in AFL++/qemu_mode then over there :)

JRomainG commented 10 months ago

I added documentation in AFL++/qemu_mode as it felt better suited (see this PR)