This package contains the IntroClass benchmark.
The full IntroClass benchmark is hosted at http://repairbenchmarks.cs.umass.edu/IntroClass, and is a part of the Autorepair Benchmark Suite, a joint project between Carnegie Mellon University and the University of Massachusetts, Amherst. Researchers from the University of Lille (France) and INRIA have contributed to the IntroClass benchmark. The homepage for the Autorepair Benchmark Suite is http://repairbenchmarks.cs.umass.edu.
introclass/
|-Makefile
|-bin/
|-checksum/
|--Makefile
|--tests/
|----blackbox/
|--------1.in
|--------1.out
|----whitebox/
|--f4a823174201234546789abcdeffff<repository ID hex string>.../
|----Makefile
|----001/
|--------ae-001.log
|--------blackbox_test.sh
|--------checksum.c
|--------gp-001.log
|--------Makefile
|--------metadata.json
|--------whitebox_test.sh
|----002/
|--------<same as above>
|--09F911029D74E35BD84156C5635688C0<next repository ID hex string>.../
|-digits/
|-...
Every directory's Makefile will compile the C programs in that directory or in all subdirectories.
The IntroClass benchmark consists of solutions to six programming assignments: checksum, digits, grade, median, smallest, and syllables.
Every subdirectory below each of these top-level directories represents one student's submitted attempts at solving the corresponding assignment. The attempts are numbered starting from the first revision that contained a solution. Because each student could submit each assignment as many times as they wished, the number of subdirectories varies from student to student.
Sometimes students submitted identical code, so some directories may contain identical defects. We did not remove these duplicates because the IntroClass benchmark is representative of both the type and the frequency of bugs made by novice developers; however, we have identified them. In each programming assignment's directory, program-metadata.json lists the unique defects.
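The notion of "identical code" can be made concrete by hashing a normalized form of the source text. The following is a minimal illustrative sketch, not the benchmark's actual deduplication tooling; in particular, stripping all whitespace is an assumed normalization rule that the benchmark's own criteria may not use:

```python
import hashlib
from collections import defaultdict

def normalize(source: str) -> str:
    # Strip all whitespace so trivially reformatted copies hash alike.
    # (Assumed normalization; the benchmark's real criteria may differ.)
    return "".join(source.split())

def find_duplicates(submissions: dict) -> list:
    """Group submission names whose normalized source is identical."""
    groups = defaultdict(list)
    for name, source in submissions.items():
        digest = hashlib.sha256(normalize(source).encode()).hexdigest()
        groups[digest].append(name)
    return [names for names in groups.values() if len(names) > 1]

# Two reformatted copies of the same defect, plus one distinct program.
subs = {
    "001": "int main(){return 1;}",
    "002": "int main() { return 1; }",
    "003": "int main(){return 0;}",
}
print(find_duplicates(subs))  # → [['001', '002']]
```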
This table summarizes the number of defects and the number of unique defects for each programming assignment.
Project | # Repos | # Defects | # Unique defects
---|---|---|---
checksum | 21 | 69 | 47
digits | 55 | 236 | 144
grade | 50 | 268 | 136
median | 44 | 232 | 98
smallest | 45 | 177 | 84
syllables | 44 | 161 | 63
**Total** | **259** | **1143** | **572**
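As a quick sanity check, the per-project counts above do sum to the totals row:

```python
# Per-project (repos, defects, unique defects), copied from the table above.
counts = {
    "checksum":  (21,  69,  47),
    "digits":    (55, 236, 144),
    "grade":     (50, 268, 136),
    "median":    (44, 232,  98),
    "smallest":  (45, 177,  84),
    "syllables": (44, 161,  63),
}
# Sum each column across all six projects.
totals = tuple(sum(col) for col in zip(*counts.values()))
print(totals)  # → (259, 1143, 572)
```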
Each submission directory contains the C source code for the student submission, named <benchmark>.c. You can compile it to a binary with the Makefile in the directory.
To run the blackbox or whitebox test suites against a compiled submission, use the blackbox_test.sh and whitebox_test.sh scripts in the submission directory. This is described in detail in the Running section.
A submission directory also contains the GenProg and AE log files produced during our experiments in the paper "The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs" (http://people.cs.umass.edu/~brun/pubs/pubs/LeGoues15tse.pdf), and a file metadata.json that contains the program's output for each of the blackbox and whitebox tests, the tests the program passes and fails, whether the program is nondeterministic (i.e., whether the test results sometimes differ), and the original git revision of the submission.
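The fields listed above suggest a simple way to query a submission programmatically. The sketch below parses a hand-written stand-in for metadata.json; the field names (revision, nondeterministic, tests, and the pass/fail verdicts) are assumptions for illustration, not the benchmark's real schema:

```python
import json

# Hypothetical metadata.json contents; field names are assumed.
raw = """
{
  "revision": "ab12cd3",
  "nondeterministic": false,
  "tests": {
    "blackbox": {"1": "pass", "2": "fail"},
    "whitebox": {"1": "pass"}
  }
}
"""
meta = json.loads(raw)

# Collect every failing test as "suite:index".
failing = [
    f"{suite}:{test}"
    for suite, results in meta["tests"].items()
    for test, verdict in results.items()
    if verdict == "fail"
]
print(failing)  # → ['blackbox:2']
```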
The IntroClass benchmark set has no build dependencies outside the C standard library. Run make in any directory to build the programs in that directory and its subdirectories, if any.
To run the tests for a particular benchmark, invoke the whitebox_test.sh or blackbox_test.sh script. For convenience, every submission directory has a copy of these scripts; they are identical and interchangeable. They take two arguments:

genprog_tests.py [submission executable] [test index]

The submission executable is the student submission under test; the test index is a string pX or nX, identifying the Xth (counting from 1) positive (passing) or negative (failing) test case.
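The pX/nX convention can be parsed as follows; this is a small illustrative helper, not part of the benchmark's tooling:

```python
def parse_test_index(index: str):
    """Split a test index such as 'p3' or 'n12' into (kind, number)."""
    kind = {"p": "positive", "n": "negative"}.get(index[:1])
    if kind is None or not index[1:].isdigit():
        raise ValueError(f"malformed test index: {index!r}")
    return kind, int(index[1:])

print(parse_test_index("p3"))   # → ('positive', 3)
print(parse_test_index("n12"))  # → ('negative', 12)
```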
The test scripts are structured so that they can be passed to GenProg or AE as fitness evaluation scripts.
The genprog_tests.py script is in the bin/ subdirectory of the benchmark package. It requires Python 3.3+, but nothing beyond that.
Warning: due to pointer manipulation, some programs internally use, or output, a memory address. This introduces nondeterminism into program execution because of Address Space Layout Randomization (ASLR). To make the programs run fully deterministically, disable ASLR. On Linux this is done by:
(root) $ echo 0 > /proc/sys/kernel/randomize_va_space
To add new tests to the benchmark set, put the test input file in the appropriate tests directory (underneath its relevant benchmark program). Generate the expected output for the test using the reference implementation in bin/<programname>.
The whitebox_test.sh and blackbox_test.sh scripts will not recognize custom tests: they are generated from the tests known to pass or fail each specific binary, and must be programmatically regenerated.
If you plan to add tests to the benchmark suite, it is therefore best not to rely on the test scripts. Have your infrastructure call genprog_tests.py directly, or use the functions available when importing genprog_tests.py as a Python module. genprog_tests.py uses the precomputed benchmark output where possible, and otherwise runs the reference binary it is given to determine whether a test passes or fails.
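A minimal version of that pass/fail decision — run a binary, capture stdout, and compare it to the expected output — can be sketched as follows. The command here is a stand-in (a tiny echo program run via the Python interpreter), not a real benchmark submission or genprog_tests.py's actual logic:

```python
import subprocess
import sys

def run_test(cmd, stdin_text, expected_out):
    """Return True if cmd, fed stdin_text, exits 0 and prints expected_out."""
    result = subprocess.run(
        cmd, input=stdin_text, capture_output=True, text=True
    )
    return result.returncode == 0 and result.stdout == expected_out

# Stand-in "submission": echo one line of input back.
cmd = [sys.executable, "-c", "print(input())"]
print(run_test(cmd, "42\n", "42\n"))  # → True
print(run_test(cmd, "42\n", "43\n"))  # → False
```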