Regression proving with Coq: past, present, and future

In ongoing work, we are investigating regression proving techniques for Coq. Regression proving is concerned with re-checking affected proofs after a change has been made to a verification project.

Our main goal is to increase the productivity of proof engineers in large scale projects, where a short "edit-check cycle" is critical. Inspiration comes primarily from regression testing research in software engineering: we consider Coq proofs analogous to program tests. In particular, we assume engineers make generous use of CI and other automation as in software projects.

Here is an attempt at a summary of the current state of our work and ideas for future improvements. Many of these improvements require changes in Coq itself.

Background

Our starting point is the document-oriented model and underlying proof processing architecture introduced in Coq 8.5. Specifically, the .vio file format introduced in that version supported two key features for the batch processing toolchain:

asynchronous checking of a single opaque proof in a file (e.g., coqc -check-vio-tasks 1 A)
fine-grained proof checking parallelism (e.g., coqc -schedule-vio-checking 3 A.vio B.vio)

These two features mirror the possibilities in software batch test execution, where individual tests can be run in isolation, and where test execution can be parallelized both at the test method and test class level.

iCoq: sequential regression proof selection

main paper in ASE '17 [slides]
tool paper in ICSE-Demo '18 [slides]

Regression test selection techniques for Java-like languages track dependencies among classes and methods and use this information to perform change impact analysis when a project is modified. Regression test selection tools are typically run in CI and persist dependency metadata between runs.

iCoq is our analogue of Java regression test selection tools for Coq and builds on asynchronous checking of opaque proofs. It analyzes dependencies between definitions and lemmas, and finds and checks (only) impacted proofs. It can be run either locally or in CI.

iCoq implementation highlights:

plugin for dependency extraction from .vio files using the approach from dpdgraph
plugin for smart checksumming of elaborated Coq terms ("AST fingerprinting")
modification of asynchronous proof checking to print proof term dependencies (like dpdgraph)

See an example of how iCoq works.

Highlights from empirical study on version histories of large projects:

up to 10x speedup in CI environment compared to checking from scratch
up to 3x speedup on local machines compared to coq_makefile

Main limitations of iCoq:

only supports Coq 8.5 (slow proof checking)
no analysis of custom tactics or tracking of tactic dependencies in proofs
no support for parameterized modules
brittle parsing of .v files to (1) extract proof tasks and (2) compute checksums of proof scripts
dependency graph analysis and command generation implemented using Java rather than OCaml
no checking of universe consistency

Possible improvements:

analysis of tactic dependencies
tracking all tactics used by proofs
hook into asynchronous proof checking without modifying Coq itself for dependency extraction
interface directly with .vio files to extract proof tasks and their kernel names, and proof script line numbers for checksumming
compile unimpacted, unchanged files to .vos instead of .vio (since .vio overhead not needed there)
implement smart checksumming using Template-Coq
no need to dump dependencies for proofs produced from unchanged proof scripts with no tactic behavior changes

piCoq: parallel regression proving

paper in ISSTA '18 [slides]

iCoq performed only sequential checking. piCoq extends iCoq to parallel checking and includes classical modes for incremental, file-level parallel checking (similar to coq_makefile). We included more histories of large projects for comparison.

Implementation highlights:

implemented essentially all combinations of parallelism and selection, according to schema below
introduced coqc option for parallel checking of proofs spread out over many files (generalization of -schedule-vio-checking)

	No selection	File selection	Proof selection
File parallelism	- [x] `.vo`	- [x] `.vo`	- [ ] `.vo`
Proof parallelism	- [x] `.vio`	- [x] `.vio`	- [x] `.vio`

Highlights of empirical study on version histories of large projects (including Verdi, Flocq, and Coquelicot):

several projects had very restrictive file dependency graphs which made file parallelization slow (manual reorganization could probably fix this, though)
proof selection combined with proof parallelization was always faster than other modes, for all projects
- up to 28x speedup compared to sequential checking from scratch
- up to 3x speedup compared to sequential proof selection (iCoq)

Possible improvements:

smarter checksumming of .v files, ignoring comments and noise (via SerAPI?)
use .vos and .vok whenever checking all proofs in a file
compile unimpacted, unchanged files to .vos instead of .vio (since .vio overhead not needed there)

Suggestions towards future regression proving support

efficient regression proving depends crucially on both 1 and 2 above; the new vos/vok toolchain leaves out the former and implements a restricted form of the latter, so we hope it can be extended to support the missing functionality (#8783, #8642, #8767)
documentation of -check-vio-tasks option for coqc in manual
external tools should be able to control fine-grained proof scheduling, or at least provide their own data to improve scheduling (instead of only relying on .aux files)
a dependency extraction plugin should ideally be bundled with Coq
asynchronous proving should be able to use elaborated kernel names (e.g., Lib.Mod.my_proof) instead of task numbers
need a simple API to extract metadata from .vos/.vio files
possibility to "fill in" proofs in .vos/.vio files (tasks are "holes")
possibility for more sophisticated debugging output when checking proofs asynchronously on demand, e.g., coqc -verbose -check-vio-tasks 1 A shows name of lemma and statistics

coq / coq