dynaroars / dig

DIG is a numerical invariant generation tool. It infers program invariants or properties over (i) program execution traces or (ii) program source code. DIG supports many forms of numerical invariants, including nonlinear equalities, octagonal and interval properties, min/max-plus relations, and congruence relations.
https://github.com/dynaroars/dig
MIT License
39 stars 6 forks source link
dynamic-analysis invariant-generation loop-invariants machine-learning neural-network-verification program-verification specification-mining symbolic-execution

DIG

DIG is an invariant generation tool that discovers program properties at arbitrary program locations (e.g., loop invariants, post conditions). DIG focuses on numerical invariants and currently supports the following numerical relations:

DIG's numerical relations (in particular, nonlinear relations) have been used in many research projects, including

:exclamation: Quick Info

  • :boom: various examples demonstrating the power of DIG.
  • :rocket: A good starting point to understand DIG and its usage at high level is reading our ICSE'22 tool and TSE'21 papers .
  • :question: FAQs might be useful for researches, e.g., what makes DIG different than others?

:hammer: Setting up and Using DIG

Setup using Docker

details ```bash # clone DIG $ git clone --depth 1 https://github.com/dynaroars/dig.git # Then go to DIG's directory $ cd dig # in DIG's directory # build the docker image, will take some time to install and build everything $ docker build . -t='dig' ... ... # then run dig $ docker run -it dig # docker will drop you into a Linux prompt like below $ root@b53e0bd86c11:/dig/src# # now you can run DIG -- the more CPUs/CORES your machine has, the faster DIG will run. # run DIG on a trace file root@931ac8632c7f:/dig/src# time ~/miniconda3/bin/python3 -O dig.py ../examples/traces/cohendiv.csv -log 4 ... ... # or on a C program # FASTER: restrict nonlinear eqts to degree 2 and but don't generate inequalities or minmax invariants and root@931ac8632c7f:/dig/src# time ~/miniconda3/bin/python3 -O dig.py ../benchmark/c/nla/cohendiv.c -maxdeg 2 -noieqs -nominmax -log 4 # If you have some error SARL expected version X but found Y, then edit ~/.sarl and change from X to Y # SLOWER: but gives everything root@931ac8632c7f:/dig/src# time ~/miniconda3/bin/python3 -O dig.py ../benchmark/c/nla/cohendiv.c -log 4 ... # to update DIG to the latest from github, do a git pull in the main DIG directory in the Docker root@931ac8632c7f:/dig/src# git pull ... ... ```

Usage

DIG can generate invariants from a trace file (a plain text semi-colon separated csv file consisting of concrete values of variables) or a program (a C file .c).

Generating Invariants From Traces

details DIG can infer invariants directly from an `csv` file consisting of concreting program execution traces as shown below. ```txt # in DIG's src directory $ less ../test/traces/cohendiv.csv vtrace1; I q; I r; I a; I b; I x; I y vtrace1; 4; 8; 1; 4; 24; 4 vtrace1; 16; 89; 1; 13; 297; 13 vtrace1; 8; 138; 4; 76; 290; 19 vtrace1; 0; 294; 8; 192; 294; 24 vtrace1; 64; 36; 4; 16; 292; 4 ... vtrace2; I x; I y; I q; I r vtrace2; 280; 24; 11; 16 vtrace2; 352; 11; 32; 0 vtrace2; 22; 298; 0; 22 vtrace2; 274; 275; 0; 274 vtrace2; 2; 287; 0; 2 ... ``` ```txt # in DIG's src directory tnguyen@origin ~/d/src (dev)> time ~/miniconda3/bin/python3 -O dig.py ../tests/traces/cohendiv.csv -log 3 (base) settings:INFO:2021-10-29 13:51:40.966898: dig.py ../tests/traces/cohendiv.csv -log 3 alg:INFO:analyzing '../tests/traces/cohendiv.csv' alg:INFO:check 546 invs using 181 traces (0.26s) alg:INFO:simplify 544 invs (2.35s) vtrace1(17 invs): 1. a*y - b == 0 2. q*y + r - x == 0 3. -q <= 0 4. -y <= -1 5. a - b <= 0 6. r - x <= 0 7. b - r <= 0 8. a - x <= -5 9. -b + y <= 0 10. -x + y <= -6 11. -q - r <= -8 12. -r - x <= -16 13. -x - y <= -10 14. a + 2 - max(q, r, y) <= 0 15. y + 2 - max(b, q, r, 0) <= 0 16. -q === 0 (mod 2) 17. -r - x === 0 (mod 2) vtrace2(8 invs): 1. q*y + r - x == 0 2. -q <= 0 3. -r <= 0 4. q - x <= 0 5. r - x <= 0 6. r - y <= -1 7. -q - r <= -1 8. -x - y <= -10 ``` *Note*: if we just run Dig over traces, then we likely can get spurious inequalities, i.e., they are correct with the given traces, but not real invariants. If given the program source code as shown below, DIG can check the source code and remove spurious results.

Generating Invariants From a Program

details Consider the following `cohendiv.c` program ```c // in DIG's src directory // $ less ../test/cohendiv.c #include #include void vassume(int b){} void vtrace1(int q, int r, int a, int b, int x, int y){} void vtrace2(int q, int r, int a, int b, int x, int y){} void vtrace3(int q, int r, int x, int y){} int mainQ(int x, int y){ vassume(x >= 1 && y >= 1); int q=0; int r=x; int a=0; int b=0; while(1) { vtrace1(q, r, a, b, x, y); if(!(r>=y)) break; a=1; b=y; while (1){ vtrace2(q, r, a, b, x, y); if(!(r >= 2*b)) break; a = 2*a; b = 2*b; } r=r-b; q=q+a; } vtrace3(q, r,x, y); return q; } void main(int argc, char **argv){ mainQ(atoi(argv[1]), atoi(argv[2])); } ``` * To find invariants at some abitrary location, we declare a function `vtraceX` where `X` is some distinct number and call that function at that location. * For example, in `cohendiv.c`, we call `vtrace0`, `vtrace1` at the head of the outter and inner while loops find loop invariants and `vtrace2` before the function exit to find post conditions. * `vtraceX` takes a list of arguments that are variables in scope at the desired location. This tells DIG to find invariants over these variables. > Using symbolic states collected from symbolic execution (default option) * We now run DIG on `cohendiv.c` and discover the following invariants at the `vtracesX` locations: ```sh $ time ~/miniconda3/bin/python3 -O dig.py ../tests/cohendiv.c -log 3 settings:INFO:2021-10-29 13:51:11.038391: dig.py ../tests/cohendiv.c -log 3 alg:INFO:analyzing '../tests/cohendiv.c' alg:INFO:got symbolic states at 4 locs in 4.21s alg:INFO:got 69 ieqs in 1.11s alg:INFO:got 377 minmax in 1.69s alg:INFO:got 6 eqts in 5.50s alg:INFO:check 452 invs using 680 traces (0.33s) alg:INFO:simplify 452 invs (1.40s) * prog cohendiv locs 4; invs 29 (Eqt: 5, MMP: 1, Oct: 23) V 6 T 3 D 2; NL 5 (2) ; -> time eqts 5.5s, ieqs 1.1s, minmax 1.7s, simplify 1.8s, symbolic_states 4.2s, total 11.5s rand seed 1635533471.04, test 62 tmpdir: /var/tmp/dig_92233634043151007_2nugp63w vtrace0(2 invs): 1. -y <= -1 2. -x <= -1 vtrace1(12 invs): 1. a*y - b == 0 2. q*y + r - x == 0 3. -r <= 0 4. -a <= 0 5. -y <= -1 6. q - x <= 0 7. a - q <= 0 8. b - x <= 0 9. r - x <= 0 10. a - b <= 0 11. -q - r <= -1 12. min(q, y) - b <= 0 vtrace2(8 invs): 1. a*y - b == 0 2. q*y + r - x == 0 3. -q <= 0 4. -y <= -1 5. r - x <= 0 6. b - r <= 0 7. a - b <= 0 8. -b + y <= 0 vtrace3(7 invs): 1. q*y + r - x == 0 2. -q <= 0 3. -r <= 0 4. q - x <= 0 5. r - x <= 0 6. r - y <= -1 7. -r - x <= -1 ``` > Using Random Inputs The `-noss` option disables symbolic states and thus makes DIG behaves as a pure *dynamic* invariant generation tools. Here, DIG runs the program on random inputs, collects traces, and infers invariants. It does not use symbolic states and thus does not require symbolic execution tools, but it can have spurious results. ```sh $ time ~/miniconda3/bin/python3 -O dig.py ../tests/cohendiv.c -log 3 -noss -nrandinps 10 settings:INFO:2021-10-23 12:37:15.965808: dig.py ../tests/cohendiv.c -log 3 -noss -nrandinps 10 alg:INFO:analyzing '../tests/cohendiv.c' alg:INFO:analyzing '../tests/cohendiv.c' infer.eqt:WARNING:18 traces < 35 uks, reducing to deg 2 infer.eqt:WARNING:38 traces < 84 uks, reducing to deg 2 infer.eqt:WARNING:50 traces < 84 uks, reducing to deg 2 alg:INFO:testing 678 invs using 106 traces (0.30s) alg:INFO:simplify 670 invs (3.26s) vtrace1 (17 invs): 1. a*y - b == 0 2. q*y + r - x == 0 3. -a <= 0 4. -r <= 0 5. -y <= -9 6. a - b <= 0 7. a - q <= 0 8. r - x <= 0 9. q - x <= -6 10. b - x <= -2 11. -a - r <= -2 12. -x - y <= -16 13. min(q, r, x) - b <= 0 14. a + q === 0 (mod 2) 15. a - q === 0 (mod 2) 16. -a - q === 0 (mod 2) 17. -a + q === 0 (mod 2) vtrace2 (17 invs): 1. a*y - b == 0 2. q*y + r - x == 0 3. -q <= 0 4. -y <= -9 5. r - x <= 0 6. b - r <= 0 7. -b + y <= 0 8. -r + y <= -2 9. -q - x <= -12 10. min(a, b, q) - y - 1 <= 0 11. b + 2 - max(a, q, r, y) <= 0 12. q === 0 (mod 2) 13. -q === 0 (mod 2) 14. r - x === 0 (mod 4) 15. r + x === 0 (mod 2) 16. -r + x === 0 (mod 4) 17. -r - x === 0 (mod 2) vtrace3 (9 invs): 1. q*y + r - x == 0 2. -r <= 0 3. -q <= 0 4. -y <= -9 5. r - x <= 0 6. r - y <= -1 7. -q - x <= -6 8. -q - r <= -3 9. -x - y <= -16 ``` #### Other programs * The directory [`benchmark/c/nla`](./benchmark/c/nla) contains many programs having nonlinear invariants.

:wrench: Tweaking DIG

details DIG aims to be fully automatic. However, it also allows the user to control its behaviors (the `src/settings.py` lists all the defaut parameters). Use `-h` or `--help` to see options that can be passed into DIG. Below we show several useful ones. #### Specify max degree for equalities By default, DIG automatically to find equalities that can have high degrees (e.g., `x^7`). This can take time and so we can specify DIG to search for equalities no more than some maximum degree `X` using the option `-maxdeg X`. This will make DIG runs faster (with the cost of not able to find equalities with higher degrees than `X`). #### Disabling Invariants By default DIG searches for all supported forms of invariants. However, we can turn them off using `-noeqts`, `-noieqs` , `-nominmax`, `nocongruences` ```sh $ ~/miniconda3/bin/python3 -O dig.py ../tests/cohendiv.c -log 3 -maxdeg 2 -noieqs #find equalities up to degree 2 and do not infer inequalities ... ``` #### Customizing Inequalities By default, DIG infers octagonal inequalities (i.e., linear inequalities among `2` variables with coefs in in the set `{-1,0,1}`). We can customize DIG to find more expression inequalities (of course, with the trade-off that it takes more time to generate more expressive invs). Below we use a different example `Sqrt1.java` to demonstrate ```sh $ ~/miniconda3/bin/python3 -O dig.py ../benchmark/c/nla/sqrt1.c -nominmax -nocongruences # find default, octagonal, ieq's. ... 1. 2*a - t + 1 == 0 2. 4*s - t**2 - 2*t - 1 == 0 3. -a <= 0 4. a - n <= 0 5. -n + t <= 2 6. -s + t <= 0 $ ~/miniconda3/bin/python3 -O dig.py ../benchmark/c/nla/sqrt1.c -nominmax -nocongruences -ideg 2 # find nonlinear octagonal inequalities ... 1. 2*a - t + 1 == 0 2. 4*s - t**2 - 2*t - 1 == 0 3. -a <= 0 4. a - n <= 0 5. -s + t <= 0 6. -n + t <= 2 7. -s**2 + t**2 <= 0 $ ~/miniconda3/bin/python3 -O dig.py ../benchmark/c/nla/sqrt1.c -nominmax -nocongruences -icoefs 2 # find linear inequalities with coefs in {2,-1,0,1,2} ... 1. 2*a - t + 1 == 0 2. 4*s - t**2 - 2*t - 1 == 0 3. -a <= 0 4. a - n <= 0 5. -n + 2*t <= 6 6. -2*n + s <= 2 7. -2*s + 2*t <= 0 ``` ---

:question: FAQs

Information useful for researches (e.g., what makes DIG different than others?) > What is the input to DIG? - DIG takes as input a C program. This program must be compilable (i.e., syntactically correct) and is annnotated with locations of interest (where you want to infer invariants at). - DIG can also take as input a `csv` file consisting of program traces and it will infer invariants just over those traces (i.e., pure dynamic). > What are the dependencies for using DIG? - Python, Sympy, and the Z3 SMT solver for inferring invariants, an the symbolic execution tool CIVL for checking invariants from source code. Python, Sympy, and Z3 can be installed using the Miniconda distribution of Python. CIVL already comes with the DIG distrubition. > Do I need to tune DIG to infer invariants? - No, DIG should work out of the box and does not require user inputs. However, if you want to tweak the behavior of DIG, you can do so as shown [here](#wrench-tweaking-dig). > What kind of invariants are supported? - This DIG tool supports **numerical invariants**. This includes both nonlinear and linear (affine) properties. See programs and examples [here](./EXAMPLES.md) - Note many [research projects](#page_with_curl-publications) build upon DIG to support other kinds of invariants (e.g., ranking functions and recurrent sets for termination and non-termination analysis, or recurrence relations for complexity analysis). These projects have their own separate research prototype tools. > What makes DIG different from other invariant generation tools? - A good starting place to understanding DIG's technical details is our [TSE'21](https://dynaroars.github.io/pubs/nguyen2021using.pdf) paper. - Main purpose of DIG is to discover strongest possible invariants at desired locations, *not* to prove an assertion or post condition, which is the goal of many invariant tools. - Of course if the invariants found are stronger than the assertion or post condition, then those are proved - DIG infers invariants at arbitrary location and thus is not restricted to, e.g., inductive loop invariants - The input of DIG is a _program_, not SMT formulae representing transitions as in many invariant tools - Checking is done by extracting _symbolic states_ using _symbolic execution_ and applying Z3 SMT solver to reason about the states and candidate invariant.s - DIG's inferrence is dynamic (mostly), i.e., DIG *is* a data-driven approach - Some parts, e.g., inequalities, use static analysis by analyzing symbolic states - Does not use ML for inference (not neural networks, classifers, etc) - DIG follows an _iterative guess-and-check approach_, which infers candidate invs from traces, checks and obtains counterexample traces to improve inference, and repeats > How to to speed up DIG? - By default, DIG performs multiple algorithms to find different invariants and its nonlinear equality invariants can have very large degree, all of which contribute to large search space. To speed up DIG, you have several options - Use a computer with many cores. DIG leverages multiprocessing and can run significantly faster with a modern multicore computer. As an example, our [lab machine](https://github.com/dynaroars/dynaroars.github.io/wiki/Servers) has 64 cores. Of course you don't need that many, but the more, the better. - Note that DIG does not leverage GPU processing - Tweak its parameters as shown [here](#wrench-tweaking-dig). For example, reducing the number of degree to `d` (`-maxdeg d`) will tell DIG not to search for nonlinear invariants with degree more than `d` or disabling certain types of invariants if you're not intested in them (e.g., `-nominmax` to disable the computation of min/max properties) ---

:page_with_curl: Publications

BibTeX entries for citing DIG - Latest paper (TSE'21) on symbolic state and DIG ``` sh @article{nguyen2021using, title={Using symbolic states to infer numerical invariants}, author={Nguyen, Thanhvu and Nguyen, KimHao and Dwyer, Matthew B}, journal={IEEE Transactions on Software Engineering}, volume={48}, number={10}, pages={3877--3899}, year={2021}, publisher={IEEE} } ``` - Original DIG paper on inferring nonlinear numerical invariants (ICSE'12) ``` sh @inproceedings{nguyen2012using, title={Using dynamic analysis to discover polynomial and array invariants}, author={Nguyen, ThanhVu and Kapur, Deepak and Weimer, Westley and Forrest, Stephanie}, booktitle={2012 34th International Conference on Software Engineering (ICSE)}, pages={683--693}, year={2012}, organization={IEEE} } ```

Technical information about DIG and projects build upon DIG can be found from these papers. The tool paper (ICSE22) and Symbolic States paper (TSE21) are probably a good starting point.

  1. ThanhVu Nguyen, KimHao Nguyen, and Hai Duong. SymInfer: Inferring Numerical Invariants using Symbolic States. International Conference on Software Engineering-Tool Demo (ICSE-Demo), pages 197--201, 2022
  2. ThanhVu Nguyen, KimHao Nguyen, Matthew Dwyer. Using Symbolic States to Infer Numerical Invariants. Transactions on Software Engineering (TSE), 2021
  3. Didier Ishimwe, KimHao Nguyen, and ThanhVu Nguyen. Dynaplex: Analyzing Program Complexity using Dynamically Inferred Recurrence Relations, Proc. ACM Program. Lang. (OOPSLA), pages 1--23, 2021
  4. Ton Chanh Le, Timos Antonopoulos, Parisa Fathololumi, Eric Koskinen, ThanhVu Nguyen. DynamiTe: Dynamic Termination and Non-termination Proofs. Proc. ACM Program. Lang. (OOPSLA), 2020
  5. ThanhVu Nguyen, Didier Ishimwe, Alexey Malyshev, Timos Antonopoulos, and Quoc-Sang Phan. Using Dynamically Inferred Invariants to Analyze Program Runtime Complexity. Workshop on Software Security from Design to Deployment, 2020
  6. ThanhVu Nguyen, Matthew Dwyer, and William Visser. SymInfer: Inferring Program Invariants using Symbolic States. In Automated Software Engineering (ASE). IEEE, 2017.
  7. ThanhVu Nguyen, Timos Antopoulos, Andrew Ruef, and Michael Hicks. A Counterexample-guided Approach to Finding Numerical Invariants. In 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE), pages 605--615. ACM, 2017.
  8. ThanhVu Nguyen, Deepak Kapur, Westley Weimer, and Stephanie Forrest. DIG: A Dynamic Invariant Generator for Polynomial and Array Invariants. Transactions on Software Engineering Methodology (TOSEM), 23(4):30:1--30:30, 2014.
  9. ThanhVu Nguyen, Deepak Kapur, Westley Weimer, and Stephanie Forrest. Using Dynamic Analysis to Generate Disjunctive Invariants. In 36th International Conference on Software Engineering (ICSE), pages 608--619. IEEE, 2014.
  10. ThanhVu Nguyen, Deepak Kapur, Westley Weimer, and Stephanie Forrest. Using Dynamic Analysis to Discover Polynomial and Array Invariants. In 34th International Conference on Software Engineering (ICSE), pages 683--693. IEEE, 2012. Distinguish Paper Award

ACKNOWLEDGEMENTS