eth-sri / securify

[DEPRECATED] Security Scanner for Ethereum Smart Contracts
Apache License 2.0

Add souffle compilation to docker build phase #46

Closed by MatthiasEgli-chainsecurity 5 years ago

MatthiasEgli-chainsecurity commented 5 years ago

Currently, a new Souffle is compiled for each contract being analyzed, which slows the process down considerably. This step should happen once, as part of the Docker image build.
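For illustration, a hypothetical Dockerfile excerpt that bakes the Souffle build into the image. The repository URL is Souffle's real upstream, but the paths and the autotools steps are assumptions about how this could be wired up, not the actual securify Dockerfile:

# Hypothetical Dockerfile excerpt: build and install Souffle once at image
# build time instead of per analysis run (Souffle 1.x used autotools).
RUN git clone https://github.com/souffle-lang/souffle.git /tmp/souffle \
    && cd /tmp/souffle \
    && ./bootstrap \
    && ./configure \
    && make -j"$(nproc)" \
    && make install \
    && rm -rf /tmp/souffle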

f4z3r commented 5 years ago

I'll sync with Luca, as I am pretty sure he was thinking of adding this to the Java API for the Datalog pattern generation. Either way, I suppose this will depend heavily on his work.

MatthiasEgli-chainsecurity commented 5 years ago

? I am only talking about the "make souffle" part.

f4z3r commented 5 years ago

Oh, OK, I thought you meant a Souffle-compiled executable of the Datalog program. Sure, I'll take care of it.

f4z3r commented 5 years ago

I am not sure what you mean. I checked, and Souffle is fully built at image build time. You can verify this by running the Docker image in interactive mode:

securify (jakob_souffle_build) > docker run -it --entrypoint /bin/bash securify
root@1f37017bdc80:/# souffle
============================================================================
souffle -- A datalog engine.
Usage: souffle [OPTION] FILE.
----------------------------------------------------------------------------
Options:
    -F<DIR>             --fact-dir=<DIR>                    Specify directory for fact files.
    -I<DIR>             --include-dir=<DIR>                 Specify directory for include files.
    -D<DIR>             --output-dir=<DIR>                  Specify directory for output files (if <DIR> is -, stdout is used).
    -j<N>               --jobs=<N>                          Run interpreter/compiler in parallel using N threads, N=auto for system default.
    -c                  --compile                           Generate C++ source code, compile to a binary executable, then run this executable.
    -g<FILE>            --generate=<FILE>                   Generate C++ source code for the given Datalog program and write it to <FILE>.
    -w                  --no-warn                           Disable warnings.
    -m<RELATIONS>       --magic-transform=<RELATIONS>       Enable magic set transformation changes on the given relations, use '*' for all.
    -o<FILE>            --dl-program=<FILE>                 Generate C++ source code, written to <FILE>, and compile this to a binary executable (without executing it).
    -l                  --live-profile                      Enable live profiling.
    -p<FILE>            --profile=<FILE>                    Enable profiling, and write profile data to <FILE>.
    -r<FILE>            --debug-report=<FILE>               Write HTML debug report to <FILE>.
    -t<EXPLAIN>         --provenance=<EXPLAIN>              Enable provenance information via guided SLD.
    -d<type>            --data-structure=<type>             Specify data structure (brie/btree/eqrel/rbtset/hashset).
    -e<[ file | mpi ]>  --engine=<[ file | mpi ]>           Specify communication engine for distributed execution.
                        --hostfile=<FILE>                   Specify --hostfile option for call to mpiexec when using mpi as execution engine.
    -v                  --verbose                           Verbose output.
    -h                  --help                              Display this help message.
----------------------------------------------------------------------------
Version: 1.4.0
----------------------------------------------------------------------------
Copyright (c) 2016-18 The Souffle Developers.
Copyright (c) 2013-16 Oracle and/or its affiliates.
All rights reserved.
============================================================================
root@1f37017bdc80:/# exit
exit

If what you meant was indeed to compile the Datalog executable only once at build time, I have to sync with Luca, because his Java-to-Datalog rule translation must determine when a new executable needs to be compiled. The reasoning here is that a recompile is needed each time a rule is added or removed via the Java API.
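For reference, a sketch of the compile-once flow using the flags listed in the help output above; the .dl path and the binary name are placeholders, not the actual project layout:

# At image build time: generate C++ from the Datalog program and compile it
# to a binary once, without executing it (the -o flag documented above).
souffle -o /usr/local/bin/securify-patterns analysis/patterns.dl

# At analysis time: run the prebuilt binary on the facts extracted from one
# contract; only fact files change between runs, nothing is recompiled.
/usr/local/bin/securify-patterns -F facts_in/ -D facts_out/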

ritzdorf commented 5 years ago

@jakobbeckmann thanks, but how much overhead would it be to change the current system (without Luca's changes) so that the Datalog executable is compiled only once? Luca's changes might still take some time, and in that case we would already have the speedup now.

f4z3r commented 5 years ago

@ritzdorf I have not looked at the Java/Datalog code in depth yet, so I can't say for sure. If no rules are injected at runtime, it should be relatively trivial. If, however, the Java executable injects facts and rules at runtime, then it depends on how easy it is to move that out of the runtime path. I don't see a reason to inject rules at runtime, so in my opinion it could be done fairly easily. I would only need to call a compilation command from the Dockerfile, change the Java executable to write the facts into separate files instead of into the program file, ensure it calls the executable created at build time, and slightly change the Datalog program to pull in the external facts.
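To make that concrete, a rough sketch of the intended runtime flow. The assign relation and all file names are purely hypothetical, and the Datalog program would have to declare such relations as inputs so Souffle reads them from the fact directory:

# Hypothetical runtime flow: the Java executable writes one tab-separated
# .facts file per input relation instead of embedding facts in the program.
mkdir -p facts_in facts_out
printf '0x01\t0x02\n' > facts_in/assign.facts

# It then invokes the executable compiled at image build time (see above);
# the binary is reused unchanged across contracts.
/usr/local/bin/securify-patterns -F facts_in/ -D facts_out/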

Unfortunately I have no time today or tomorrow, but I will try to get it done this weekend. I'll keep you posted if large changes to the Java executable turn out to be required, so we can discuss it in more detail.