Antlr4 warm-up times: can this be improved or pre-computed?

hzeller commented 4 years ago

We'd like to add Surelog to sv-tests; see pull request https://github.com/SymbiFlow/sv-tests/pull/447.

However, the tests currently use more time than the continuous integration system can allocate, so the pull request is not merged yet. The overall time for Surelog seems to be dominated by start-up cost. Each invocation typically takes about 500 milliseconds (possibly due to Antlr4 start-up time?), while other binaries usually start up in around 10 ms.
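You can check the start-up claim on your own machine by timing a trivial input repeatedly. The helper below is a sketch. `mean_wall_ms` is a hypothetical name, and the script assumes bash and GNU `date`, because it relies on `%N` for nanoseconds and BSD/macOS `date` does not support that.

```bash
#!/usr/bin/env bash
# mean_wall_ms RUNS CMD [ARGS...]
# Runs CMD RUNS times back to back and prints the mean wall-clock time
# per run in whole milliseconds (rounded down).
# Assumption: GNU date, since %N (nanoseconds) is not portable.
mean_wall_ms() {
  local runs=$1; shift
  (( runs > 0 )) || { echo "RUNS must be a positive integer" >&2; return 1; }
  local start end i
  start=$(date +%s%N)
  for ((i = 0; i < runs; i++)); do
    # Output and exit status are discarded; only elapsed time matters here.
    "$@" >/dev/null 2>&1
  done
  end=$(date +%s%N)
  echo $(( (end - start) / runs / 1000000 ))
}
```

For example, save a one-line `module top; endmodule` as tiny.sv. Then compare `mean_wall_ms 10 surelog tiny.sv` with `mean_wall_ms 10 yosys -q -p 'read_verilog -sv tiny.sv'`. This assumes Surelog accepts a source file as a plain argument. The helper ignores failures, so first confirm separately that each command actually succeeds on tiny.sv.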

Reproduce

To verify this, check out the pull request locally and run sv-tests with Surelog:

git clone https://github.com/SymbiFlow/sv-tests.git

cd sv-tests
git fetch origin pull/447/head:surelog-test
git checkout surelog-test
git submodule update --init --recursive

# The submodule init may require pressing Return a few times, because one
# submodule (basejump) cannot be retrieved.
# It asks for a username/password; just pressing Return is fine.

Optionally, install the Bazel build system if you want to compare against Verible: https://docs.bazel.build/versions/1.2.0/install-ubuntu.html

# Build the tools we're interested in.
make -j10 -k surelog yosys verible

# Now run the tests and generate the reports
make generate-tests
make -j10 -k tests
make report

The results appear in out/report/index.html. The combined runtime of Yosys or Verible over all tests is on the order of 30 seconds. Surelog, including all its start-up costs, takes more than 1000 seconds.

Smaller tests are dominated by start-up time

The runtime of each small test is dominated by start-up time. Is there a way to pre-generate the Antlr4 'warm-up' state and compile it into the binary to minimize this?

Finding performance optimization opportunities

The runtime is also recorded directly in the log files, so you can inspect them for low-hanging fruit:

find out/logs/Surelog/ -name "*.log" -print0 | xargs -0 grep -H time_elapsed | awk '{printf("%8.3f %s\n", $2, $1); T+=$2} END{ printf("%8.3f Total time\n", T);}' | sort -n

The sv-tests can also help identify low-hanging fruit in cases with long compilation times.

alainmarcel commented 4 years ago

Would it work to combine all tests into a single invocation, with Surelog writing an individual report for each test into that test's own directory? That would run in about 10 seconds in total, which is what is done in third_party/tests/Google/Google.sl. (Right now Surelog merges all the reports into one; that is what I am proposing to fix here.)

hzeller commented 4 years ago

The sv-tests are much easier to deal with, and more realistic, if we only combine files that are supposed to be together (as in the cores). Start-up times are a legitimate concern in huge compilations, so we should try to address them.

alainmarcel commented 4 years ago

In this thread, https://groups.google.com/forum/#!topic/antlr-discussion/q-8MPVI9lrw, the author of the Antlr C++ target is not very optimistic about reducing the warm-up time:

"I'm not sure we can do much about the memory consumption, it's dictated by the way the runtime stores its data. For the parsing time: did you try multiple parse runs in a row? ANTLR4 has a significant warmup phase,(e.g. for my test suite warm up time is 6s, while all following runs take only ~0.8s. And I have seen much worse numbers (I wrote something about this in an earlier mail here). For a big expression query the warmup is ~8s while all following runs only take ~10ms (so the relation is almost 1000:1)."

In real Verilog designs, files are usually large on average, so the warm-up time is less visible than it is for tiny unit tests. In Surelog I group small files together in threads or multiple processes so they share the warm-up time. That works well if either the number of files is large or the files themselves are large.

For an sv-tests kind of setup (a large number of small files), I think my suggestion (yet to be implemented) is the only way to go.

One possible implementation: Surelog takes a single instruction file containing all the "mini project specifications" (the Verilog command line plus the output log file directory and name) and processes them as one batch. That should run the whole regression in about 30 s.
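A simple cost model makes the case for batching explicit. It is an idealization assumed here, not something measured in this thread. Let $t_w$ be the one-time warm-up cost paid by each fresh process, $t_i$ the remaining processing time of test $i$, and $N$ the number of tests. The model assumes that after warm-up, the per-test times are the same whether the tests run separately or in one batch:

```latex
\begin{aligned}
T_{\text{separate}} &= \sum_{i=1}^{N} \left(t_w + t_i\right) = N\,t_w + \sum_{i=1}^{N} t_i \\
T_{\text{batch}}    &= t_w + \sum_{i=1}^{N} t_i \\
T_{\text{separate}} - T_{\text{batch}} &= (N - 1)\,t_w
\end{aligned}
```

As an illustration, take $t_w \approx 0.5\,\text{s}$ (the figure reported at the top of this issue) and $N = 1000$ tests. Separate invocations then spend about $999 \times 0.5 \approx 500\,\text{s}$ more than a single batch, regardless of how small each test is. The model ignores parallel execution.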

hzeller commented 4 years ago

I think we should fix the root cause instead of changing the tests to hide the problem. We should find out what makes up the warm-up time and fix it in the antlr4 runtime. It sounds like the runtime does some computation once and is fast afterwards. If so, that computation should be done when the parser is generated. For instance, LALR parsers generated by Bison have the whole state machine compiled in, so no start-up time or memory allocation is needed at runtime. An LL(k) parser would still need stack memory at runtime, but the grammar-related data should be possible to pre-compute.
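The observation that the runtime "does some computation once and then is fast" is consistent with our understanding of ANTLR4's adaptive LL(*) prediction, which this thread does not confirm. The runtime appears to fill a per-decision DFA cache lazily, as it encounters new lookahead. In the generated C++ parsers, as far as we know, that cache is static and shared by all parser instances in a process, so each new process starts cold. The bash script below is only a toy model of that behaviour, not ANTLR code. `predict`, `dfa_cache` and `simulations` are hypothetical names, and the "expensive" step is counted rather than performed.

```bash
#!/usr/bin/env bash
# Toy model of a lazily filled prediction cache (NOT the ANTLR4 runtime).
# Key: "<decision>:<lookahead>"; value: the predicted alternative.
declare -A dfa_cache=()
simulations=0   # how many times the slow path ran

# predict DECISION LOOKAHEAD: sets REPLY to the predicted alternative.
# Returns via REPLY rather than stdout so no subshell is created and the
# cache update persists in the current shell.
predict() {
  local key="$1:$2"
  if [[ -z "${dfa_cache[$key]+set}" ]]; then
    # Slow path: stands in for the costly analysis done on first sight.
    simulations=$((simulations + 1))
    dfa_cache[$key]="alt-for-$2"
  fi
  REPLY=${dfa_cache[$key]}
}

# First pass: every previously unseen (decision, lookahead) pair misses.
for tok in module endmodule module input; do predict 0 "$tok"; done
echo "after first pass:  $simulations slow-path runs"
# Second pass over the same lookahead: all cache hits, no new slow-path runs.
for tok in module endmodule module input; do predict 0 "$tok"; done
echo "after second pass: $simulations slow-path runs"
```

In this model, pre-computing at generation time would mean shipping `dfa_cache` already filled inside the binary, so a fresh process would never take the slow path. One plausible reason the real runtime fills the cache lazily instead is that the complete table for a large grammar could be very big. That is an assumption here, not something established in this discussion.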

alainmarcel commented 4 years ago

Another solution is a server mode for Surelog. Surelog stays running for the whole test session, so there is only one warm-up and the tests would not need to change.

We can also talk to Mike, the author. I don't know myself how to do what you are asking for with Antlr.

hzeller commented 4 years ago

Server mode sounds like a complicated and error-prone way to work around the root cause. Let's talk to Mike and figure out what happens at start-up; then we can fix it.

alainmarcel commented 4 years ago

Done: https://groups.google.com/forum/#!topic/antlr-discussion/Zhq3F7uHWFM

alainmarcel commented 4 years ago

In the meantime I have added a batch mode capability. Usage:

Run the script tests/create_batch_script.tcl at the top of the directory tree containing the unit tests; this produces a file called batch.txt.
Run surelog -batch batch.txt in the same directory.

It should run in under 20 s for all the sv-tests unit tests.

alainmarcel commented 4 years ago

When executed at the root of sv-tests, it takes about 4 minutes, because it also processes UVM and other third_party cases:

Processed 1085 tests.
[ FATAL] : 0
[ SYNTAX] : 84
[ ERROR] : 3025
[WARNING] : 36
[ NOTE] : 0
Command exited with non-zero status 7
229.84user 1.58system 3:51.87elapsed 99%CPU (0avgtext+0avgdata 1839600maxresident)k
44584inputs+39992outputs (161major+743841minor)pagefaults 0swaps
