AliveToolkit / alive2

Automatic verification of LLVM optimizations
MIT License
746 stars 95 forks source link

next steps for parallel clang plugin #614

Open regehr opened 3 years ago

regehr commented 3 years ago

I just want to jot down a few thoughts before I forget them. I don't plan to work on these items immediately, there's more important stuff to do. anyway, here are what I think are the next items for parallel alive2:

the key thing here is that these implementation tasks are separate.

regehr commented 3 years ago

I'll just add that both of these tasks should be fairly easy, given the infrastructure already in place.

batching has two awkward aspects. first, it will be slightly tricky to make the TV report the same as the sequential version. probably the way to do this is for a child process that is going to do N TV jobs inherits N pipe ends. this would allow almost all of the current machinery to work as-is. second, batching will interact badly with a pull request I submitted today which destroys a child process after a timeout expires. I'm not sure what the solution is, but we can worry about it later.

regehr commented 3 years ago

Well, ugh, as I build more projects I'm starting to see deadlocks from the jobserver implementation. I knew there was potential for this, but I was hoping it wouldn't happen. I believe it is an unavoidable consequence of:

For now I'm using the unrestricted parallel implementation, which doesn't try to kill my machine as much as I'd have expected :).

But I suppose I'll have to do the pipe/semaphore thing sooner rather than later. @nunoplopes please let me know what you think of this design. You run:

alive-manager -j10 make -j10 CC=alivecc

Now, alive-manager (let me know if anyone has a better name) creates the named pipe and lets make (and its sub-processes) know what it is called using an environment variable. The manager process can exit on its own once "make" finishes.

This isn't lovely, and it gives up global coordination between make-level parallelism and alive-level parallelism, but I think it removes deadlocks (by segregating the two parallel resources) and it does not seem horribly inconvenient to use.

nunoplopes commented 3 years ago

I would love if the manager could be run independently as well (without it having to fork the child processes). This enables scenarios where we compile distinct programs and still achieve global coordination without having a single Makefile for all programs. The advantage of typing the command as you did is that the manager is shutdown when no longer needed, while the "daemon" version need to be shutdown manually. I think we can support both. If you want to be fancy, you can pass the second -j10 to make automatically; no need to duplicate it.

regehr commented 3 years ago

I think this is all easy and I'll do it at some point. doesn't seem super high priority since for now the unrestricted parallel manager seems to work fine -- it does overload the machine sometimes, but in this case the scheduler just round-robins. I've not yet had my machine run out of RAM.

in standalone mode, we could either make the named pipe use a well-known name, in which case you can only run one manager at once, or else it could print some environment variables that you would need to set in other shells, so that new TV processes know who to talk to. nothing hard about any of it.

regehr commented 3 years ago

Just to update and summarize a few things that I've said in emails (plus a few new thoughts), I consider our current challenge to be TV for the MultiSource directory of the LLVM test-suite repo. It contains like 600k lines of C including some very tough targets like sqlite3.c. Even my 128 thread / 256 GB machine can't make a lot of headway at present.

Summarizing our lines of attack:

regehr commented 3 years ago

ok, I am asking three questions about our clang plugin:

  1. What are the sequential bottlenecks besides llvm2alive?
  2. Are we calling llvm2alive too many times?
  3. How can we make llvm2alive faster?

Juneyoung's pull req https://github.com/AliveToolkit/alive2/pull/637 should address item 1, but I'll do some testing to make sure, once it lands.

Item 2 is interesting and a bit surprising. I modified tv.cpp so that it maintains a history of function strings it has seen. We call it a cache hit when llvm2alive produces a stringified function that it has produced before, and a cache miss otherwise. This is what I get for a smallish C file that I have sitting around:

regehr@home:~/rb_tree_demo$ grep CACHE out.txt  | sort | uniq -c
  16522 CACHE_HIT
     98 CACHE_MISS
regehr@home:~/rb_tree_demo$ 

So there is clearly an opportunity for a lot of speedup here. I'm not yet sure what is the best way to do this, though.

I have not looked into item 3 yet.

regehr commented 3 years ago

it's possible that we can use the code in llvm/lib/IR/StructuralHash.cpp to avoid repeating work