facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License
906 stars 127 forks source link

Reproduce baseline in llvm enviornment #733

Open dcy11011 opened 2 years ago

dcy11011 commented 2 years ago

❓ Questions and Help

I'm trying to do some imitation learning with CompilerGym's llvm envornment. I want to get the passes taken by optlevel -Oz in llvm, then input them to CompilerGym, and hopefully on all benchmarks this can always produce 1.0 reward on reward space IrInstructionCountOz so that my RL algorithm can learn from this action sequence. I try to get the passes of -Oz level by:

 opt -Oz -S --debug-pass=Arguments test.ll

which produce:

Pass Arguments:  -tti -tbaa -scoped-noalias -assumption-cache-tracker -targetlibinfo -verify -ee-instrument -simplifycfg -domtree -sroa -early-cse -lower-expect
Pass Arguments:  -targetlibinfo -tti -targetpassconfig -tbaa -scoped-noalias -assumption-cache-tracker -profile-summary-info -forceattrs -inferattrs -ipsccp -called-value-propagation -attributor -globalopt -domtree -mem2reg -deadargelim -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -simplifycfg -basiccg -globals-aa -prune-eh -inline -functionattrs -domtree -sroa -basicaa -aa -memoryssa -early-cse-memssa -speculative-execution -basicaa -aa -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -opt-remark-emitter -tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -loop-rotate -memoryssa -licm -loop-unswitch -simplifycfg -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -loop-simplify -lcssa-verification -lcssa -scalar-evolution -indvars -loop-idiom -loop-deletion -loop-unroll -mldst-motion -phi-values -basicaa -aa -memdep -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -gvn -phi-values -basicaa -aa -memdep -memcpyopt -sccp -demanded-bits -bdce -basicaa -aa -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -lazy-value-info -jump-threading -correlated-propagation -basicaa -aa -phi-values -memdep -dse -basicaa -aa -memoryssa -loops -loop-simplify -lcssa-verification -lcssa -scalar-evolution -licm -postdomtree -adce -simplifycfg -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -barrier -elim-avail-extern -basiccg -rpo-functionattrs -globalopt -globaldce -basiccg -globals-aa -domtree -float2int -lower-constant-intrinsics -domtree -loops -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -loop-rotate -loop-accesses -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -loop-distribute -branch-prob -block-freq -scalar-evolution -basicaa -aa -loop-accesses -demanded-bits -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -loop-vectorize -loop-simplify -scalar-evolution -aa -loop-accesses -lazy-branch-prob -lazy-block-freq -loop-load-elim -basicaa -aa -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -simplifycfg -domtree -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -loop-simplify -lcssa-verification -lcssa -scalar-evolution -loop-unroll -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -memoryssa -loop-simplify -lcssa-verification -lcssa -scalar-evolution -licm -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -transform-warning -alignment-from-assumptions -strip-dead-prototypes -globaldce -constmerge -domtree -loops -branch-prob -block-freq -loop-simplify -lcssa-verification -lcssa -basicaa -aa -scalar-evolution -block-freq -loop-sink -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instsimplify -div-rem-pairs -simplifycfg -verify -print-module
Pass Arguments:  -domtree
Pass Arguments:  -targetlibinfo -domtree -loops -branch-prob -block-freq
Pass Arguments:  -targetlibinfo -domtree -loops -branch-prob -block-freq

I extract the passes allowed in llvm-v0's action space in the above output and execute them step by step in CompilerGym. However on almost every benchmark this can not produce the same instruction count as -Oz. I also tried to pass these commands to opt tool, and it didn't work same as -Oz either. As a beginnner learning LLVM, I wonder if it is possible to achieve the same effect as -Oz by passing a series of transform passes to opt tool of llvm. If it is possible, how to do it with CompilerGym to get a series of actions that always produce exactly 1.0 reward?

Besides, I found that if I write the bitcode of a benchmark to file, then optimize it using -Oz and read it back, I will get a different instruction count compared to -Oz baseline computed by CompilerGym. Here's my test code:

with gym.make("llvm-v0") as env: 
    for i in range(100):
        env.reset(benchmark = f"generator://csmith-v0/{i}")
        print(f"csmith {i} compiler_gym baseline:", env.observation["IrInstructionCountOz"])

        env.write_bitcode(f"csmith-{i}.bc")
        subprocess.call(f"opt -Oz csmith-{i}.bc -o csmith-out-{i}.bc".split(" "))
        env.reset(benchmark = make_benchmark(f"csmith-out-{i}.bc"))
        print(f"csmith {i} opt Oz result:        ", env.observation["IrInstructionCount"], env.observation["IrInstructionCountOz"])

The result is:

csmith 0 compiler_gym baseline: 851
csmith 0 opt Oz result:         875 864
csmith 1 compiler_gym baseline: 666
csmith 1 opt Oz result:         808 769
csmith 2 compiler_gym baseline: 1896
csmith 2 opt Oz result:         1938 2106
csmith 3 compiler_gym baseline: 734
csmith 3 opt Oz result:         878 839

Why would this happen?

I noticed that CompilerGym seems to use llvm source code instead of calling opt tool to run the baseline. Will these two methods produce different results?

Additional Context

my llvm version: 10.0.0 my CompilerGym version: 0.2.4

ChrisCummins commented 2 years ago

Hi @dcy11011, this is a good question. Unfortunately there isn't a way to directly convert what the InstructionCountOz observation does to a list of actions that your agent can perform, and it may deviate slightly from the opt tool. The best thing would be to step through this code:

https://github.com/facebookresearch/CompilerGym/blob/edcf57c2c207ff8556c60de42dea1e2fb639b08c/compiler_gym/envs/llvm/service/Cost.cc#L266-L284

and to make a list of the passes that are run and convert that into a list of actions. BTW, this would be a great feature to have in CompilerGym if you do this, patches welcome! 🙂

Cheers, Chris

dcy11011 commented 2 years ago

Thanks a lot for your advice! I checked the code you mentioned and made a pass list to immitate what these llvm functions do. It performs much better that the one i get from opt --debug-pass=Arguments but still can't produce exactly 1.0 reward.

I noticed that many passes need parameters and they are given different parameters in different places in populateFunctionPassManager and populateModulePassManager. However actions in CompilerGym can only be converted to passes built by default parameters. I guess these parameters may have some influence on the result of optimazation, which cause my actions can not get the same result as Oz. I'm not sure if my guess is correct since now I only read the code and didn't debug CompilerGym to see the exact passes list generated. I'll do a more careful check on this.

ChrisCummins commented 2 years ago

Thanks a lot for your advice! I checked the code you mentioned and made a pass list to immitate what these llvm functions do. It performs much better that the one i get from opt --debug-pass=Arguments but still can't produce exactly 1.0 reward.

That's great! Could you share the list of actions here?

I noticed that many passes need parameters and they are given different parameters in different places in populateFunctionPassManager and populateModulePassManager. However actions in CompilerGym can only be converted to passes built by default parameters.

Yes we do not yet support parameterizing passes. That would be an interesting extension, though would require some thought as to how to best expose that through the action space.

I guess these parameters may have some influence on the result of optimazation, which cause my actions can not get the same result as Oz. I'm not sure if my guess is correct since now I only read the code and didn't debug CompilerGym to see the exact passes list generated. I'll do a more careful check on this.

Keep us posted 🙂

Cheers, Chris

dcy11011 commented 2 years ago

That's great! Could you share the list of actions here?

Sure, but I'm sorry that I found I mistakenly took the result of my experimental removal of loop-rotate as the result of the original list of actions. I tested the new action list (which obtained by reading the code) again and found it is actually very similar to the previous one (which obtained by opt tool in my first post). Its optimization effect is also similar to previous one (I tested it on first 50 benchmark of csmith dataset, and geometry mean of IrInstructionCountOz/IrInstructionCount is 0.804 on my new list, compare to 0.808 on the previous one). Here's the new action list:

-ee-instrument -simplifycfg -sroa -early-cse -lower-expect
-forceattrs -inferattrs -ipsccp -called-value-propagation -attributor -globalopt -mem2reg -deadargelim -instcombine -simplifycfg -prune-eh -inline -functionattrs -sroa -early-cse -speculative-execution -jump-threading -correlated-propagation -simplifycfg -instcombine -tailcallelim -simplifycfg -reassociate -loop-rotate -licm -loop-unswitch -simplifycfg -instcombine -indvars -loop-idiom -loop-deletion -loop-unroll -mldst-motion -gvn -memcpyopt -sccp -bdce -instcombine -jump-threading -correlated-propagation -dse -licm -adce -simplifycfg -instcombine -barrier -elim-avail-extern -rpo-functionattrs -globalopt -globaldce -float2int -lower-constant-intrinsics -loop-rotate -loop-distribute -loop-vectorize -loop-load-elim -simplifycfg -loop-unroll -instcombine -licm -alignment-from-assumptions -strip-dead-prototypes -globaldce -constmerge -loop-sink -instsimplify -div-rem-pairs -simplifycfg

The first line correspond to passes built by populateFunctionPassManager, and the second correspond to populateModulePassManager. I removed all the analysis passes and utility passes because they cannot be converted into actions and seem to be generated automatically when the transform passes is built. Compre to the passes given by opt (removed analysis&utility passes), it has no loop-simplify and lcssa and several fewer instcombine and replaced -early-cse-memssa with -early-csa.

(I'm still not sure if this result is correct since it all comes from code reading)