facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License

An Example How to Implement Fork Inside CompilationSession #635

Open vladaindjic opened 2 years ago

vladaindjic commented 2 years ago

❓ Questions and Help

I am trying to apply a greedy search from the llvm_autotuners package to a CompilationSession that I defined myself. However, I couldn't find a good example of how to implement forking inside the session class. No matter what I try (instantiating through the constructor or using deepcopy), the same error always appears: SessionNotFound.

Could you provide an example of how this is implemented inside the llvm-v0 compilation session? I searched for a function with the signature def fork(self) -> "CompilationSession", but found no results.

Thanks in advance!

ChrisCummins commented 2 years ago

Hi @vladaindjic, good question. We don't have an implementation in Python. The LLVM environment's fork operator is implemented here:

https://github.com/facebookresearch/CompilerGym/blob/development/compiler_gym/envs/llvm/service/LlvmSession.cc#L101-L105

For Python, you should be okay just implementing this method:

https://github.com/facebookresearch/CompilerGym/blob/development/compiler_gym/service/compilation_session.py#L81

Is that what you've done? The link to your code doesn't work for me (maybe the repo is private?)

Cheers, Chris

vladaindjic commented 2 years ago

Hi, @ChrisCummins ,

Thank you very much for your response. Sorry for the wrong link, the repo is private indeed, but I didn't know until now.

I am not entirely sure what the init function does, but as I could understand, it doesn't create a new instance of the CompilationSession?

I tried to implement the fork function you reference in the following naive way (deepcopy the current one):

class HPCToolkiCompilationSession(CompilationSession):
    def fork(self) -> CompilationSession:
        import copy
        new_env = copy.deepcopy(self)
        return new_env

When env.step() is called, a SessionNotFound error is raised. However, if env.reset() is called before the step, then everything is OK. I guess resetting does something that I'm missing. Do you have any idea what that would be?

And one more question about reset. The docstring says the following about reset: "This method must be called before :func:step().". I am wondering how the autotuner works even though it doesn't call reset before step.

Best, Vladimir

ChrisCummins commented 2 years ago

Hi @vladaindjic,

I am not entirely sure what the init function does, but as I could understand, it doesn't create a new instance of the CompilationSession?

For Python, the __init__ method is responsible for creating a new session and setting everything up. For C++, we move the heavy lifting out of the constructor and into init(). The idea is this enables init() to return an error status if it fails.

I tried to implement the fork function you reference in the following naive way (deepcopy the current one):

class HPCToolkiCompilationSession(CompilationSession):
    def fork(self) -> CompilationSession:
        import copy
        new_env = copy.deepcopy(self)
        return new_env

Hmm. Could you try constructing the new_env object and then copying over any mutable state? Hard to debug without a full code sample.
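A minimal sketch of that suggestion follows. It assumes the session constructor takes a working directory, an action space, and a benchmark. The base class here is a hypothetical stand-in so the example runs without CompilerGym installed, and `applied_actions` and `apply_action` are illustrative names, not part of the library.

```python
import copy
from pathlib import Path


class CompilationSessionStub:
    """Hypothetical stand-in for the CompilationSession base class."""

    def __init__(self, working_dir: Path, action_space, benchmark):
        self.working_dir = working_dir
        self.action_space = action_space
        self.benchmark = benchmark


class MySession(CompilationSessionStub):
    def __init__(self, working_dir: Path, action_space, benchmark):
        super().__init__(working_dir, action_space, benchmark)
        # Mutable, per-session state (hypothetical example).
        self.applied_actions = []

    def apply_action(self, action) -> None:
        self.applied_actions.append(action)

    def fork(self) -> "MySession":
        # Build a fresh session through the constructor so that any setup
        # done in __init__ also happens for the fork ...
        new_session = type(self)(self.working_dir, self.action_space, self.benchmark)
        # ... then copy over only the mutable state, so the two sessions
        # do not share it afterwards.
        new_session.applied_actions = copy.deepcopy(self.applied_actions)
        return new_session


parent = MySession(Path("/tmp/session"), action_space=["a", "b"], benchmark="bench")
parent.apply_action("a")
child = parent.fork()
child.apply_action("b")  # Only the fork sees this action.
```

Unlike a blanket deepcopy of `self`, this makes it explicit which fields are duplicated and which are shared, which may make it easier to spot state that should not be copied.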

And one more question about reset. The docstring says the following about reset: "This method must be called before :func:step().". I am wondering how the autotuner works even though it doesn't call reset before step.

That sounds like a bug. Everything should call reset() first. Could you send me a link?

Cheers, Chris

vladaindjic commented 2 years ago

Hi @ChrisCummins

I am not entirely sure what the init function does, but as I could understand, it doesn't create a new instance of the CompilationSession?

For Python, the __init__ method is responsible for creating a new session and setting everything up. For C++, we move the heavy lifting out of the constructor and into init(). The idea is this enables init() to return an error status if it fails.

Thanks for the explanation. I was wondering what init in C++ does over the __init__ in Python and you answered it.

I tried to implement the fork function you reference in the following naive way (deepcopy the current one):

class HPCToolkiCompilationSession(CompilationSession):
    def fork(self) -> CompilationSession:
        import copy
        new_env = copy.deepcopy(self)
        return new_env

Hmm. Could you try constructing the new_env object and then copying over any mutable state? Hard to debug without a full code sample.

I'll try this and provide the link to the implementation.

And one more question about reset. The docstring says the following about reset: "This method must be called before :func:step().". I am wondering how the autotuner works even though it doesn't call reset before step.

That sounds like a bug. Everything should call reset() first. Could you send me a link?

Hm, maybe this wasn't about the reset and the step, but I found a weird behaviour when trying llvm_autotuners. I tried running random search over the bench dataset for a few seconds.

python -m llvm_autotuning.tune -m  experiment=my-experiment  outputs=/tmp/logs  num_replicas=1  autotuner=random  autotuner.optimization_target=codesize  autotuner.search_time_seconds=5

Then I generated the .csv with all results with:

python -m llvm_autotuning.info --log-dirs /tmp/logs/my-experiment/...

The content of the .csv file is here:

benchmark,reward,walltime,commandline
benchmark://cbench-v1/qsort,1.125,8.84742522239685,opt -early-cse-memssa -rewrite-statepoints-for-gc -instsimplify -mem2reg -gvn-hoist -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/qsort,1.125,8.33852744102478,opt -loop-reroll -argpromotion -insert-gcov-profiling -loop-deletion -redundant-dbg-inst-elim -cross-dso-cfi -break-crit-edges -partially-inline-libcalls -bdce -deadargelim -mem2reg -lower-widenable-condition -simple-loop-unswitch -attributor -slp-vectorizer -loop-unswitch -reassociate -loop-vectorize -mem2reg -break-crit-edges -reassociate -slp-vectorizer -loop-interchange -loop-unswitch -gvn -libcalls-shrinkwrap -lowerswitch -ipconstprop -aggressive-instcombine -loop-sink -simplifycfg -instcombine input.bc -o output.bc
benchmark://cbench-v1/qsort,1.105263157894737,9.05691146850586,opt -loop-versioning -sccp -bdce -constmerge -loop-versioning-licm -loop-data-prefetch -adce -coro-early -instsimplify -functionattrs -pgo-memop-opt -inject-tli-mappings -constmerge -add-discriminators -correlated-propagation -reassociate -simplifycfg -mem2reg -always-inline -irce -adce -loop-load-elim -jump-threading -loop-versioning-licm -separate-const-offset-from-gep -lcssa -globaldce -called-value-propagation -nary-reassociate -cross-dso-cfi -sink -loop-vectorize -loop-sink -gvn -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/qsort,1.105263157894737,8.452608346939087,opt -loop-versioning -loop-load-elim -nary-reassociate -bdce -instnamer -loop-vectorize -infer-address-spaces -loop-unroll-and-jam -prune-eh -instnamer -slp-vectorizer -always-inline -constmerge -functionattrs -gvn-hoist -loop-interchange -hotcoldsplit -correlated-propagation -licm -loop-sink -lower-widenable-condition -sccp -simple-loop-unswitch -gvn -partially-inline-libcalls -rpo-functionattrs -globaldce -loop-reroll -forceattrs -strip -loop-distribute -strip-nondebug -deadargelim -ipconstprop -simplifycfg -loop-guard-widening -instcombine input.bc -o output.bc
benchmark://cbench-v1/qsort,1.1130742049469964,8.411367416381836,opt -globalopt -infer-address-spaces -globalopt -barrier -loop-idiom -ipsccp -ee-instrument -loop-sink -globalsplit -loop-load-elim -mergereturn -loop-simplifycfg -simple-loop-unswitch -globaldce -sroa -memcpyopt -libcalls-shrinkwrap -strip-nondebug -gvn -simplifycfg -lowerinvoke -gvn -mergefunc -coro-elide -globalsplit -mergereturn -instcombine input.bc -o output.bc
benchmark://cbench-v1/qsort,1.1130742049469964,8.333942413330078,opt -loop-instsimplify -loop-distribute -libcalls-shrinkwrap -dce -inferattrs -instnamer -inject-tli-mappings -loop-versioning-licm -loop-deletion -scalarizer -irce -lowerinvoke -jump-threading -reassociate -strip-nondebug -loop-predication -strip-dead-prototypes -guard-widening -scalarizer -loop-sink -mergeicmps -sink -mem2reg -pgo-memop-opt -jump-threading -instnamer -newgvn -float2int -prune-eh -correlated-propagation -functionattrs -callsite-splitting -coro-early -loop-load-elim -mergeicmps -instcombine -lower-widenable-condition -globaldce -constmerge -guard-widening -dse -correlated-propagation -simple-loop-unswitch -mergeicmps -loweratomic -bdce -loop-interchange -consthoist -dse -aggressive-instcombine -dce -libcalls-shrinkwrap -loop-distribute -instnamer -correlated-propagation -sroa -mergefunc -indvars -newgvn -licm -sink -early-cse-memssa -lower-guard-intrinsic -instsimplify -slsr -constmerge -coro-elide -licm -functionattrs -lowerswitch -correlated-propagation -elim-avail-extern -loop-idiom -coro-elide -early-cse-memssa -ee-instrument -sink -bdce -strip -load-store-vectorizer -rewrite-statepoints-for-gc -lowerinvoke -correlated-propagation -loop-unroll-and-jam -prune-eh -bdce -early-cse -early-cse -bdce -lower-widenable-condition -jump-threading input.bc -o output.bc
benchmark://cbench-v1/qsort,1.105263157894737,9.032959461212158,opt -speculative-execution -loop-versioning-licm -elim-avail-extern -mem2reg -early-cse -globalsplit -tailcallelim -strip-dead-prototypes -simple-loop-unswitch -instnamer -insert-gcov-profiling -newgvn -rewrite-statepoints-for-gc -sancov -div-rem-pairs -break-crit-edges -sccp -lower-matrix-intrinsics -lowerinvoke -bdce -loop-load-elim -dce -consthoist -flattencfg -rpo-functionattrs -sancov -cross-dso-cfi -insert-gcov-profiling -loop-versioning -post-inline-ee-instrument -bdce -adce -always-inline -dse -loop-reroll -add-discriminators -add-discriminators -loop-idiom -rpo-functionattrs -loop-predication -simplifycfg -jump-threading -mldst-motion -add-discriminators -memcpyopt -globaldce -scalarizer -early-cse-memssa input.bc -o output.bc
benchmark://cbench-v1/qsort,1.105263157894737,8.407591819763184,opt -insert-gcov-profiling -lowerswitch -float2int -redundant-dbg-inst-elim -memcpyopt -lower-guard-intrinsic -loop-unroll -globaldce -loop-data-prefetch -infer-address-spaces -newgvn -infer-address-spaces -early-cse -scalarizer -loop-unswitch -always-inline -deadargelim -mldst-motion -loop-idiom -ee-instrument -ipsccp -coro-elide -canonicalize-aliases -gvn-hoist -loop-unroll -loop-simplifycfg -early-cse-memssa -correlated-propagation -lower-widenable-condition -instnamer -constprop -loop-deletion -float2int -gvn-hoist -argpromotion -mem2reg -coro-split -inject-tli-mappings -cross-dso-cfi -coro-split -instcombine -loop-unswitch -break-crit-edges -loop-unroll-and-jam -loop-fusion -jump-threading -bdce -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/qsort,1.1170212765957446,8.429417371749878,opt -jump-threading -mem2reg -functionattrs -sccp -consthoist -slp-vectorizer -forceattrs -early-cse -strip-dead-prototypes -sroa -bdce -infer-address-spaces -slsr -mergefunc -mldst-motion -bdce -argpromotion -globaldce -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/qsort,1.105263157894737,9.100965738296509,opt -loop-simplify -coro-split -scalarizer -strip-debug-declare -instnamer -coro-cleanup -ipconstprop -coro-split -sancov -speculative-execution -slsr -lower-matrix-intrinsics -forceattrs -break-crit-edges -loweratomic -div-rem-pairs -prune-eh -attributor -lower-constant-intrinsics -infer-address-spaces -lowerswitch -redundant-dbg-inst-elim -loop-simplify -loop-unswitch -early-cse-memssa -always-inline -mldst-motion -mem2reg -strip-dead-prototypes -correlated-propagation -mergeicmps -pgo-memop-opt -coro-early -insert-gcov-profiling -strip-nondebug -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/qsort,1.1209964412811388,8.781571865081787,opt -div-rem-pairs -early-cse-memssa -rpo-functionattrs -insert-gcov-profiling -barrier -adce -inject-tli-mappings -argpromotion -coro-elide -loop-sink -lower-matrix-intrinsics -loop-load-elim -partially-inline-libcalls -separate-const-offset-from-gep -break-crit-edges -constmerge -attributor -irce -aggressive-instcombine -ipsccp -die -mem2reg -forceattrs -early-cse -post-inline-ee-instrument -pgo-memop-opt -partial-inliner -mergefunc -instsimplify -ee-instrument -dse -globalsplit -separate-const-offset-from-gep -canonicalize-aliases -loop-vectorize -simplifycfg -memcpyopt -div-rem-pairs -guard-widening -ipsccp -nary-reassociate -lower-expect -lower-expect -reassociate -slsr -load-store-vectorizer -instsimplify -loop-interchange -jump-threading -loop-unswitch -coro-cleanup -loop-distribute -memcpyopt -deadargelim -reassociate -loop-versioning -redundant-dbg-inst-elim -loop-simplify -partially-inline-libcalls -early-cse -ee-instrument -globaldce -loweratomic -scalarizer -infer-address-spaces -instcombine -irce -loop-deletion -sroa -coro-early -loop-distribute -early-cse -die -loop-guard-widening -licm -alignment-from-assumptions -loop-reroll -loop-vectorize -sccp -loop-versioning -instcombine -indvars -loop-fusion -strip-dead-prototypes -deadargelim -globaldce -always-inline -jump-threading -instcombine input.bc -o output.bc
benchmark://cbench-v1/qsort,1.1130742049469964,8.34814167022705,opt -reassociate -ipconstprop -loop-guard-widening -loop-predication -deadargelim -load-store-vectorizer -consthoist -canonicalize-aliases -inject-tli-mappings -reassociate -constprop -strip -sink -sroa -simplifycfg -infer-address-spaces -early-cse-memssa input.bc -o output.bc
benchmark://cbench-v1/qsort,1.0899653979238757,9.41271162033081,opt -cross-dso-cfi -loop-instsimplify -coro-split -rpo-functionattrs -slsr -indvars -argpromotion -globalsplit -strip-nondebug -alignment-from-assumptions -early-cse -instnamer -jump-threading -partially-inline-libcalls -ipsccp -lower-matrix-intrinsics -lower-expect -globaldce -mem2reg -deadargelim -inject-tli-mappings -ee-instrument -tailcallelim -newgvn -strip -loop-deletion -strip-nondebug -mergeicmps -elim-avail-extern -loop-vectorize -elim-avail-extern -barrier -mldst-motion -strip-debug-declare -float2int -consthoist -simplifycfg -indvars -coro-cleanup -functionattrs -loop-idiom -loop-versioning -aggressive-instcombine -tailcallelim -instsimplify -gvn-hoist -loop-simplifycfg -mergefunc -sancov -simplifycfg -licm -mldst-motion -lowerswitch -loop-simplifycfg -constprop -libcalls-shrinkwrap -irce -loop-guard-widening -instsimplify -loop-unswitch -aggressive-instcombine -dce -rewrite-statepoints-for-gc -nary-reassociate -deadargelim -loop-load-elim -ee-instrument -loop-simplify -instnamer -loop-distribute -loop-instsimplify -inject-tli-mappings -newgvn input.bc -o output.bc
benchmark://cbench-v1/qsort,1.1091549295774648,8.96728229522705,opt -loop-rotate -callsite-splitting -loop-vectorize -libcalls-shrinkwrap -lcssa -callsite-splitting -loop-instsimplify -loop-simplify -prune-eh -early-cse-memssa -early-cse -loop-unroll-and-jam -deadargelim -strip -constprop -globalopt -strip-debug-declare -loop-predication -ee-instrument -barrier -globalsplit -sroa -instcombine -reassociate -strip -simplifycfg -instsimplify -barrier -flattencfg -argpromotion -strip-debug-declare -tailcallelim -deadargelim -licm -ipsccp -called-value-propagation -rpo-functionattrs -strip-dead-prototypes -strip -strip-nondebug -argpromotion -loweratomic -slp-vectorizer -loop-unswitch -licm -loop-data-prefetch -simplifycfg -functionattrs -constmerge -early-cse -nary-reassociate -functionattrs -newgvn input.bc -o output.bc
benchmark://cbench-v1/qsort,1.105263157894737,8.396355152130127,opt -speculative-execution -attributor -mldst-motion -pgo-memop-opt -bdce -load-store-vectorizer -pgo-memop-opt -loop-idiom -alignment-from-assumptions -separate-const-offset-from-gep -loop-distribute -ipsccp -canonicalize-aliases -aggressive-instcombine -adce -loop-load-elim -loop-versioning -sroa -loop-load-elim -forceattrs -correlated-propagation -loop-unroll -slp-vectorizer -die -loop-versioning-licm -instnamer -insert-gcov-profiling -loop-versioning -loop-unroll -inferattrs -die -nary-reassociate -inject-tli-mappings -libcalls-shrinkwrap -sroa -div-rem-pairs -strip-nondebug -bdce -speculative-execution -lower-widenable-condition -loop-interchange -aggressive-instcombine -gvn -add-discriminators -flattencfg -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/qsort,1.105263157894737,9.209820985794067,opt -add-discriminators -loop-simplifycfg -bdce -sroa -lower-matrix-intrinsics -load-store-vectorizer -hotcoldsplit -early-cse-memssa -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/qsort,1.1209964412811388,8.964505195617676,opt -hotcoldsplit -argpromotion -separate-const-offset-from-gep -loop-idiom -sroa -ipsccp -memcpyopt -loop-predication -slp-vectorizer -forceattrs -dce -coro-early -flattencfg -jump-threading -lower-matrix-intrinsics -simplifycfg -indvars -instnamer -functionattrs -nary-reassociate -strip -ipsccp -loop-instsimplify -bdce -jump-threading -speculative-execution -irce -ipsccp -infer-address-spaces -early-cse-memssa -memcpyopt -instsimplify -loop-instsimplify -instcombine -mergeicmps -libcalls-shrinkwrap -simple-loop-unswitch -coro-split -reassociate -die -gvn input.bc -o output.bc
benchmark://cbench-v1/qsort,1.105263157894737,8.306469440460205,opt -simple-loop-unswitch -reassociate -ee-instrument -partially-inline-libcalls -reg2mem -loop-unroll-and-jam -ipconstprop -strip-dead-prototypes -div-rem-pairs -gvn -loop-reroll -speculative-execution -forceattrs -partial-inliner -constprop -mem2reg -tailcallelim -loop-guard-widening -loop-versioning-licm -redundant-dbg-inst-elim -float2int -jump-threading -forceattrs -instcombine -gvn-hoist -mldst-motion -dce -slp-vectorizer -sink -bdce -dse -aggressive-instcombine -name-anon-globals -gvn -functionattrs -speculative-execution -hotcoldsplit -argpromotion -load-store-vectorizer -simplifycfg -gvn-hoist input.bc -o output.bc

As you may notice, all 18 rows are for the same benchmark, qsort.

To fix this, I tried adding env.reset() after this line (env.benchmark = benchmark), and the output now looks like this (all different benchmarks):

benchmark,reward,walltime,commandline
benchmark://cbench-v1/stringsearch,0.9625,6.213060855865479,opt -loop-instsimplify -deadargelim -coro-elide -strip-debug-declare -coro-elide -loop-load-elim -early-cse -lcssa -constprop -slp-vectorizer -lcssa -strip -coro-cleanup -redundant-dbg-inst-elim -dse -early-cse -barrier -post-inline-ee-instrument -guard-widening -loop-sink -partially-inline-libcalls -lower-widenable-condition -loop-versioning-licm -coro-elide -coro-early -simplifycfg -pgo-memop-opt -coro-elide -lower-expect -loop-interchange -speculative-execution -indvars -deadargelim -early-cse-memssa -forceattrs -simplifycfg -mldst-motion -flattencfg -loop-interchange -partial-inliner -flattencfg -loop-guard-widening -ipconstprop -inject-tli-mappings -insert-gcov-profiling -sroa -instcombine -strip -callsite-splitting -jump-threading input.bc -o output.bc
benchmark://cbench-v1/qsort,1.1130742049469964,6.085037708282471,opt -loop-data-prefetch -globalsplit -insert-gcov-profiling -lowerinvoke -deadargelim -simple-loop-unswitch -lower-expect -ipconstprop -lower-constant-intrinsics -coro-split -instsimplify -forceattrs -coro-early -partially-inline-libcalls -mem2reg -separate-const-offset-from-gep -sccp -reassociate -loop-vectorize -lowerswitch -coro-cleanup -pgo-memop-opt -mergefunc -lower-constant-intrinsics -separate-const-offset-from-gep -rewrite-statepoints-for-gc -slp-vectorizer -simplifycfg -newgvn input.bc -o output.bc
benchmark://cbench-v1/gsm,1.0239976151438366,8.641437292098999,opt -aggressive-instcombine -forceattrs -rewrite-statepoints-for-gc -instsimplify -early-cse-memssa -tailcallelim -adce -rpo-functionattrs -lower-constant-intrinsics -instcombine -early-cse-memssa -loop-versioning-licm -lowerswitch -speculative-execution -loop-fusion -loop-simplify -sancov -loop-reroll -sroa -hotcoldsplit -redundant-dbg-inst-elim -flattencfg -cross-dso-cfi -name-anon-globals -globalsplit -slp-vectorizer -sancov -strip-debug-declare -always-inline -inferattrs -constmerge -simplifycfg -loop-interchange -memcpyopt -canonicalize-aliases -rewrite-statepoints-for-gc -loop-idiom -licm -loop-fusion -memcpyopt -strip -early-cse-memssa -dce -sroa -strip-dead-prototypes -early-cse-memssa -bdce -loop-simplifycfg -redundant-dbg-inst-elim -redundant-dbg-inst-elim -reg2mem -ipconstprop -mldst-motion -libcalls-shrinkwrap -dse -simple-loop-unswitch -loop-versioning -indvars -globaldce -loop-predication -loop-distribute -redundant-dbg-inst-elim -loop-deletion -insert-gcov-profiling -pgo-memop-opt -jump-threading -ee-instrument -load-store-vectorizer -pgo-memop-opt -barrier -loop-load-elim -coro-early -mem2reg -insert-gcov-profiling -indvars -loop-predication -gvn-hoist -lower-guard-intrinsic -consthoist -separate-const-offset-from-gep -coro-cleanup -simplifycfg -slp-vectorizer -nary-reassociate input.bc -o output.bc
benchmark://cbench-v1/dijkstra,0.9731800766283524,6.306023359298706,opt -loop-simplifycfg -loop-simplify -loweratomic -strip -gvn -loop-load-elim -strip-dead-prototypes -correlated-propagation -slsr -always-inline -gvn-hoist -separate-const-offset-from-gep -sroa -strip-nondebug -flattencfg -loop-versioning-licm -lowerswitch -tailcallelim -dce -rewrite-statepoints-for-gc -guard-widening -early-cse-memssa -strip -instnamer -load-store-vectorizer -strip-dead-prototypes -dse -called-value-propagation -slp-vectorizer -ee-instrument -constprop -callsite-splitting -loop-guard-widening -loop-interchange -correlated-propagation -coro-split -lcssa -instcombine -loop-deletion -coro-early -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/susan,0.9561112082531912,8.255318641662598,opt -instnamer -early-cse -attributor -loop-rotate -prune-eh -loop-versioning -indvars -mldst-motion -speculative-execution -consthoist -scalarizer -functionattrs -loop-rotate -lower-guard-intrinsic -globalopt -jump-threading -loop-predication -post-inline-ee-instrument -elim-avail-extern -canonicalize-aliases -globaldce -lower-constant-intrinsics -mem2reg -adce -simple-loop-unswitch -loop-predication -sancov -loop-rotate -instcombine -called-value-propagation -adce -irce -dce -ee-instrument -loop-sink -loop-data-prefetch -loop-vectorize -lowerinvoke -scalarizer -instsimplify -early-cse-memssa -forceattrs -jump-threading -lower-guard-intrinsic -float2int -flattencfg -early-cse -loop-predication -loop-unswitch -ipsccp -loop-unroll -rewrite-statepoints-for-gc -guard-widening -instsimplify -indvars -slsr -loop-unswitch -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/tiffdither,0.5253378956372968,16.549588441848755,opt  input.bc -o output.bc
benchmark://cbench-v1/bzip2,0.7641730962764173,13.31516432762146,opt -early-cse -lower-widenable-condition -adce -loop-unroll-and-jam -instsimplify -globaldce -alignment-from-assumptions -post-inline-ee-instrument -ipconstprop -loop-unroll -licm -pgo-memop-opt -gvn -jump-threading -sccp -gvn-hoist input.bc -o output.bc
benchmark://cbench-v1/tiffmedian,0.5220599036970538,17.724722385406494,opt  input.bc -o output.bc
benchmark://cbench-v1/jpeg-c,0.4834913213347851,18.32247495651245,opt  input.bc -o output.bc
benchmark://cbench-v1/blowfish,0.979726305119108,6.818209171295166,opt -sancov -indvars -always-inline -callsite-splitting -flattencfg -early-cse-memssa -gvn-hoist -loop-distribute -newgvn -reassociate -forceattrs -nary-reassociate -lower-expect -coro-early -strip-debug-declare -name-anon-globals -div-rem-pairs -libcalls-shrinkwrap -inferattrs -attributor -simplifycfg -loop-distribute -scalarizer -loop-predication -gvn -bdce -coro-early -globalopt -lower-widenable-condition -sroa -coro-elide -slsr -loop-simplify -loop-simplifycfg -inject-tli-mappings -lower-matrix-intrinsics -deadargelim -sroa -coro-cleanup -loop-simplifycfg -mergereturn -speculative-execution -elim-avail-extern -partially-inline-libcalls -simplifycfg -coro-split -instcombine -licm -coro-elide -inline -instnamer -sancov -redundant-dbg-inst-elim -gvn -correlated-propagation -loop-unswitch -dce -sink -loop-predication -constmerge -scalarizer -inline -lower-matrix-intrinsics -tailcallelim -loop-distribute -loop-predication -indvars -lcssa -loop-vectorize -float2int -div-rem-pairs -loop-load-elim -globalopt input.bc -o output.bc
benchmark://cbench-v1/bitcount,0.9854014598540146,5.939620733261108,opt -loop-reroll -loop-sink -sancov -loop-unroll -coro-early -nary-reassociate -loop-predication -coro-split -callsite-splitting -newgvn -licm -simple-loop-unswitch -lcssa -lowerswitch -name-anon-globals -loop-sink -coro-split -strip-nondebug -elim-avail-extern -jump-threading -barrier -early-cse -irce -callsite-splitting -flattencfg -die -constmerge -reg2mem -ipsccp -slsr -sroa -float2int -loop-rotate -canonicalize-aliases -deadargelim -mergereturn -globalopt -lowerinvoke -mldst-motion -lower-widenable-condition -deadargelim -early-cse -loop-simplifycfg -ipconstprop -infer-address-spaces -prune-eh -lower-guard-intrinsic -instnamer -jump-threading -instcombine -canonicalize-aliases -globalsplit -coro-cleanup -strip -reassociate -lowerinvoke -bdce -loop-unroll -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/jpeg-d,0.4818491208167895,18.350997924804688,opt  input.bc -o output.bc
benchmark://cbench-v1/patricia,0.9527363184079602,6.237964868545532,opt -loop-fusion -nary-reassociate -bdce -sink -lowerinvoke -constmerge -reg2mem -globalsplit -lower-constant-intrinsics -loop-rotate -loop-versioning -name-anon-globals -sink -canonicalize-aliases -loop-simplify -functionattrs -loop-unroll -globalsplit -sroa -strip-debug-declare -loop-predication -newgvn -instnamer -loop-distribute -loop-guard-widening -lowerswitch -inferattrs -lower-guard-intrinsic -coro-elide -argpromotion -sccp -strip-dead-prototypes -globalopt -correlated-propagation -libcalls-shrinkwrap -bdce -nary-reassociate -memcpyopt -slp-vectorizer -loop-vectorize -loweratomic -gvn-hoist -simplifycfg -inferattrs -correlated-propagation input.bc -o output.bc
benchmark://cbench-v1/crc32,1.0,6.140795469284058,opt -jump-threading -strip-debug-declare -nary-reassociate -gvn -constprop -elim-avail-extern -early-cse-memssa -load-store-vectorizer -mem2reg -sroa -loop-reroll -rpo-functionattrs -simplifycfg -loop-versioning-licm -post-inline-ee-instrument -consthoist -strip-dead-prototypes -nary-reassociate -rewrite-statepoints-for-gc -simplifycfg -loop-idiom -simplifycfg -load-store-vectorizer -coro-early -instcombine input.bc -o output.bc
benchmark://cbench-v1/sha,1.36986301369863,6.241183280944824,opt -insert-gcov-profiling -break-crit-edges -post-inline-ee-instrument -loop-unroll -alignment-from-assumptions -dce -inferattrs -gvn -coro-elide -separate-const-offset-from-gep -load-store-vectorizer -gvn -simplifycfg -jump-threading -reassociate -lowerinvoke -licm -loop-fusion -strip-debug-declare -sink -barrier -consthoist -loop-guard-widening -deadargelim -strip-debug-declare -simple-loop-unswitch -constmerge -sroa -lower-constant-intrinsics -deadargelim -lower-widenable-condition -slsr -rewrite-statepoints-for-gc -argpromotion -instcombine input.bc -o output.bc
benchmark://cbench-v1/stringsearch2,0.981651376146789,6.310243368148804,opt -deadargelim -coro-early -break-crit-edges -partially-inline-libcalls -simplifycfg -loweratomic -loop-interchange -always-inline -coro-early -callsite-splitting -libcalls-shrinkwrap -speculative-execution -strip-dead-prototypes -early-cse-memssa -coro-early -lower-constant-intrinsics -elim-avail-extern -strip -guard-widening -bdce -slp-vectorizer -name-anon-globals -simplifycfg -alignment-from-assumptions -coro-elide -partially-inline-libcalls -strip -strip-nondebug -coro-elide -constprop -inferattrs -mem2reg -pgo-memop-opt -instcombine -div-rem-pairs -strip-nondebug -lower-widenable-condition -strip-nondebug -libcalls-shrinkwrap -aggressive-instcombine -separate-const-offset-from-gep -memcpyopt -forceattrs -loop-unroll -adce -consthoist -early-cse -coro-cleanup -simplifycfg input.bc -o output.bc
benchmark://cbench-v1/tiff2rgba,0.5252041390361569,18.630079746246334,opt  input.bc -o output.bc
benchmark://cbench-v1/tiff2bw,0.5251628484446486,18.686046361923218,opt  input.bc -o output.bc
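One plausible reading of the two result sets above is that a newly assigned benchmark only takes effect at the next reset(). The thread does not confirm this, so treat it as an assumption. The toy class below is a hypothetical stand-in, not the real environment; it only illustrates how a deferred setter would produce "all qsort" without the reset and distinct benchmarks with it.

```python
class ToyEnv:
    """Hypothetical stand-in: assigning a benchmark is deferred until reset()."""

    def __init__(self, benchmark):
        self._active = None
        self._pending = benchmark

    @property
    def benchmark(self):
        return self._active

    @benchmark.setter
    def benchmark(self, value):
        # Assumed behaviour: remember the request, apply it on the next reset().
        self._pending = value

    def reset(self):
        self._active = self._pending
        return self._active


def run_all(env, benchmarks, call_reset):
    """Record which benchmark is actually active for each requested one."""
    seen = []
    for benchmark in benchmarks:
        env.benchmark = benchmark
        if call_reset:
            env.reset()
        seen.append(env.benchmark)
    return seen


requested = ["qsort", "sha", "crc32"]

env = ToyEnv("qsort")
env.reset()
without_reset = run_all(env, requested, call_reset=False)

env = ToyEnv("qsort")
env.reset()
with_reset = run_all(env, requested, call_reset=True)
```

Under this assumption, `without_reset` contains only the benchmark that was active at the start, while `with_reset` follows the requested list.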

Also I have one more question. After doing env.fork(), would env.reset() reset the whole state of the CompilationSession? I guess so.

Cheers, Chris

ChrisCummins commented 2 years ago

To fix this, I tried adding env.reset() after this line (env.benchmark = benchmark), and the output now looks like this (all different benchmarks):

Oh wow, please submit that as a patch! Looks like a bug to me 🙂

Also I have one more question. After doing env.fork(), would env.reset() reset the whole state of the CompilationSession? I guess so.

env.reset() ends the current CompilationSession and creates a new one. The forked one is unaffected:

In [1]: import compiler_gym

In [2]: a = compiler_gym.make("llvm-v0")

In [3]: a.reset()

In [4]: a.step(a.action_space.sample())
Out[4]: (None, None, False, {'action_had_no_effect': True, 'new_action_space': False})

In [5]: b = a.fork()

In [6]: b.state
Out[6]: CompilerEnvState(benchmark='benchmark://cbench-v1/qsort', commandline='opt -aggressive-instcombine input.bc -o output.bc', walltime=15.320909976959229, reward=None)

In [7]: a.state
Out[7]: CompilerEnvState(benchmark='benchmark://cbench-v1/qsort', commandline='opt -aggressive-instcombine input.bc -o output.bc', walltime=17.208449840545654, reward=None)

In [8]: a.reset()

In [9]: a.state
Out[9]: CompilerEnvState(benchmark='benchmark://cbench-v1/qsort', commandline='opt  input.bc -o output.bc', walltime=1.0851802825927734, reward=None)

In [10]: b.state
Out[10]: CompilerEnvState(benchmark='benchmark://cbench-v1/qsort', commandline='opt -aggressive-instcombine input.bc -o output.bc', walltime=23.449546813964844, reward=None)

Cheers, Chris