Open lqwk opened 2 years ago
Merging #762 (a53276a) into development (1c40e5b) will decrease coverage by 33.64%. The diff coverage is 34.00%.
@@ Coverage Diff @@
## development #762 +/- ##
================================================
- Coverage 89.28% 55.63% -33.65%
================================================
Files 130 131 +1
Lines 7912 7961 +49
================================================
- Hits 7064 4429 -2635
- Misses 848 3532 +2684
Impacted Files | Coverage Δ | |
---|---|---|
compiler_gym/spaces/runtime_series_reward.py | 28.57% <28.57%> (ø) | |
compiler_gym/wrappers/llvm.py | 44.82% <33.33%> (-55.18%) | :arrow_down: |
compiler_gym/spaces/__init__.py | 100.00% <100.00%> (ø) | |
compiler_gym/wrappers/__init__.py | 100.00% <100.00%> (ø) | |
compiler_gym/util/permutation.py | 0.00% <0.00%> (-100.00%) | :arrow_down: |
compiler_gym/leaderboard/__init__.py | 0.00% <0.00%> (-100.00%) | :arrow_down: |
compiler_gym/service/runtime/__init__.py | 0.00% <0.00%> (-100.00%) | :arrow_down: |
compiler_gym/service/runtime/benchmark_cache.py | 0.00% <0.00%> (-100.00%) | :arrow_down: |
...mpiler_gym/service/runtime/compiler_gym_service.py | 0.00% <0.00%> (-100.00%) | :arrow_down: |
...ice/runtime/create_and_run_compiler_gym_service.py | 0.00% <0.00%> (-100.00%) | :arrow_down: |
... and 83 more | | |
Note: needs #761 to land first
Hi @lqwk, I'm very sorry for my delay in reviewing this. I've not forgotten about it, I just have a backlog of issues to fix on the CI so that I can run the tests against these changes.
Cheers, Chris
Hi @lqwk, okay, I finally pushed through the backlog of issues and have a newly stable v0.2.5 release. Sorry again that it took me so long to get around to this.
Are you still working on this? If so, could you please rebase this on top of the `development` branch so that we can use the CI to verify that all tests pass?
Cheers, Chris
Introduce RuntimeSeriesReward
Introduce a new implementation of comparing program runtimes that computes the reward as the difference of the medians between the current set of runtimes and the previous set of runtimes only if the runtime series are significantly different (determined by the Kruskal–Wallis test).
Source: https://htor.inf.ethz.ch/publications/img/hoefler-scientific-benchmarking.pdf
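For readers who want to see the idea concretely, here is a minimal standalone sketch of the reward rule described above. It is not the code in this PR. The function names, the `alpha=0.05` significance threshold, and the sign convention (previous median minus current median, so a speedup gives a positive reward) are all assumptions made for illustration. The Kruskal–Wallis p-value is computed by hand for two groups using the usual chi-square approximation with one degree of freedom, which can be imprecise for very small samples.

```python
import math
from collections import Counter
from statistics import median
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-group Kruskal-Wallis H test, chi-square approximation (1 d.o.f.)."""
    pooled = list(a) + list(b)
    n_a, n_b, n = len(a), len(b), len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:n_a])
    rank_sum_b = sum(ranks[n_a:])
    h = 12.0 / (n * (n + 1)) * (rank_sum_a**2 / n_a + rank_sum_b**2 / n_b)
    h -= 3 * (n + 1)
    # Standard tie correction; it is zero only if every value is identical.
    ties = sum(t**3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n**3 - n)
    if correction == 0:
        return 1.0
    h = max(h / correction, 0.0)
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(h / 2.0))


def runtime_series_reward(
    previous: Sequence[float], current: Sequence[float], alpha: float = 0.05
) -> float:
    """Median difference if the series differ significantly, else zero.

    Assumed sign convention: positive reward means the runtime went down.
    """
    if kruskal_wallis_pvalue(previous, current) >= alpha:
        return 0.0
    return median(previous) - median(current)


previous_runtimes = [10.0, 10.1, 10.2, 9.9, 10.3]
current_runtimes = [8.0, 8.1, 7.9, 8.2, 8.05]
print(runtime_series_reward(previous_runtimes, current_runtimes))
```

If SciPy is available, `scipy.stats.kruskal(previous, current).pvalue` could replace the hand-written test. The point of gating on significance is that noisy but statistically indistinguishable runtime measurements produce a reward of zero instead of a spurious nonzero value.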
Testing
I ran a series of tests comparing the new implementation with the existing implementation using the LLVM autotuner on the `cbench-v1` benchmark. The rewards are shown below:
The new implementation is on par with the existing implementation, and even beats the existing implementation on 12 of the 17 benchmarks.
I propose merging this upstream; we can then work on further optimizations such as early stopping.