Hi, I'm new to CompilerGym. I'm a little confused about the reward described in the Leaderboards section of README.md, which states "Reward is the reduction in instruction count achieved scaled to the reduction achieved by LLVM's builtin -Oz pipeline." This suggests that
$$R_{leaderboard} = \frac{C_{oz} - C_{final}}{C_{original} - C_{oz}},$$
where $C$ denotes the IR instruction count. However, according to the reward space defined in the docs, the reward we actually receive at each step is
$$R_t = \frac{C_{t-1} - C_t}{C_{original} - C_{oz}}.$$
How can one access $R_{leaderboard}$?
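For reference, write $R_t$ for the per-step reward at step $t$ and assume an episode of $T$ steps with $C_0 = C_{original}$ and $C_T = C_{final}$. Summing the per-step reward over the episode then telescopes:

$$
\sum_{t=1}^{T} R_t = \sum_{t=1}^{T} \frac{C_{t-1} - C_t}{C_{original} - C_{oz}} = \frac{C_{original} - C_{final}}{C_{original} - C_{oz}} = 1 + \frac{C_{oz} - C_{final}}{C_{original} - C_{oz}}
$$

So, if my reading of the README is right, $R_{leaderboard}$ would be the episode's cumulative reward minus 1, not the reward of the last step alone.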
I found that many leaderboard submissions report the reward returned by the last `env.step()` call, which is $(C_{final-1} - C_{final}) / (C_{original} - C_{oz})$. Is that the correct way to compute it?
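To make the difference concrete, here is a small pure-Python sketch with made-up instruction counts (not real CompilerGym output). It calls no CompilerGym API; it only applies the per-step formula from the docs and the leaderboard formula as I wrote it above, assuming the first count in the list is $C_{original}$ and the last is $C_{final}$. The helper names are hypothetical.

```python
from typing import List, Sequence


def per_step_rewards(counts: Sequence[int], c_oz: int) -> List[float]:
    """Per-step reward R_t = (C_{t-1} - C_t) / (C_original - C_oz).

    Assumes counts[0] is C_original and counts[-1] is C_final.
    """
    c_original = counts[0]
    scale = c_original - c_oz  # reduction achieved by -Oz
    return [(prev - cur) / scale for prev, cur in zip(counts, counts[1:])]


def leaderboard_reward(counts: Sequence[int], c_oz: int) -> float:
    """R_leaderboard as written above: (C_oz - C_final) / (C_original - C_oz)."""
    return (c_oz - counts[-1]) / (counts[0] - c_oz)


# Hypothetical instruction counts for a 4-step episode (illustrative only).
counts = [100, 95, 90, 82, 78]
c_oz = 80

rewards = per_step_rewards(counts, c_oz)
cumulative = sum(rewards)  # telescopes to (C_original - C_final) / (C_original - C_oz)
last_step = rewards[-1]    # the reward of the final step only
r_leaderboard = leaderboard_reward(counts, c_oz)

print(f"cumulative={cumulative:.2f} last_step={last_step:.2f} leaderboard={r_leaderboard:.2f}")
```

With these hypothetical numbers it prints `cumulative=1.10 last_step=0.20 leaderboard=0.10`. The last-step reward only reflects the final action, while the cumulative reward minus 1 matches $R_{leaderboard}$.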