facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License
880 stars, 123 forks

How can one access the reward defined in the Leaderboards? #810

Open ChijinZ opened 11 months ago

ChijinZ commented 11 months ago

Hi, I'm new to CompilerGym. I'm a little confused about the rewards described in the Leaderboards section of README.md, which states "Reward is the reduction in instruction count achieved scaled to the reduction achieved by LLVM's builtin -Oz pipeline." I read this as $$R_{leaderboard} = (C_{oz} - C_{final}) / (C_{original} - C_{oz})$$, where $C$ denotes the instruction count of the IR. However, according to the reward space defined in the docs, the reward we can actually obtain at each step is $$R_t = (C_{t-1} - C_{t}) / (C_{original} - C_{oz})$$.

How can one access $R_{leaderboard}$?

I found that many models in the leaderboard report the reward returned by the last env.step() call, which is $(C_{final-1} - C_{final}) / (C_{original} - C_{oz})$. Is this the correct approach?
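To make the difference concrete, here is a minimal sketch in plain Python (it does not call CompilerGym). The instruction counts and the helper `step_rewards` are hypothetical, made up for illustration. It shows that summing the per-step rewards over an episode telescopes to (C_original - C_final) / (C_original - C_oz), i.e. the total reduction scaled by the -Oz reduction, whereas the reward from the last step alone only covers the final action. Whether the leaderboard uses exactly this cumulative quantity is an assumption based on the README sentence quoted above.

```python
from fractions import Fraction


def step_rewards(counts, oz_count):
    """Per-step rewards (C_{t-1} - C_t) / (C_original - C_oz).

    `counts` is the instruction count before any action followed by the
    count after each action; `oz_count` is the count produced by -Oz.
    """
    scale = counts[0] - oz_count
    return [Fraction(prev - cur, scale) for prev, cur in zip(counts, counts[1:])]


# Hypothetical instruction counts: original IR, then after each of 4 actions.
counts = [1000, 900, 850, 700, 690]
oz_count = 700  # hypothetical -Oz instruction count

rewards = step_rewards(counts, oz_count)
cumulative = sum(rewards)   # sum of all per-step rewards in the episode
last_step = rewards[-1]     # what the final env.step() alone would report

# The sum telescopes to the total reduction scaled by the -Oz reduction.
expected = Fraction(counts[0] - counts[-1], counts[0] - oz_count)

print("per-step rewards:", [str(r) for r in rewards])
print("cumulative:", cumulative, "last step only:", last_step)
```

With these numbers the cumulative value and the last-step value differ substantially, so the two should not be used interchangeably.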