facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License
906 stars 127 forks source link

Logical Failure when combing TimeLimit Wrapper with IterateOverBenchmarks #648

Closed JD-ETH closed 2 years ago

JD-ETH commented 2 years ago

🐛 Bug

If an environment is first wrapper with TimeLimit before IterateOverBenchmarks, it will not return "done" as True.

To Reproduce

Steps to reproduce the behavior:

env = TimeLimit(env, step_limit) 
env = CycleOverBenchmarks(env, benchmarks) 
_, done, _, _ = env.reset()
while not done:
    _, done, _, _ = env.step(0) 

This will not finish. However, if the TimeLimit happens after the Cycle, it has normal behavior.

Additional context

Assign it to me, I will fix it when I got time.

ChrisCummins commented 2 years ago

Good catch @JD-ETH and thanks for volunteering to patch this! 🙂

Cheers, Chris

nluu175 commented 2 years ago

Hi. I tried to reproduce the behavior but it did not work. I found that there are something wrong in the code snippets that @JD-ETH provided.

For env.step(0), done should be the third variable (currently second one in the provided code). Also, done is automatically assigned True as the beginning so the loop will not execute.

The way I reproduced the code is as follows:

env.reset()
i = 0
done = False
while not done:
    _, _, done, _ = env.step(0)

This code snippet should reproduce the exact behavior for this issue.