facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License

episode_reward does not work properly when env wrapped in SynchronousSqliteLogger #747

Open nluu175 opened 1 year ago

nluu175 commented 1 year ago

🐛 Bug

Here is the code I ran on Google Colab to create the environment.

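import gym
import compiler_gym  # noqa: F401  (importing registers the CompilerGym environments with gym)
from compiler_gym.wrappers import SynchronousSqliteLogger

# db_path and benchmark_name are set earlier; their values are not shown in this report.
env = gym.make("llvm-autophase-ic-v0")
env.reset()
env = SynchronousSqliteLogger(
    env=env,
    db_path=db_path,
)
env.reset(benchmark=benchmark_name)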

and to produce the steps:

episode_reward = 0
for i in range(1, 101):
    # Take a random action; step() returns the 4-tuple (observation, reward, done, info).
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:
        break
    episode_reward += reward
    print(f"Step {i}, quality={episode_reward:.3%}")

With the snippets above, episode_reward stays at 0 on every step. When I remove the SynchronousSqliteLogger wrapper, the environment reports rewards normally.
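To narrow this down, it may help to replay one fixed action sequence against both the bare and the wrapped environment and compare the per-step rewards directly, rather than relying on random sampling. The sketch below is only an illustration: `collect_rewards` and `StubEnv` are hypothetical names that are not part of CompilerGym, and the stub rewards are made-up values that merely mimic the symptom. It assumes the 4-tuple `step()` return used in the report.

```python
from typing import Any, List, Sequence


def collect_rewards(env: Any, actions: Sequence[Any]) -> List[float]:
    """Replay a fixed action sequence and return the reward from each step.

    Assumes step() returns (observation, reward, done, info), as in the report.
    """
    rewards = []
    for action in actions:
        _, reward, done, _ = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


class StubEnv:
    """Hypothetical stand-in for a real environment, used only to show usage."""

    def __init__(self, rewards):
        self._rewards = list(rewards)
        self._t = 0

    def step(self, action):
        reward = self._rewards[self._t]
        self._t += 1
        done = self._t >= len(self._rewards)
        return None, reward, done, {}


# Made-up reward traces: the second one mimics the all-zero symptom.
bare = collect_rewards(StubEnv([0.5, 0.0, 0.25]), actions=[0, 1, 2])
wrapped = collect_rewards(StubEnv([0.0, 0.0, 0.0]), actions=[0, 1, 2])
print("bare:", bare)
print("wrapped:", wrapped)
print("identical:", bare == wrapped)
```

With the real environments, the same helper could be called once on the bare env and once on the wrapped env, each reset to the same benchmark and given the same list of actions. If the two reward lists differ, that points at the wrapper rather than at the benchmark or the sampled actions.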