facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License

Unexpected behaviors of env.fork() #749

Open youweiliang opened 2 years ago

youweiliang commented 2 years ago

🐛 Bug

The observations and rewards of the forked environment are expected to be the same as those of the original environment when the same action is applied, but they can differ.

To Reproduce

Run the test below to reproduce:

import gym
import compiler_gym

def test1():
    # Expect the observations and rewards of the forked env to match those of
    # the original env when the same action is applied, but they can differ
    print("------test1------")
    benchmark = "benchmark://opencv-v0/108"  #"benchmark://cbench-v1/qsort"

    with gym.make("llvm-autophase-ic-v0", benchmark=benchmark) as env:
        env.reset()

        features = ["Programl", "IrInstructionCountOz", "IrInstructionCount", "IrSha1"]
        obs_space = [ env.observation.spaces[feature_name] for feature_name in features ]
        rewards_space = [
            env.reward.spaces["IrInstructionCountOz"],
            env.reward.spaces["IrInstructionCountO3"],
            env.reward.spaces["IrInstructionCount"],
        ]
        observations, rewards, done, info = env.step(action=[], observation_spaces=obs_space, reward_spaces=rewards_space)
        assert info['action_had_no_effect']
        i = 0
        while not done:
            forked_env = env.fork()
            action = env.action_space.sample()
            observations, rewards, done, info = env.step(action=action, observation_spaces=obs_space, reward_spaces=rewards_space)
            print(observations[1:3], rewards, action)
            obs_space2 = [ forked_env.observation.spaces[feature_name] for feature_name in features ]
            rewards_space2 = [
                forked_env.reward.spaces["IrInstructionCountOz"],
                forked_env.reward.spaces["IrInstructionCountO3"],
                forked_env.reward.spaces["IrInstructionCount"],
            ]
            observations2, rewards2, done2, info2 = forked_env.step(action=action, observation_spaces=obs_space2, reward_spaces=rewards_space2)
            print(observations2[1:3], rewards2, action)
            if tuple(observations[1:3] + rewards) != tuple(observations2[1:3] + rewards2):
                print("Error ==========")
                return
            i += 1
            if i > 10000:
                break

if __name__ == "__main__":
    test1()

Expected behavior

The forked environment should produce the same observations/rewards as the original environment when the same action is applied.
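The invariant at stake can be illustrated with a toy environment (this is not the CompilerGym API, just a minimal sketch of what `fork()` is expected to guarantee): a fork must duplicate all mutable state, so that applying the same action to the original and the fork yields identical observations and rewards.

```python
import copy


class ToyEnv:
    """Hypothetical environment with a single scalar of mutable state."""

    def __init__(self):
        self.state = 0

    def fork(self):
        # A correct fork deep-copies all mutable state.
        return copy.deepcopy(self)

    def step(self, action):
        self.state += action
        observation = self.state
        reward = -abs(self.state)
        return observation, reward


env = ToyEnv()
env.step(3)
forked = env.fork()

# Same action applied to both copies must produce identical results.
obs1, rew1 = env.step(2)
obs2, rew2 = forked.step(2)
assert (obs1, rew1) == (obs2, rew2)
```

This is the property the repro script above checks for the real `env.fork()`, and which the report shows being violated.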

Environment

Please fill in this checklist:

You may use the environment collection script to generate most of this information. You can get the script and run it with:

wget https://raw.githubusercontent.com/facebookresearch/CompilerGym/stable/build_tools/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

Additional context

youweiliang commented 2 years ago

Created a PR to reproduce the issue: https://github.com/facebookresearch/CompilerGym/pull/751#issue-1336588351