facebookresearch / CompilerGym

Reinforcement learning environments for compiler and program optimization tasks
https://compilergym.ai/
MIT License

Unexpected behaviors of env.fork() #749

Open youweiliang opened 2 years ago

youweiliang commented 2 years ago

🐛 Bug

The observations and rewards of the forked environment are expected to be the same as those of the original environment when the same action is applied, but they can differ.

To Reproduce

Run the test below to reproduce:

import gym
import compiler_gym

def test1():
    # Expect the observations and rewards of the forked env to match those of
    # the original env when the same action is applied, but they can differ
    print("------test1------")
    benchmark = "benchmark://opencv-v0/108"  #"benchmark://cbench-v1/qsort"

    with gym.make("llvm-autophase-ic-v0", benchmark=benchmark) as env:
        env.reset()

        features = ["Programl", "IrInstructionCountOz", "IrInstructionCount", "IrSha1"]
        obs_space = [ env.observation.spaces[feature_name] for feature_name in features ]
        rewards_space = [
            env.reward.spaces["IrInstructionCountOz"],
            env.reward.spaces["IrInstructionCountO3"],
            env.reward.spaces["IrInstructionCount"],
        ]
        observations, rewards, done, info = env.step(action=[], observation_spaces=obs_space, reward_spaces=rewards_space)
        assert info['action_had_no_effect']
        i = 0
        while not done:
            forked_env = env.fork()
            action = env.action_space.sample()
            observations, rewards, done, info = env.step(action=action, observation_spaces=obs_space, reward_spaces=rewards_space)
            print(observations[1:3], rewards, action)
            obs_space2 = [ forked_env.observation.spaces[feature_name] for feature_name in features ]
            rewards_space2 = [
                forked_env.reward.spaces["IrInstructionCountOz"],
                forked_env.reward.spaces["IrInstructionCountO3"],
                forked_env.reward.spaces["IrInstructionCount"],
            ]
            observations2, rewards2, done2, info2 = forked_env.step(action=action, observation_spaces=obs_space2, reward_spaces=rewards_space2)
            print(observations2[1:3], rewards2, action)
            if tuple(observations[1:3] + rewards) != tuple(observations2[1:3] + rewards2):
                print("Error ==========")
                return
            i += 1
            if i > 10000:
                break

if __name__ == "__main__":
    test1()

Expected behavior

The forked environment should produce the same observations/rewards as the original environment when the same action is applied.
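The invariant at stake can be illustrated with a toy environment (this is not the CompilerGym API, just a minimal sketch of what `fork()` is expected to guarantee): a fork must duplicate all mutable state, so that applying the same action to the original and the fork yields identical observations and rewards.

```python
import copy


class ToyEnv:
    """Hypothetical environment with a single scalar of mutable state."""

    def __init__(self):
        self.state = 0

    def fork(self):
        # A correct fork deep-copies all mutable state.
        return copy.deepcopy(self)

    def step(self, action):
        self.state += action
        observation = self.state
        reward = -abs(self.state)
        return observation, reward


env = ToyEnv()
env.step(3)
forked = env.fork()

# Same action applied to both copies must produce identical results.
obs1, rew1 = env.step(2)
obs2, rew2 = forked.step(2)
assert (obs1, rew1) == (obs2, rew2)
```

This is the property the repro script above checks for the real `env.fork()`, and which the report shows being violated.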

Environment

Please fill in this checklist:

You may use the environment collection script to generate most of this information. You can get the script and run it with:

wget https://raw.githubusercontent.com/facebookresearch/CompilerGym/stable/build_tools/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

Additional context

youweiliang commented 2 years ago

Created a PR to reproduce the issue: https://github.com/facebookresearch/CompilerGym/pull/751#issue-1336588351