Farama-Foundation / D4RL

A collection of reference environments for offline reinforcement learning
Apache License 2.0

[Bug Report] copy.deepcopy(env) causes unexpected behavior #208

Open acforvs opened 1 year ago

acforvs commented 1 year ago

Describe the bug

The behavior of copy.deepcopy(env) is currently undefined. I would expect either an error to be raised if the environment is not meant to be copied, or for the copied environment to behave identically to the original.
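If copying is meant to be unsupported, one option (purely an illustration, not existing D4RL code) would be for the environment class to define __deepcopy__ and fail loudly instead of silently producing a mis-configured copy:

import copy

class NoCopyEnv:
    """Hypothetical stand-in for an environment that should not be copied."""

    def __deepcopy__(self, memo):
        # Raise instead of returning a copy with silently changed settings.
        raise NotImplementedError(
            "deepcopy is not supported for this environment; "
            "create a new instance with gym.make(...) instead."
        )

# copy.deepcopy(NoCopyEnv())  # raises NotImplementedError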

Code example

For example, when using deepcopy on "antmaze-umaze-v2", the reward_type changes from sparse to dense and the reward becomes negative, which I believe shouldn't be the case (see https://github.com/Farama-Foundation/D4RL/blob/master/d4rl/pointmaze/maze_model.py#L196). However, the .step function still works, which makes the mistake harder to notice.

import copy

import gym
import d4rl  # registers the D4RL environments

env = gym.make("antmaze-umaze-v2")
new_env = copy.deepcopy(env)

env.seed(0)
new_env.seed(0)

# Step the original environment once and log the transition.
state = env.reset()
action = env.action_space.sample()
_, reward, _, _ = env.step(action)
print(f"State: {state}")
print(f"Action: {action}")
print(f"Reward: {reward}")

# Step the deep-copied environment once and log the transition.
new_state = new_env.reset()
new_action = new_env.action_space.sample()
_, new_reward, _, _ = new_env.step(new_action)
print(f"New State: {new_state}")
print(f"New Action: {new_action}")
print(f"New Reward: {new_reward}")

print(f"Reward Types: {env.reward_type} vs. {new_env.reward_type}")

Output:

State: [ 0.05968136  0.04524872  0.73911593  0.99811582  0.00298145 -0.0596976
  0.01386086  0.01285718 -0.00742792 -0.01581104 -0.08846538  0.06740718
  0.09790658 -0.06988595 -0.00338617 -0.05037741 -0.0625188   0.05739599
 -0.07765951 -0.07817309  0.05622622  0.12245795  0.00909633  0.08016977
 -0.00340852 -0.11708338  0.03623063 -0.06030991 -0.03213472]
Action: [-0.8769915   0.09569477  0.46591443 -0.03204463 -0.23699978 -0.6913936
  0.5964345  -0.18665828]
Reward: 0.0
New State: [-0.06903731  0.04845185  0.78921383  0.99227966  0.08385488 -0.07643463
  0.05007176 -0.07194152 -0.05970388  0.08519238  0.03174577 -0.09178095
  0.08723601 -0.03991108 -0.02424408  0.20672978  0.10545167  0.06639785
  0.05823002  0.05074512  0.07413391 -0.03038314  0.07790665 -0.12351816
  0.09879504 -0.00317077  0.23729763  0.06375448 -0.05357395]
New Action: [ 0.15439004 -0.4896254  -0.12454828 -0.4938082   0.6774787   0.87535506
 -0.12077015  0.03304163]
New Reward: -8.97261873966396
Reward Types: sparse vs. dense
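
As a stopgap, the drift can at least be detected right after copying; a minimal check appended to the reproduction script above (it assumes reward_type is readable through the wrapper, as in the snippet):

# Sanity check appended to the script above: the copy should preserve
# the reward type of the original environment.
assert new_env.reward_type == env.reward_type, (
    f"deepcopy changed reward_type: {env.reward_type!r} -> {new_env.reward_type!r}"
)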

System Info

I am using Google Colab.

Additional context

Something similar was also discussed here: https://github.com/openai/gym/issues/1863
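One alternative to deep-copying is to re-create the environment from its registered spec; a rough sketch (this builds a fresh instance and does not carry over internal state such as seeds or goal targets):

import gym
import d4rl  # noqa: F401  # registers the D4RL environments

def clone_env(env):
    # Build a fresh, correctly configured instance from the spec id
    # rather than relying on copy.deepcopy.
    return gym.make(env.spec.id)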
