Shengjiewang-Jason / EfficientZeroV2

[ICML 2024, Spotlight] EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
GNU General Public License v3.0

training on cartpole #5

Open nullonesix opened 3 months ago

nullonesix commented 3 months ago

trying to hook it up to cartpole..

print(self.env.reset())
(pid=2462395) [ 0.01369617 -0.02302133 -0.04590265 -0.04834723]

but it expects:

obs, info = self.env.reset()

i.e. there's no info component. the full code where the error is thrown:

from ..base import BaseWrapper

class GymWrapper(BaseWrapper):
    """
    Make your own wrapper: Atari Wrapper
    """
    def __init__(self, env, obs_to_string=False):
        super().__init__(env, obs_to_string, False)

    def step(self, action):
        obs, reward, _, done, info = self.env.step(action)
        info['raw_reward'] = reward
        return obs, reward, done, info

    def reset(self,):
        print(self.env.reset())
        obs, info = self.env.reset()

        return obs

the error itself is:

ValueError: too many values to unpack (expected 2)

nullonesix commented 3 months ago

i'm just wondering how to resolve the issue in a way that's not hacky, since it clearly reveals some misunderstanding i have about the process of interfacing ezv2 with a custom env

Shengjiewang-Jason commented 3 months ago

Thank you for the question. I tested CartPole-v1 with gym 0.22.0 and hit the same problem. However, I see the following figure on the OpenAI Gym GitHub page. So I guess you can upgrade gym if you want to test the gym benchmark.

Screenshot 2024-08-16 10 48 54
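
The mismatch above comes from the two gym API generations: older releases return only `obs` from `reset()` and a 4-tuple from `step()`, while gym 0.26 and later return `(obs, info)` and a 5-tuple `(obs, reward, terminated, truncated, info)`. A minimal sketch of a version-tolerant adapter follows; the helper names `unpack_reset` and `unpack_step` are hypothetical and not part of EfficientZeroV2 or gym.

```python
def unpack_reset(result):
    """Return (obs, info) whether reset() gave obs alone or an (obs, info) pair."""
    # Newer gym returns a 2-tuple whose second item is a dict.
    # Assumption: a bare observation is never itself a (something, dict) tuple.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result
    # Older gym returns the observation only, so supply an empty info dict.
    return result, {}


def unpack_step(result):
    """Return (obs, reward, done, info) from a 4-tuple or 5-tuple step() result."""
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # An episode is over if it terminated OR was truncated (e.g. time limit).
        return obs, reward, bool(terminated or truncated), info
    obs, reward, done, info = result
    return obs, reward, bool(done), info
```

Inside a wrapper, the idea would be to call `self.env.reset()` exactly once and pass the result through `unpack_reset`. Note that under the 5-tuple API, `obs, reward, _, done, info = self.env.step(action)` keeps `truncated` as `done` and discards `terminated`, so combining both flags as above is likely what is intended.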

nullonesix commented 3 months ago

then i get:

(ezv2) swarms@dpm10:~/EfficientZeroV2$ python ez/train.py exp_config=ez/config/exp/cartpole.yaml
Traceback (most recent call last):
  File "ez/train.py", line 25, in <module>
    from ez import agents
  File "/home/swarms/EfficientZeroV2/ez/agents/__init__.py", line 6, in <module>
    from ez.agents.ez_atari import EZAtariAgent
  File "/home/swarms/EfficientZeroV2/ez/agents/ez_atari.py", line 13, in <module>
    from ez.envs import make_atari
  File "/home/swarms/EfficientZeroV2/ez/envs/__init__.py", line 3, in <module>
    from gym.wrappers import Monitor
ImportError: cannot import name 'Monitor' from 'gym.wrappers' (/home/swarms/miniconda3/envs/ezv2/lib/python3.8/site-packages/gym/wrappers/__init__.py)

nullonesix commented 3 months ago

thank you for being so helpful

Shengjiewang-Jason commented 3 months ago

Oh, I see. The Monitor class has been removed in the newest gym release. If you don't need to record videos, you can simply remove the Monitor wrapper first.
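
One way to make the Monitor wrapper optional, rather than deleting the import outright, is to guard it. This is only a sketch: `maybe_monitor` and its `video_dir` parameter are hypothetical names, and how `ez/envs/__init__.py` actually uses `Monitor` is not shown in this thread.

```python
try:
    # Only present in older gym releases.
    from gym.wrappers import Monitor
except ImportError:
    # Either gym is not installed or this gym version no longer ships Monitor.
    Monitor = None


def maybe_monitor(env, video_dir=None):
    """Wrap env with Monitor only when recording is requested and available."""
    if video_dir is None or Monitor is None:
        # No recording requested, or Monitor unavailable: use the env as-is.
        return env
    return Monitor(env, video_dir, force=True)
```

With this guard the module still imports on newer gym versions, and training without video recording is unaffected.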

nullonesix commented 3 months ago

ok, so after upgrading gym to the latest version, replacing Monitor with RecordVideo, and disabling seeding, i run training and i get:

(pid=4076730) (array([-0.00774786,  0.02113981,  0.00405227,  0.04962296], dtype=float32), {})
(pid=4076730) [-0.01504798 -0.04475251  0.02837704  0.02546087]
(pid=4076730) (array([-0.01801576, -0.04822684,  0.02919559, -0.00987596], dtype=float32), {})
(pid=4076699) (array([-0.01454708,  0.03767176, -0.04898228,  0.00636753], dtype=float32), {})
(pid=4076699) [ 0.03428657  0.01023199  0.01853956 -0.01512355]
(pid=4076699) (array([-0.00603121, -0.0295258 , -0.01322525,  0.01265372], dtype=float32), {})

where the print statement in question is:

from ..base import BaseWrapper

class GymWrapper(BaseWrapper):
    """
    Make your own wrapper: Atari Wrapper
    """
    def __init__(self, env, obs_to_string=False):
        super().__init__(env, obs_to_string, False)

    def step(self, action):
        obs, reward, _, done, info = self.env.step(action)
        info['raw_reward'] = reward
        return obs, reward, done, info

    def reset(self,):
        print(self.env.reset())
        obs, info = self.env.reset()

        return obs

so, sometimes it is a pair, and sometimes it is not?

Shengjiewang-Jason commented 3 months ago

That is very weird. Do you see the same problem when you run the gym example from the figure I sent earlier?

nullonesix commented 3 months ago

It is impressive that I managed to get such weird behavior with so few changes 🤣. Do you mean this image https://github.com/Shengjiewang-Jason/EfficientZeroV2/issues/5#issuecomment-2292634909 ? I didn't try it as I was unsure exactly where it went. I finally read the ezv2 paper and am now going through your code base, so hopefully my understanding will improve.

Shengjiewang-Jason commented 3 months ago

Yeah, right. You can try it to test whether the basic env works. Ok, bro. You can also look through the codebase. The problem may come from one of the env wrappers in the envs folder, so pay close attention to those wrappers. If you still run into problems, you can send them to me.
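
To see which wrappers sit between the training code and the raw environment, one option is to walk the wrapper chain and print it. The sketch below assumes each wrapper stores the wrapped environment in an `env` attribute, as `GymWrapper` does with `self.env`; `wrapper_chain` is a hypothetical helper, not part of the codebase.

```python
def wrapper_chain(env):
    """List class names from the outermost wrapper down to the base env."""
    names = []
    seen = set()  # guards against a wrapper that points back to itself
    while env is not None and id(env) not in seen:
        seen.add(id(env))
        names.append(type(env).__name__)
        # Assumption: wrappers expose the wrapped environment as `.env`.
        env = getattr(env, "env", None)
    return names
```

Printing `wrapper_chain(self.env)` inside `reset()` may help show whether the same wrapper is applied more than once, or whether some layers return `obs` while others return `(obs, info)`.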