danijar / dreamerv3

Mastering Diverse Domains through World Models
https://danijar.com/dreamerv3
MIT License

Dimension issue with Observation from custom Gym environment #113

Closed mikazlopes closed 5 months ago

mikazlopes commented 5 months ago

Hi,

I created a model to train an agent to play Diablo 2 LOD (original version). I have been running it in SB3 without issues. I came across DreamerV3 recently and want to try training the agent with this library for comparison.

My environment uses Gym and my action and observation spaces are the following:

self.action_space = spaces.Box(low=np.array([0, 0, 0, 0], dtype=np.float32), high=np.array([800, 600, 1, self.keyboard_action_space - 1], dtype=np.float32))
self.observation_space = spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8)
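One way to confirm that what the environment emits agrees with the declared observation space is a plain shape, dtype, and range check. The sketch below uses only NumPy; matches_image_space is a hypothetical helper name, not part of Gym or DreamerV3, and its defaults simply mirror the Box above.

```python
import numpy as np


def matches_image_space(obs, shape=(64, 64, 3), dtype=np.uint8, low=0, high=255):
    """Return True if obs is an ndarray that fits the declared image Box."""
    if not isinstance(obs, np.ndarray):
        return False  # e.g. a tuple or dict instead of a bare array
    if obs.shape != tuple(shape) or obs.dtype != np.dtype(dtype):
        return False
    return bool(obs.min() >= low and obs.max() <= high)


good = np.zeros((64, 64, 3), dtype=np.uint8)
rgba = np.zeros((64, 64, 4), dtype=np.uint8)  # screenshot with an alpha channel
pair = (good, {})                             # (observation, info) tuple

print(matches_image_space(good))
print(matches_image_space(rgba))
print(matches_image_space(pair))
```

Running the check on the values returned by both reset() and step() shows quickly whether the array itself is at fault or whether something else, such as a tuple, is being handed on.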

My step() and reset() observation collection code is:

response = requests.get(f"{self.server_url}/screenshotsmall")
image = Image.open(BytesIO(response.content))
observation = np.array(image)

When I use this observation collection method, I get the following error when training starts:

/Documents/d2drl/dreamerv3/embodied/envs/from_gym.py:97 in

   94     except Exception as e:
   95         print(f"Error with key {k}: {e}")
   96         print(f"Type of {k}: {type(v)}, value: {v}")
❱  97     obs = {k: np.asarray(v) for k, v in obs.items()}
   98     obs.update(
   99         reward=np.float32(reward),
  100         is_first=is_first,

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

If my observation collection code changes to:


response = requests.get(f"{self.server_url}/screenshotsmall")
image = Image.open(BytesIO(response.content))
observation_image = np.array(image)

observation = {
    'image': observation_image,
}

I get the following error:

/Documents/d2drl/dreamerv3/embodied/core/basics.py:32 in convert

   29             value = value.astype(dst)
   30         break
   31     else:
❱  32         raise TypeError(f"Object '{value}' has unsupported dtype: {value.dtype}")
   33     return value

TypeError: Object '[[{'image': array([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)} {}]]' has unsupported dtype: object

I tried several things, such as different Gym versions, but I always get this error. The issue seems to be in how the observation is passed to DreamerV3.

My training code is essentially example.py adapted to my env. I did not change anything in config.yaml.


def main():
    import warnings
    import dreamerv3
    from dreamerv3 import embodied
    # Corrected import:
    from d2d3Env import DiabloIIGymEnv  # Import your custom environment
    from dreamerv3.embodied.envs.from_gym import FromGym, CompatibleActionSpaceWrapper  # Import the necessary classes
    warnings.filterwarnings('ignore', '.*truncated to dtype int32.*')

    # Configure DreamerV3
    config = embodied.Config(dreamerv3.configs['defaults'])
    config = config.update(dreamerv3.configs['xlarge'])
    config = config.update({
        'logdir': '~/logdir/run1',
        'run.train_ratio': 64,
        'run.log_every': 30,  # Seconds
        'batch_size': 16,
        'jax.prealloc': False,
        'encoder.mlp_keys': '$^',
        'decoder.mlp_keys': '$^',
        'encoder.cnn_keys': 'image',
        'decoder.cnn_keys': 'image',
    })
    config = embodied.Flags(config).parse()

    # Setup logging and agents
    logdir = embodied.Path(config.logdir)
    step = embodied.Counter()
    logger = embodied.Logger(step, [
        embodied.logger.TerminalOutput(),
        embodied.logger.JSONLOutput(logdir, 'metrics.jsonl'),
        embodied.logger.TensorBoardOutput(logdir),
    ])

    envs = DiabloIIGymEnv(server_url='http://192.168.150.139:5009', flask_port=8129)

    envs = FromGym(envs, obs_key='image')

    # Wrap the batch of environments for DreamerV3
    env = dreamerv3.wrap_env(envs, config)
    env = embodied.BatchEnv([env], parallel=False)

    agent = dreamerv3.Agent(env.obs_space, env.act_space, step, config)
    replay = embodied.replay.Uniform(config.batch_length, config.replay_size, logdir / 'replay')
    args = embodied.Config(**config.run, logdir=config.logdir, batch_steps=config.batch_size * config.batch_length)
    embodied.run.train(agent, env, replay, logger, args)

if __name__ == '__main__':
    main()

Hopefully, someone has faced this in the past and can help me. I am really keen on trying this library to see how the agent progresses in a complex game such as Diablo 2 LOD.

mikazlopes commented 5 months ago

Answering my own ticket: I was able to sort out the issue. I decided to keep my original observation format, which is:

__init__():
self.observation_space = spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8)

step():
response = requests.get(f"{self.server_url}/screenshotsmall")
image = Image.open(BytesIO(response.content))
observation = np.array(image)

Passing the observation directly as a NumPy array was causing this error in the FromGym class:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
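The (2,) in the message is consistent with np.asarray receiving a two-element tuple rather than a single image. Gym 0.26 and later return (observation, info) from reset(), and the second traceback in the original post shows the observation paired with an empty dict, which points the same way. A minimal sketch of the situation follows; split_reset_result is a hypothetical helper, not a Gym or DreamerV3 function.

```python
import numpy as np


def split_reset_result(result):
    """Return (obs, info) whether reset() gave a bare obs or an (obs, info) pair."""
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0], result[1]
    return result, {}


image = np.zeros((64, 64, 3), dtype=np.uint8)
new_style = (image, {})  # what Gym >= 0.26 returns from reset()

# Treating the whole tuple as the observation is what goes wrong: its two
# elements (an array and a dict) cannot be combined into a regular array.
obs, info = split_reset_result(new_style)
print(obs.shape, obs.dtype, info)

# A bare array passes through unchanged.
obs_old, info_old = split_reset_result(image)
print(obs_old is image, info_old)
```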

To fix this, I changed the _obs method in FromGym to check the format of the incoming observation and convert it to the expected one when it differs. The _obs method now looks like this:

def _obs(self, obs, reward, is_first=False, is_last=False, is_terminal=False):
    # The environment may return (observation, info); keep only the observation.
    if isinstance(obs, tuple):
        obs = obs[0]  # Assuming the actual observation is the first element of the tuple
    if isinstance(obs, np.ndarray):
        # Bare image array: store it under the 'image' key.
        updated_obs = {'image': np.asarray(obs, dtype=np.uint8)}
    elif isinstance(obs, dict):
        # Note: every entry is cast to uint8, which only suits image-like values.
        updated_obs = {k: np.asarray(v, dtype=np.uint8) for k, v in obs.items()}
    else:
        print("Unexpected observation format received:", type(obs))
        updated_obs = {}

    updated_obs.update({
        'reward': np.float32(reward),
        'is_first': is_first,
        'is_last': is_last,
        'is_terminal': is_terminal
    })

    return updated_obs
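An alternative that leaves the library untouched is to adapt the environment instead. The sketch below assumes the mismatch comes from the newer Gym API, where reset() returns (observation, info) and step() returns five values, and assumes FromGym expects the older form. LegacyGymAPI and _StubEnv are hypothetical names used only for illustration.

```python
import numpy as np


class LegacyGymAPI:
    """Expose the pre-0.26 Gym API on top of an env that uses the newer one.

    Hypothetical adapter, not part of Gym or DreamerV3.
    """

    def __init__(self, env):
        self._env = env
        self.observation_space = getattr(env, 'observation_space', None)
        self.action_space = getattr(env, 'action_space', None)

    def reset(self):
        result = self._env.reset()
        # Newer API: (obs, info). Older API: obs only.
        if isinstance(result, tuple) and len(result) == 2:
            return result[0]
        return result

    def step(self, action):
        result = self._env.step(action)
        if len(result) == 5:
            # Newer API: obs, reward, terminated, truncated, info.
            obs, reward, terminated, truncated, info = result
            return obs, reward, bool(terminated or truncated), info
        return result  # already obs, reward, done, info

    def __getattr__(self, name):
        # Forward other public attributes (render, close, ...) to the wrapped env.
        if name.startswith('_'):
            raise AttributeError(name)
        return getattr(self._env, name)


class _StubEnv:
    """Stand-in for a Gym >= 0.26 style environment, for demonstration only."""

    def reset(self):
        return np.zeros((64, 64, 3), dtype=np.uint8), {}

    def step(self, action):
        obs = np.zeros((64, 64, 3), dtype=np.uint8)
        return obs, 1.0, False, True, {}


env = LegacyGymAPI(_StubEnv())
first = env.reset()
obs, reward, done, info = env.step(None)
print(type(first).__name__, first.shape, done)
```

In the training script the adapter would sit between the custom env and FromGym, along the lines of FromGym(LegacyGymAPI(DiabloIIGymEnv(...)), obs_key='image'). Whether FromGym relies on anything else from the env has not been checked here.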

This change to _obs seemed to do the trick. On another note, possibly related to my changes: with a single instance and parallel=False it works fine, but with multiple instances I got an error and had to make another change, this time in the BatchEnv class. In its step method, I had to comment out the code that calls each observation as a function when self._parallel is True:


def step(self, action):
    assert all(len(v) == len(self._envs) for v in action.values()), (
        len(self._envs), {k: v.shape for k, v in action.items()})
    obs = []
    for i, env in enumerate(self._envs):
        act = {k: v[i] for k, v in action.items()}
        obs.append(env.step(act))
    # No need to call the observations as functions (changed from original code)
    # if self._parallel:
    #     obs = [ob() for ob in obs]
    return {k: np.stack([ob[k] for ob in obs]) for k in obs[0]}
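The two commented-out lines are there because, in parallel mode, the original code expects each env.step to return something callable that yields the observation when invoked; that is what ob() does. If the environments in the list return the observation dict directly, calling it fails, which may be what caused the error here. A less invasive variant than commenting the lines out is to call only what is callable. The helper name resolve_obs below is hypothetical.

```python
def resolve_obs(obs_list):
    """Call deferred observations, pass ready-made ones through unchanged."""
    return [ob() if callable(ob) else ob for ob in obs_list]


ready = {'image': 'frame-0', 'reward': 0.0}             # plain env result


def deferred():
    """Parallel-style result: the observation is produced on call."""
    return {'image': 'frame-1', 'reward': 1.0}


resolved = resolve_obs([ready, deferred])
print([ob['image'] for ob in resolved])  # ['frame-0', 'frame-1']
```

Inside BatchEnv.step this would replace the if self._parallel: branch with obs = resolve_obs(obs), so the same code path serves both plain and deferred results.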

After these two changes, everything worked fine. Kudos to @danijar. The agent learns about ten times faster in Diablo 2 than what I saw with Stable Baselines3: by step 1M it was performing certain actions that took the SB3 agent about 10M steps to figure out.

Another quick side note: on an Apple M1 Mac, even after installing JAX and jaxlib with Metal support, training won't run and fails with an error; for GPU usage, it needs to run on CUDA.

I hope this helps whoever might be facing the same issue. I am closing the issue.