danijar / dreamerv2

Mastering Atari with Discrete World Models
https://danijar.com/dreamerv2
MIT License
898 stars 195 forks

Minimal evaluation/example using gym observation #46

Closed ipsec closed 2 years ago

ipsec commented 2 years ago

Hi,

How do I create an agent, load the weights, and then call a prediction function to receive an action?

I'm trying to write one myself, but many details are missing. I'm stuck on this error:

python validate.py --model_path ~/logdir/trader
Loading config.
Loading config. Done
Resizing keys image to (64, 64).
Create agent (step: 481310).
Encoder CNN inputs: ['image']
Encoder MLP inputs: []
Decoder CNN outputs: ['image']
Decoder MLP outputs: []
Create agent. Done!
Loading checkpoint.
Load checkpoint with 85 tensors and 32342130 parameters.
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/ml/lib/python3.8/site-packages/tensorflow/python/util/nest.py", line 568, in assert_same_structure
    _pywrap_utils.AssertSameStructure(nest1, nest2, check_types,
ValueError: The two structures don't have the same nested structure.

First structure: type=tuple str=(<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=481310>, <tf.Variable 'Variable:0' shape=() dtype=int32, numpy=0>, <tf.Variable 'Variable:0' shape=() dtype=float64, numpy=1.0>)

Second structure: type=tuple str=(483399, 48242, array([[-0.03663844,  0.02114336, -0.01451669, ..., -0.00666128,
        -0.01674761,  0.07526544],
       [-0.04041671,  0.02768614, -0.01707186, ..., -0.00505101,

My agent code:

import gym
import logging
import random
from typing import Sequence
import numpy as np
import tensorflow as tf
from dreamerv2.api import defaults
from dreamerv2 import common
from dreamerv2.agent import Agent

from pathlib import Path
from agents import BaseAgent

logger = logging.getLogger('root')

class Dreamerv2Agent(BaseAgent):
    def __init__(self,
                 conf_file: Path,
                 env: str,
                 test_mode: bool,
                 prefix: str,
                 batch: int,
                 model_path: Path,
                 seed: bool):
        super().__init__(env, test_mode, prefix, batch, model_path, seed)

        if self.seed:
            random.seed(0)
            np.random.seed(0)
            tf.random.set_seed(0)

        logger.info("Loading config.")
        config = common.Config.load(
            str(model_path.absolute() / 'config.yaml')
            )
        logger.info("Loading config. Done")

        # config = defaults.parse_flags()

        env = gym.make(env)
        env = common.GymWrapper(env)
        env = common.ResizeImage(env)
        if hasattr(env.act_space['action'], 'n'):
            env = common.OneHotAction(env)
        else:
            env = common.NormalizeAction(env)
        env = common.TimeLimit(env, config.time_limit)

        replay = common.Replay(
            model_path.absolute() / 'train_episodes',
            **config.replay
            )
        step = common.Counter(replay.stats['total_steps'])
        logger.info(f'Create agent (step: {step.value}).')
        self.agent = Agent(config, env.obs_space, env.act_space, step)
        logger.info('Create agent. Done!')

        logger.info('Loading checkpoint.')
        if (model_path.absolute() / 'variables.pkl').exists():
            self.agent.load(model_path.absolute() / 'variables.pkl')
        logger.info('Loading checkpoint. Done!')

    def get_action(self, observation: Sequence):
        # Map the Gym observation to an action.
        output, _ = self.agent.policy(observation)
        return output.get('action')
danijar commented 2 years ago

Seems like you're just trying to load an incompatible checkpoint? That happens, for example, when you change the model size or try to switch to an environment with different obs/act spaces than the agent was trained on.
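
One way to check is to rebuild the env with the same wrapper chain used at training time (as in your snippet) and compare the printed spaces against a print from the training run. A sketch:

# Sketch: the agent was constructed from env.obs_space / env.act_space at
# training time, so these must match exactly (keys, shapes, dtypes) here.
print('obs_space:', env.obs_space)
print('act_space:', env.act_space)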

ipsec commented 2 years ago

Hi Danijar,

I have double-checked the obs/act spaces and found one odd behavior: the action space was changed (by dreamerv2) from Discrete(8) to Box(0., 1., (8,)). Is this right?
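
If I read the wrappers correctly, this comes from common.OneHotAction, which dv2.train seems to apply to discrete action spaces (the same chain as in my agent code above) and which replaces Discrete(n) with a one-hot Box of width n. A quick check, as a sketch using the env from the training script below:

import gym
from dreamerv2 import common

env = common.GymWrapper(gym.make('gym_orderbook:Trader-v0'))
print(env.act_space['action'])  # Discrete(8) straight from the env
env = common.OneHotAction(env)  # applied for discrete action spaces
print(env.act_space['action'])  # now a one-hot Box(0.0, 1.0, (8,))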

I'm training with this code:

import gym
import dreamerv2.api as dv2

config = dv2.defaults.update({
    'logdir': '~/logdir/trader',
    'log_every': 300,
    'train_every': 10,
    'prefill': 1e3,
    'actor_ent': 3e-3,
    'loss_scales.kl': 1.0,
    'discount': 0.99,
    'eval_every': 300,
    'replay': {'capacity': 2e3, 'ongoing': False, 'minlen': 10, 'maxlen': 30, 'prioritize_ends': True},
    'dataset': {'batch': 10, 'length': 10},
}).parse_flags()

env = gym.make('gym_orderbook:Trader-v0')
dv2.train(env, config)

And I'm trying to load the variables file using the code from my first question.

The config.yaml I'm loading is the one from logdir/config.yaml; is this right?

Maybe I'm missing a call to some wrapper?

Thanks in advance.

ipsec commented 2 years ago

I found the problem.

  print('Create agent.')
  agnt = agent.Agent(config, env.obs_space, env.act_space, step)
  dataset = iter(replay.dataset(**config.dataset))
  train_agent = common.CarryOverState(agnt.train)
  train_agent(next(dataset))

This train_agent call is required before loading the variables file: the first training step builds the network variables, and load() can only restore the checkpoint into variables that already exist.
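
For completeness, a sketch of how the fix slots into the Dreamerv2Agent.__init__ from my first comment (replay, config, and model_path are the objects already defined there):

# After constructing self.agent and before loading the checkpoint:
dataset = iter(replay.dataset(**config.dataset))
train_agent = common.CarryOverState(self.agent.train)
train_agent(next(dataset))  # one step builds all network variables
if (model_path.absolute() / 'variables.pkl').exists():
    self.agent.load(model_path.absolute() / 'variables.pkl')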

Thanks

roger-creus commented 1 year ago

Hey, could you also please show how you run an entire evaluation episode, calling the get_action() function at each step? It is not clear to me what the input to that function should be. Also, I believe the state of the world model should be updated and passed along the whole time... Thank you!
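
A minimal sketch of such a loop, assuming the Dreamerv2Agent setup from the first comment, dreamerv2's GymWrapper conventions (dict observations with reward/is_first/is_last entries), and the Agent.policy(obs, state, mode) signature; details may differ for other wrapper combinations:

import numpy as np

def run_episode(agent, env):
    # env must be wrapped exactly as at training time (GymWrapper,
    # ResizeImage, OneHotAction/NormalizeAction, TimeLimit).
    state = None          # recurrent world-model state, rebuilt each episode
    obs = env.reset()     # dict obs with is_first/is_last/reward entries
    total_reward = 0.0
    while not obs['is_last']:
        # policy() expects a leading batch dimension on every entry.
        batched = {k: np.asarray(v)[None] for k, v in obs.items()}
        outputs, state = agent.policy(batched, state, mode='eval')
        action = {'action': np.array(outputs['action'][0])}
        obs = env.step(action)
        total_reward += obs['reward']
    return total_reward

This mirrors what common.Driver does internally; the key point is that get_action() would need to take and return the policy state instead of discarding it.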