PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning
https://parl.readthedocs.io/
Apache License 2.0

[DDPG] PaddleCheckError: variable shape mismatch, full error message below #332

Closed zbp-xxxp closed 4 years ago

zbp-xxxp commented 4 years ago

[07-08 00:22:31 MainThread @logger.py:224] Argv: D:/Envs/SmartCar/DDPG/train.py
C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\importlib\_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\importlib\_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\gym\logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))

[07-08 00:22:42 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-08 00:22:43 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-08 00:22:43 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
Track generation: 1236..1549 -> 313-tiles track
C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\gym\envs\box2d\car_racing.py:407: DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead
  arr = np.fromstring(image_data.get_data(), dtype=np.uint8, sep='')
C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\paddle\fluid\executor.py:774: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "D:/Envs/SmartCar/DDPG/train.py", line 93, in <module>
    run_episode(agent, env, rpm)
  File "D:/Envs/SmartCar/DDPG/train.py", line 43, in run_episode
    agent.learn(batch_obs, batch_action, batch_reward, batch_next_obs,batch_done)
  File "D:\Envs\SmartCar\DDPG\Agent.py", line 55, in learn
    self.learn_program, feed=feed, fetch_list=[self.critic_cost])[0]
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\paddle\fluid\executor.py", line 775, in run
    six.reraise(*sys.exc_info())
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\paddle\fluid\executor.py", line 770, in run
    use_program_cache=use_program_cache)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\paddle\fluid\executor.py", line 817, in _run_impl
    use_program_cache=use_program_cache)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\paddle\fluid\executor.py", line 894, in _run_program
    fetch_var_name)
paddle.fluid.core_noavx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

Windows not support stack backtrace yet.


Python Call Stacks (More useful to users):

File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\paddle\fluid\framework.py", line 2459, in append_op attrs=kwargs.get("attrs", None)) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\paddle\fluid\layer_helper.py", line 43, in append_op return self.main_program.current_block().append_op(*args, **kwargs) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\paddle\fluid\layers\nn.py", line 384, in fc "y_num_col_dims": 1}) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\parl\core\fluid\layers\layer_wrappers.py", line 161, in call act=act) File "D:\Envs\SmartCar\DDPG\Model.py", line 41, in value Q = self.fc2(concat) File "D:\Envs\SmartCar\DDPG\Model.py", line 13, in value return self.critic_model.value(obs, act) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\parl\algorithms\fluid\ddpg.py", line 84, in _critic_learn Q = self.model.value(obs, action) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\parl\algorithms\fluid\ddpg.py", line 65, in learn terminal) File "D:\Envs\SmartCar\DDPG\Agent.py", line 36, in build_program terminal) File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\parl\core\fluid\agent.py", line 80, in init self.build_program() File "D:\Envs\SmartCar\DDPG\Agent.py", line 12, in init super(Agent, self).init(algorithm) File "D:/Envs/SmartCar/DDPG/train.py", line 87, in agent = Agent(algorithm, obs_dim, act_dim)


Error Message Summary:

PaddleCheckError: Expected x_mat_dims[1] == y_mat_dims[0], but received x_mat_dims[1]:31 != y_mat_dims[0]:33.
ShapeError: After flatten the input tensor X and Y to 2-D dimensions matrix X1 and Y1, the matrix X1's width must be equal with matrix Y1's height. But received X's shape = [16, 31], X1's shape = [16, 31], X1's width = 31; Y's shape = [33, 1], Y1's shape = [33, 1], Y1's height = 33.
  at [D:\paddle-tiny-release-liujinquan\1.6.1-tiny\Paddle\paddle\fluid\operators\mul_op.cc:70] [operator < mul > error]
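
The mul operator inside the fc layer is a plain matrix multiply: after flattening, the input's width must equal the weight's height. A minimal numpy sketch, using only the shapes taken from the error message above (not the actual program), runs into the same constraint:

import numpy as np

# shapes copied from the error message: runtime input [16, 31], fc2 weight [33, 1]
x = np.zeros((16, 31), dtype='float32')
y = np.zeros((33, 1), dtype='float32')
try:
    np.matmul(x, y)  # fails: inner dimensions 31 and 33 do not match
except ValueError as err:
    print(err)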

My idea is to use the DDPG algorithm to solve the CarRacing-v0 environment, but I get the error above and don't know how to fix it. Whenever I change one place, a new, similar error appears. Any help would be appreciated!

TomorrowIsAnOtherDay commented 4 years ago

Hi :) Thanks for the question. Please paste the part of the DDPG code you modified so we can check it and point out the problem.

zbp-xxxp commented 4 years ago

Model.py

import parl
from parl import layers

class Model(parl.Model):
    def __init__(self, act_dim):
        self.actor_model = ActorModel(act_dim)
        self.critic_model = CriticModel()

    def policy(self, obs):
        return self.actor_model.policy(obs)

    def value(self, obs, act):
        return self.critic_model.value(obs, act)

    def get_actor_params(self):
        return self.actor_model.parameters()

class ActorModel(parl.Model):
    def __init__(self, act_dim):
        hid_size = act_dim * 10
        self.fc1 = layers.fc(size=hid_size, act='tanh')
        self.fc2 = layers.fc(size=act_dim, act='softmax')

    def policy(self, obs):
        out = self.fc1(obs)
        out = self.fc2(out)
        return out

class CriticModel(parl.Model):
    def __init__(self):
        hid_size = 30
        self.fc1 = layers.fc(size=hid_size, act='tanh')
        self.fc2 = layers.fc(size=1, act=None)

    def value(self, obs, act):
        hid = self.fc1(obs)
        concat = layers.concat([hid, act], axis=1)
        Q = self.fc2(concat)
        Q = layers.squeeze(Q, axes=[1])
        return Q
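
A quick shape walk-through of CriticModel.value, assuming act_dim = 3 for CarRacing-v0 (steering, gas, brake): fc1 maps obs to [batch, 30], the concat should then be [batch, 33], and fc2's weight is created as [33, 1] at build time. The runtime width of 31 in the error suggests that only a single action column actually reached the concat. A numpy sketch of the two cases:

import numpy as np

batch, hid_size, act_dim = 16, 30, 3                    # assumption: act_dim = 3
hid = np.zeros((batch, hid_size), dtype='float32')      # stands in for the output of fc1

act = np.zeros((batch, act_dim), dtype='float32')       # expected action batch
print(np.concatenate([hid, act], axis=1).shape)         # (16, 33) -> matches fc2's weight [33, 1]

narrow_act = np.zeros((batch, 1), dtype='float32')      # action collapsed to one column
print(np.concatenate([hid, narrow_act], axis=1).shape)  # (16, 31) -> the width reported in the error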

train.py

import gym
import numpy as np
from replay_memory import ReplayMemory
from Model import Model
from Agent import Agent
from parl.algorithms import DDPG
from parl.utils import logger

ACTOR_LR = 1e-3   # learning rate of the actor network
CRITIC_LR = 1e-3  # learning rate of the critic network
GAMMA = 0.99      # reward discount factor
TAU = 0.001       # soft-update coefficient for the target networks
MEMORY_SIZE = int(1e3)                  # replay memory capacity
MEMORY_WARMUP_SIZE = MEMORY_SIZE // 20  # warm up the replay memory before training starts
BATCH_SIZE = 16
REWARD_SCALE = 0.1   # reward scaling factor
NOISE = 0.05         # variance of the action noise
TRAIN_EPISODE = 6000 # total number of training episodes

def run_episode(agent, env, rpm):
    obs = env.reset()
    total_reward = 0
    steps = 0
    while True:
        steps += 1
        batch_obs = np.expand_dims(obs, axis=0)
        # print(batch_obs)
        action = agent.predict(batch_obs.astype('float32'))

        # add exploration noise and clip the output to [-1.0, 1.0]
        action = np.clip(np.random.normal(action, NOISE), -1.0, 1.0)

        next_obs, reward, done, info = env.step(action)

        action = [action]  # wrap the action in a list for storage in the replay memory
        rpm.append((obs, action, REWARD_SCALE * reward, next_obs, done))

        if len(rpm) > MEMORY_WARMUP_SIZE and (steps % 5) == 0:
            (batch_obs, batch_action, batch_reward, batch_next_obs,batch_done) = rpm.sample(BATCH_SIZE)
            agent.learn(batch_obs, batch_action, batch_reward, batch_next_obs,batch_done)

        obs = next_obs
        total_reward += reward

        if done:
            break
    return total_reward

def evaluate(env, agent, render=False):
    eval_reward = []
    for i in range(5):
        obs = env.reset()
        total_reward = 0
        steps = 0
        while True:
            batch_obs = np.expand_dims(obs, axis=0)
            action = agent.predict(batch_obs.astype('float32'))
            action = np.clip(action, -1.0, 1.0)

            steps += 1
            next_obs, reward, done, info = env.step(action)

            obs = next_obs
            total_reward += reward

            if render:
                env.render()
            if done:
                break
        eval_reward.append(total_reward)
    return np.mean(eval_reward)

# create the environment
env = gym.make('CarRacing-v0')

obs_dim = 96*96*3
act_dim = env.action_space.shape[0]  # steering, gas, brake
# print(act_dim)

# build the agent with the PARL framework
model = Model(act_dim)
algorithm = DDPG(model, gamma=GAMMA, tau=TAU, actor_lr=ACTOR_LR, critic_lr=CRITIC_LR)
agent = Agent(algorithm, obs_dim, act_dim)

# create the replay memory
rpm = ReplayMemory(MEMORY_SIZE)
# pre-fill the replay memory with experience
while len(rpm) < MEMORY_WARMUP_SIZE:
    run_episode(agent, env, rpm)

episode = 0
while episode < TRAIN_EPISODE:
    for i in range(50):
        total_reward = run_episode(agent, env, rpm)
        episode += 1

    eval_reward = evaluate(env, agent, render=False)
    logger.info('episode:{}    test_reward:{}'.format(episode, eval_reward))

Everything else is unchanged. Sorry for the trouble, and thank you!

rical730 commented 4 years ago

Hi, the problem is probably this line in the run_episode() function in train.py. In the course code, the action for that environment (a continuous-action version of CartPole) is a single float, which is a special case, so the action was wrapped in a list to fit the storage format of replay_memory. The action of CarRacing-v0, however, is itself already a list, so this special handling should not be needed:

        action = [action]  # wrap the action in a list for storage in the replay memory

Try deleting the line above and see if that fixes it.
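
For reference, a minimal sketch of run_episode() with that line removed (everything else unchanged from the code posted above); the action returned by the agent is stored in the replay memory as-is:

def run_episode(agent, env, rpm):
    obs = env.reset()
    total_reward = 0
    steps = 0
    while True:
        steps += 1
        batch_obs = np.expand_dims(obs, axis=0)
        action = agent.predict(batch_obs.astype('float32'))

        # add exploration noise and clip the output to [-1.0, 1.0]
        action = np.clip(np.random.normal(action, NOISE), -1.0, 1.0)

        next_obs, reward, done, info = env.step(action)

        # CarRacing-v0 actions are already array-like, so no list wrapping is needed
        rpm.append((obs, action, REWARD_SCALE * reward, next_obs, done))

        if len(rpm) > MEMORY_WARMUP_SIZE and (steps % 5) == 0:
            (batch_obs, batch_action, batch_reward, batch_next_obs,
             batch_done) = rpm.sample(BATCH_SIZE)
            agent.learn(batch_obs, batch_action, batch_reward,
                        batch_next_obs, batch_done)

        obs = next_obs
        total_reward += reward

        if done:
            break
    return total_reward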

zbp-xxxp commented 4 years ago

It runs now. Thank you very much!