PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning
https://parl.readthedocs.io/
Apache License 2.0

WARNING: OMP_NUM_THREADS set to 2, not 1. #319

Closed DrRyanHuang closed 4 years ago

DrRyanHuang commented 4 years ago
[06-28 11:42:33 MainThread @logger.py:224] Argv: D:/git/parl/xx.py
WARNING: OMP_NUM_THREADS set to 2, not 1. 
The computation speed will not be optimized if you use data parallel. 
It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.

I only ran this single line:

from parl.utils import logger

To use multi-threading, do I need to compile paddlepaddle with a different toolchain? Or how should I set OMP_NUM_THREADS?

TomorrowIsAnOtherDay commented 4 years ago

Thanks for your interest in PARL! We set OMP_NUM_THREADS at this location. The original motivation was that paddle currently does not support passing in a learning rate (in non-batch form) when running multiple threads in parallel. https://github.com/PaddlePaddle/PARL/blob/5054efedd11b93fb34d289ec54a20f01e3156ea6/parl/core/fluid/plutils/compiler.py#L36


If you need to enable multi-threading, you can use the following setup:

from parl.utils import logger
import os 
os.environ['CPU_NUM'] = '3'  # add this line
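
For context, a minimal sketch of how that setting might be combined with the thread-related variables; the ordering matters, and the exact values here are placeholders rather than recommendations from this thread:

import os

# Thread-related environment variables should be set BEFORE importing
# parl/paddle, otherwise the values read at import/compile time may differ.
os.environ['CPU_NUM'] = '3'          # number of CPU places used for data parallelism
os.environ['OMP_NUM_THREADS'] = '1'  # keep OpenMP single-threaded

import parl  # noqa: E402  (import intentionally placed after the env setup)
from parl.utils import logger  # noqa: E402
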
TomorrowIsAnOtherDay commented 4 years ago

But in principle, if you haven't called parl.compile, this environment variable is never set, i.e. importing the logger alone shouldn't trigger it. Could you paste your full code?

DrRyanHuang commented 4 years ago

@TomorrowIsAnOtherDay

Your response speed is impressive!!!

My environment is Windows 10, using Spyder. In fact, I only called this single line: from parl.utils import logger and this warning appeared. The full code is as follows:

import parl
from parl import layers
import paddle.fluid as fluid
import copy
import numpy as np
import os
import gym
from parl.utils import logger

LEARN_FREQ = 5                   # learning frequency: no need to learn at every step; accumulate some new experience first, then learn, for efficiency
MEMORY_SIZE = 20000              # size of the replay memory; the larger it is, the more RAM it uses
MEMORY_WARMUP_SIZE = 200         # pre-fill the replay memory with some experience before sampling batches for the agent to learn from
BATCH_SIZE = 32                  # number of samples per learning step, drawn randomly from the replay memory
GAMMA = 0.99                     # reward discount factor, typically between 0.9 and 0.999
LEARNING_RATE = 0.0001           # learning rate

# -------------------------------------------------------
# Build the Model / Algorithm / Agent architecture
# -------------------------------------------------------
class Model(parl.Model):
    def __init__(self, act_dim):

        hid1_size = 128
        hid2_size = 128
        # three fully connected layers
        self.fc1 = layers.fc(size=hid1_size, act='relu')
        self.fc2 = layers.fc(size=hid2_size, act='relu')
        self.fc3 = layers.fc(size=act_dim, act=None)

    def value(self, obs):
        # Define the network:
        # input a state, output the Q value of every action: [Q(s,a1), Q(s,a2), Q(s,a3), ...]

        h1 = self.fc1(obs)
        h2 = self.fc2(h1)
        Q = self.fc3(h2)

        return Q

from parl.algorithms import DQN 

class Agent(parl.Agent):

    def __init__(self,
                 algorithm,
                 obs_dim,
                 act_dim,
                 e_greed=0.1,
                 e_greed_decrement=0):

        assert isinstance(obs_dim, int)
        assert isinstance(act_dim, int)
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        super(Agent, self).__init__(algorithm)

        self.global_step = 0
        self.update_target_steps = 200              # copy the model parameters to target_model every 200 training steps

        self.e_greed = e_greed                      # with some probability, pick a random action (exploration)
        self.e_greed_decrement = e_greed_decrement  # exploration decays gradually as training converges (exploration decay rate)

    def build_program(self):

        self.pred_program = fluid.Program()
        self.learn_program = fluid.Program()

        with fluid.program_guard(self.pred_program):   # build the computation graph for action prediction; define input/output variables
            obs = layers.data(
                name='obs', 
                shape=[self.obs_dim], 
                dtype='float32')

            self.value = self.alg.predict(obs)

        with fluid.program_guard(self.learn_program):  # build the computation graph for updating the Q network; define input/output variables
            obs = layers.data(
                name='obs', 
                shape=[self.obs_dim], 
                dtype='float32')

            action = layers.data(name='act', shape=[1], dtype='int32')
            reward = layers.data(name='reward', shape=[], dtype='float32')
            next_obs = layers.data(
                name='next_obs', shape=[self.obs_dim], dtype='float32')
            terminal = layers.data(name='terminal', shape=[], dtype='bool')
            self.cost = self.alg.learn(obs, action, reward, next_obs, terminal)

    def sample(self, obs):
        sample = np.random.rand()  # random float in [0, 1)
        if sample < self.e_greed:
            act = np.random.randint(self.act_dim)  # explore: every action has some probability of being chosen
        else:
            act = self.predict(obs)  # exploit: pick the best action
        self.e_greed = max(
            0.01, self.e_greed - self.e_greed_decrement)  # gradually reduce exploration as training converges
        return act

    def predict(self, obs):  # pick the best (greedy) action

        obs = np.expand_dims(obs, axis=0)
        pred_Q = self.fluid_executor.run(
            self.pred_program,
            feed={'obs': obs.astype('float32')},
            fetch_list=[self.value])[0]
        pred_Q = np.squeeze(pred_Q, axis=0) # Remove single-dimensional entries from the shape of an array.
        act = np.argmax(pred_Q)             # index of the largest Q value, i.e. the chosen action
        return act

    def learn(self, obs, act, reward, next_obs, terminal):

        # sync the parameters of model and target_model every 200 training steps
        if self.global_step % self.update_target_steps == 0:
            self.alg.sync_target()
        self.global_step += 1

        act = np.expand_dims(act, -1)
        feed = {
            'obs': obs.astype('float32'),
            'act': act.astype('int32'),
            'reward': reward,
            'next_obs': next_obs.astype('float32'),
            'terminal': terminal
        }
        cost = self.fluid_executor.run(
            self.learn_program, feed=feed, fetch_list=[self.cost])[0]  # one training step of the network
        return cost

# replay_memory.py
import random
import collections
import numpy as np

# -------------------------------------------------------
# ReplayMemory
# -------------------------------------------------------
class ReplayMemory(object):
    def __init__(self, max_size):
        self.buffer = collections.deque(maxlen=max_size)

    # add one experience to the replay memory
    def append(self, exp):
        self.buffer.append(exp)

    # sample N experiences from the replay memory
    def sample(self, batch_size):
        mini_batch = random.sample(self.buffer, batch_size)
        obs_batch, action_batch, reward_batch, next_obs_batch, done_batch = [], [], [], [], []

        for experience in mini_batch:
            s, a, r, s_p, done = experience
            obs_batch.append(s)
            action_batch.append(a)
            reward_batch.append(r)
            next_obs_batch.append(s_p)
            done_batch.append(done)

        return np.array(obs_batch).astype('float32'), \
            np.array(action_batch).astype('float32'), np.array(reward_batch).astype('float32'),\
            np.array(next_obs_batch).astype('float32'), np.array(done_batch).astype('float32')

    def __len__(self):
        return len(self.buffer)

# ----------------------------------------------
# Training && Test
# ----------------------------------------------
# run one training episode
def run_episode(env, agent, rpm): # rpm: replay memory

    total_reward = 0
    obs = env.reset()
    step = 0
    while True:
        step += 1
        action = agent.sample(obs)  # sample an action; every action has a chance of being tried
        next_obs, reward, done, _ = env.step(action)
        rpm.append((obs, action, reward, next_obs, done))

        # train model
        if (len(rpm) > MEMORY_WARMUP_SIZE) and (step % LEARN_FREQ == 0):
            (batch_obs, batch_action, batch_reward, batch_next_obs,
             batch_done) = rpm.sample(BATCH_SIZE)
            train_loss = agent.learn(batch_obs, batch_action, batch_reward,
                                     batch_next_obs,
                                     batch_done)  # s,a,r,s',done

        total_reward += reward
        obs = next_obs
        if done:
            break
    return total_reward

# evaluate the agent: run 5 episodes and average the total rewards
def evaluate(env, agent, render=False):
    eval_reward = []
    for i in range(5):
        obs = env.reset()
        episode_reward = 0
        while True:
            action = agent.predict(obs)  # predict the action, always pick the greedy one
            obs, reward, done, _ = env.step(action)
            episode_reward += reward
            if render:
                env.render()
            if done:
                break
        eval_reward.append(episode_reward)
    return np.mean(eval_reward)

# ----------------------------------------------------------------------
# Create the environment and the Agent, create the replay memory, start training, save the model
# ----------------------------------------------------------------------
env = gym.make('MountainCar-v0')         # create the environment
action_dim = env.action_space.n          # MountainCar-v0: 3
obs_shape = env.observation_space.shape  # MountainCar-v0: (2,)

# create the replay memory
rpm = ReplayMemory(MEMORY_SIZE)  # experience replay buffer for DQN

# build the agent with the parl framework
model = Model(action_dim)
algorithm = DQN(model, act_dim=action_dim, gamma=GAMMA, lr=LEARNING_RATE)
agent = Agent(
    algorithm,
    obs_dim=obs_shape[0],
    act_dim=action_dim,
    e_greed=0.01,
    e_greed_decrement=1e-6
)

# load a previously saved model
save_path = './dqn_model.ckpt'
agent.restore(save_path)

# pre-fill the replay memory so that early training batches have enough sample variety
while len(rpm) < MEMORY_WARMUP_SIZE:
    run_episode(env, agent, rpm)

max_episode = 2000

# start training
episode = 0
while episode < max_episode:  # train for max_episode episodes; the test part does not count toward the episode count
    # train part
    for i in range(0, 50):
        total_reward = run_episode(env, agent, rpm)
        episode += 1

    # test part
    eval_reward = evaluate(env, agent, render=False)  # set render=True to watch the rendering
    logger.info('episode:{}    e_greed:{}   test_reward:{}'.format(
        episode, agent.e_greed, eval_reward))

# training finished; save the model
save_path = './dqn_model.ckpt'
agent.save(save_path)
TomorrowIsAnOtherDay commented 4 years ago

We looked through your full code; this output should not be coming from PARL:

WARNING: OMP_NUM_THREADS set to 2, not 1. 
The computation speed will not be optimized if you use data parallel. 
It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.

Please try running import paddle directly and check whether the same message appears.

DrRyanHuang commented 4 years ago

@TomorrowIsAnOtherDay

I tried it: import paddle produces no output at all.

import parl produces that warning.

Python 
Type "copyright", "credits" or "license" for more information.

IPython  -- An enhanced Interactive Python.

>>> import parl

WARNING: OMP_NUM_THREADS set to 2, not 1. 
The computation speed will not be optimized if you use data parallel. 
It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.
TomorrowIsAnOtherDay commented 4 years ago

OK, thanks for the feedback.


First, some background on the OMP_NUM_THREADS environment variable: it controls parallel computation with OpenMP, a widely used library for multi-threaded concurrency.


We confirmed that this message comes from one of PARL's third-party dependencies modifying OMP_NUM_THREADS on Windows machines. Paddle then detects that the variable is set to 2 and prints the message; because this paddle build is accelerated via OpenBLAS, it is not affected by that variable.


TL;DR: you can simply ignore it. OpenBLAS is a standalone library for accelerated matrix computation, and this message has no effect at all.
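
For reference, a small diagnostic sketch (not from this thread) that reproduces the comparison above programmatically and shows which import actually changes the variable:

import os

def show(stage):
    # Print the current value so it is clear which import modifies it.
    print(stage, '->', os.environ.get('OMP_NUM_THREADS'))

show('before any import')

import paddle  # noqa: E402
show('after import paddle')

import parl    # noqa: E402
show('after import parl')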

DrRyanHuang commented 4 years ago

Thank you very much, and kudos to your team for the quick turnaround!

TomorrowIsAnOtherDay commented 4 years ago

Best wishes :)