PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning
https://parl.readthedocs.io/
Apache License 2.0

ImportError under version V2.0.5 #990

Closed. ZhangzrJerry closed this issue 1 year ago

ZhangzrJerry commented 1 year ago

Complete error message:

  File "D:\项目\强化学习实验\车摆系统实验DQN\DQN.py", line 3, in <module>
    from parl import layers
ImportError: cannot import name 'layers' from 'parl' (D:\项目\强化学习实验\venv\lib\site-packages\parl\__init__.py)

Local environment:

pycharm - 2021.1.2
python - 3.9
paddle - 2.3.2
parl - 2.0.5

Complete code:

import parl
from parl import layers
import paddle.fluid as fluid
import numpy as np

class Agent(parl.Agent):
    def __init__(self,
                 algorithm,
                 obs_dim,
                 act_dim,
                 e_greed=0.1,
                 e_greed_decrement=0):
        assert isinstance(obs_dim, int)
        assert isinstance(act_dim, int)
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        super(Agent, self).__init__(algorithm)

        self.global_step = 0
        self.update_target_steps = 200  # copy the model parameters to target_model every 200 training steps

        self.e_greed = e_greed  # pick a random action with some probability, for exploration
        self.e_greed_decrement = e_greed_decrement  # gradually reduce exploration as training converges

    def build_program(self):
        self.pred_program = fluid.Program()
        self.learn_program = fluid.Program()

        with fluid.program_guard(self.pred_program):  # build the computation graph for action prediction; define input/output variables
            obs = layers.data(
                name='obs', shape=[self.obs_dim], dtype='float32'
            )
            self.value = self.alg.predict(obs)

        with fluid.program_guard(self.learn_program):  # build the computation graph for updating the Q network; define input/output variables
            obs = layers.data(
                name='obs', shape=[self.obs_dim], dtype='float32'
            )
            action = layers.data(name='act', shape=[1], dtype='int32')
            reward = layers.data(name='reward', shape=[], dtype='float32')
            next_obs = layers.data(
                name='next_obs', shape=[self.obs_dim], dtype='float32'
            )
            terminal = layers.data(name='terminal', shape=[], dtype='bool')
            self.cost = self.alg.learn(obs, action, reward, next_obs, terminal)

    def sample(self, obs):
        sample = np.random.rand()
        if sample < self.e_greed:
            act = np.random.randint(self.act_dim)  # explore: every action has some probability of being chosen
        else:
            act = self.predict(obs)  # choose the best action
        self.e_greed = max(
            0.01, self.e_greed - self.e_greed_decrement
        )  # gradually reduce exploration as training converges
        return act

    def predict(self, obs):  # choose the best action
        obs = np.expand_dims(obs, axis=0)
        pred_Q = self.fluid_executor.run(
            self.pred_program,
            feed={'obs': obs.astype('float32')},
            fetch_list=[self.value]
        )[0]
        pred_Q = np.squeeze(pred_Q, axis=0)
        act = np.argmax(pred_Q)  # pick the index of the largest Q value, i.e. the corresponding action
        return act

    def learn(self, obs, act, reward, next_obs, terminal):
        # sync the parameters of model and target_model every 200 training steps
        if self.global_step % self.update_target_steps == 0:
            self.alg.sync_target()
        self.global_step += 1

        act = np.expand_dims(act, -1)
        feed = {
            'obs': obs.astype('float32'),
            'act': act.astype('int32'),
            'reward': reward,
            'next_obs': next_obs.astype('float32'),
            'terminal': terminal
        }
        cost = self.fluid_executor.run(
            self.learn_program, feed=feed, fetch_list=[self.cost]
        )[0]  # run one training step of the network
        return cost
TomorrowIsAnOtherDay commented 1 year ago

The code you are running is the static-graph version; paddle 2.0+ currently only supports dynamic graph.
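
For reference, a minimal dynamic-graph sketch of the predict step (illustrative only, not PARL's maintained code; it assumes a paddle.nn.Layer Q-network passed in as model):

import numpy as np
import paddle

def predict(model, obs):
    # Eager (dynamic-graph) execution: no fluid.Program or executor is needed.
    obs = paddle.to_tensor(np.expand_dims(obs, axis=0), dtype='float32')
    q_values = model(obs)  # forward pass runs immediately
    return int(np.argmax(q_values.numpy()))  # greedy action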

TomorrowIsAnOtherDay commented 1 year ago

We recommend using the latest version of the DQN code.

ZhangzrJerry commented 1 year ago

After dropping this file, I tried importing the DQN algorithm directly from the PARL library, but that also raises an error:

Traceback (most recent call last):
  File "D:\项目\强化学习实验\车摆系统实验DQN\Main.py", line 3, in <module>
    from parl.algorithm import DQN
ModuleNotFoundError: No module named 'parl.algorithm'
TomorrowIsAnOtherDay commented 1 year ago

Could you test this code? https://github.com/PaddlePaddle/PARL/tree/develop/examples/DQN

ZhangzrJerry commented 1 year ago

Here is the complete run output:

[11-22 20:30:59 MainThread @logger.py:242] Argv: D:/项目/强化学习实验/快速开始实验PARL/PARL/examples/DQN/train.py
[11-22 20:31:05 MainThread @utils.py:73] paddlepaddle version: 2.3.2.
D:\项目\强化学习实验\venv\lib\site-packages\gym\envs\registration.py:555: UserWarning: WARN: The environment CartPole-v0 is out of date. You should consider upgrading to version `v1`.
  logger.warn(
[11-22 20:31:07 MainThread @train.py:81] obs_dim 4, act_dim 2
Traceback (most recent call last):
  File "D:\项目\强化学习实验\快速开始实验PARL\PARL\examples\DQN\train.py", line 132, in <module>
    main()
  File "D:\项目\强化学习实验\快速开始实验PARL\PARL\examples\DQN\train.py", line 94, in main
    run_train_episode(agent, env, rpm)
  File "D:\项目\强化学习实验\快速开始实验PARL\PARL\examples\DQN\train.py", line 40, in run_train_episode
    action = agent.sample(obs)
  File "D:\项目\强化学习实验\快速开始实验PARL\PARL\examples\DQN\cartpole_agent.py", line 55, in sample
    act = self.predict(obs)
  File "D:\项目\强化学习实验\快速开始实验PARL\PARL\examples\DQN\cartpole_agent.py", line 68, in predict
    obs = paddle.to_tensor(obs, dtype='float32')
  File "D:\项目\强化学习实验\venv\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "D:\项目\强化学习实验\venv\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "D:\项目\强化学习实验\venv\lib\site-packages\paddle\fluid\framework.py", line 434, in __impl__
    return func(*args, **kwargs)
  File "D:\项目\强化学习实验\venv\lib\site-packages\paddle\tensor\creation.py", line 126, in to_tensor
    raise ValueError(
ValueError: 
    Faild to convert input data to a regular ndarray :
     - Usually this means the input data contains nested lists with different lengths. 

Process finished with exit code 1
TomorrowIsAnOtherDay commented 1 year ago

Run python to enter the Python interpreter, type import paddle, then type paddle.utils.run_check().

If you see PaddlePaddle is installed successfully!, the installation succeeded.

We suggest first checking whether paddle's dynamic-graph mode is installed correctly.

ZhangzrJerry commented 1 year ago
(venv) D:\项目\强化学习实验>python
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
PaddlePaddle works well on 1 CPU.
W1122 20:38:47.157759 31544 fuse_all_reduce_op_pass.cc:76] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 2.
PaddlePaddle works well on 2 CPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
ZhangzrJerry commented 1 year ago
>>> import paddle.fluid as fluid
>>> fluid.install_check.run_check()
Running Verify Fluid Program ...
Your Paddle Fluid works well on SINGLE GPU or CPU.
W1122 20:40:25.881815 31544 fuse_all_reduce_op_pass.cc:76] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 1.
Your Paddle Fluid works well on MUTIPLE GPU or CPU.
Your Paddle Fluid is installed successfully! Let's start deep Learning with Paddle Fluid now
ZhangzrJerry commented 1 year ago

Both paddle and its dynamic-graph mode are installed successfully.

TomorrowIsAnOtherDay commented 1 year ago

OK. Print the shape of obs right before the following line and see what it is: https://github.com/PaddlePaddle/PARL/blob/e014495d99c34c52a6d8e997a14e65153eb93780/examples/DQN/cartpole_agent.py#L68

print(obs.shape)
ZhangzrJerry commented 1 year ago

Oh, I have run into these two issues before. Since my gym version is 0.26.2, the APIs of env.step() and env.reset() have changed; after adapting those two calls, the official example runs.
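
For reference, the gym 0.26 change amounts to the following (the extra return values are gym's documented API; CartPole-v1 is used here only as an example):

import gym

env = gym.make('CartPole-v1')

# gym >= 0.26: reset() returns (obs, info); step() returns five values
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
done = terminated or truncated

# gym <= 0.25 (what the example originally assumed):
#   obs = env.reset()
#   obs, reward, done, info = env.step(action)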

TomorrowIsAnOtherDay commented 1 year ago

We have noticed this issue as well and have added a wrapper to handle it. A new release is expected tomorrow, after which both old and new gym versions will be supported. Thanks for the feedback.
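
A compatibility wrapper of that kind could look roughly like the sketch below; this is an illustration of the idea only, not PARL's actual wrapper:

import gym

class CompatWrapper(gym.Wrapper):
    # Illustrative only: normalize gym >= 0.26 reset/step return values
    # back to the old (obs) / (obs, reward, done, info) convention.

    def reset(self, **kwargs):
        result = self.env.reset(**kwargs)
        # New gym returns (obs, info); old gym returns obs only.
        return result[0] if isinstance(result, tuple) else result

    def step(self, action):
        result = self.env.step(action)
        if len(result) == 5:  # new API: obs, reward, terminated, truncated, info
            obs, reward, terminated, truncated, info = result
            return obs, reward, terminated or truncated, info
        return result  # old API already matches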

ZhangzrJerry commented 1 year ago

One more thing about the gym.make() call in the example: newer gym versions require the keyword argument render_mode='human' for the old env.render() to work as before.
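
In code, that change looks like this (CartPole-v1 used only as an example):

import gym

# gym >= 0.26: the render mode must be chosen when the env is created
env = gym.make('CartPole-v1', render_mode='human')
obs, info = env.reset()
env.render()  # works as before once render_mode is set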

rical730 commented 1 year ago

After dropping this file, I tried importing the DQN algorithm directly from the PARL library, but that also raises an error:

Traceback (most recent call last):
  File "D:\项目\强化学习实验\车摆系统实验DQN\Main.py", line 3, in <module>
    from parl.algorithm import DQN
ModuleNotFoundError: No module named 'parl.algorithm'

This error is caused by a missing letter 's': it should be from parl.algorithms import DQN. You can refer to our DQN example:

https://github.com/PaddlePaddle/PARL/blob/develop/examples/DQN/train.py#L24

ZhangzrJerry commented 1 year ago

Uh, yes, that was a typo on my part. As for from parl import layers still raising an error, I followed https://parl.readthedocs.io/zh_CN/latest/tutorial/getting_started.html and replaced that part of the code entirely with paddle.nn layers; after rerunning the experiment it works fine.
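
For reference, a minimal sketch of that kind of replacement, assuming a simple two-layer Q-network (the class name and layer sizes are illustrative):

import paddle.nn as nn
import paddle.nn.functional as F

class CartpoleModel(nn.Layer):
    # Q-network built with paddle.nn instead of the removed parl.layers API.
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        hid_size = 128  # illustrative hidden size
        self.fc1 = nn.Linear(obs_dim, hid_size)
        self.fc2 = nn.Linear(hid_size, act_dim)

    def forward(self, obs):
        h = F.relu(self.fc1(obs))
        return self.fc2(h)  # Q-value for each action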

rical730 commented 1 year ago

Here is the complete run output:

ValueError:
    Faild to convert input data to a regular ndarray :
     - Usually this means the input data contains nested lists with different lengths.

Process finished with exit code 1

Hi, regarding this error, could you try cloning the PARL source with git, then running pip install . in the PARL root directory to install parl from source, and then running the DQN code again?

https://github.com/PaddlePaddle/PARL/tree/develop/examples/DQN

ZhangzrJerry commented 1 year ago

Here is the complete run output:

ValueError:
    Faild to convert input data to a regular ndarray :
     - Usually this means the input data contains nested lists with different lengths.

Process finished with exit code 1

Hi, regarding this error, could you try cloning the PARL source with git, then running pip install . in the PARL root directory to install parl from source, and then running the DQN code again?

https://github.com/PaddlePaddle/PARL/tree/develop/examples/DQN

Regarding this error: it comes from an API conflict between gym 0.26.2 and these two lines: https://github.com/PaddlePaddle/PARL/blob/e014495d99c34c52a6d8e997a14e65153eb93780/examples/DQN/train.py#L37 and https://github.com/PaddlePaddle/PARL/blob/e014495d99c34c52a6d8e997a14e65153eb93780/examples/DQN/train.py#L63. A similar problem also shows up at these two places: https://github.com/PaddlePaddle/PARL/blob/e014495d99c34c52a6d8e997a14e65153eb93780/examples/DQN/train.py#L42 and https://github.com/PaddlePaddle/PARL/blob/e014495d99c34c52a6d8e997a14e65153eb93780/examples/DQN/train.py#L67

ZhangzrJerry commented 1 year ago

One more thing about the gym.make() call in the example: newer gym versions require the keyword argument render_mode='human' for the old env.render() to work as before.

This issue also prevents the example from rendering properly.

TomorrowIsAnOtherDay commented 1 year ago

Got it. We will also verify the render compatibility issue and fix it in today's release as well.

ShuaibinLi commented 1 year ago
[Screenshot 2022-11-23 12 53 45]

Tested on a Windows machine. The problem you hit is that your gym version is newer, so the reset/step return values differ from before.
Current workaround: download the latest parl and install it with pip install . ; this has been verified to work. We will release a new version with the adaptation shortly.

ZhangzrJerry commented 1 year ago
[Screenshot 2022-11-23 12 53 45]

Tested on a Windows machine. The problem you hit is that your gym version is newer, so the reset/step return values differ from before. Current workaround: download the latest parl and install it with pip install . ; this has been verified to work. We will release a new version with the adaptation shortly.

Hmm, that is not the issue here. The main problem is that an older example is really outdated, and some of the parl APIs it uses have been removed and no longer work.

ZhangzrJerry commented 1 year ago

Got it. We will also verify the render compatibility issue and fix it in today's release as well.

OK, thanks for your work.