/chapter1/chapter1 - GitHub Issues

qiwang067 commented 3 years ago

https://datawhalechina.github.io/easy-rl/#/chapter1/chapter1

Description

qiwang067 commented 3 years ago

test

Data-Designer commented 3 years ago

Very well organized, but the PPT figures are too large; it would be nice if they could be made a bit smaller.

qiwang067 commented 3 years ago

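> Very well organized, but the PPT figures are too large; it would be nice if they could be made a bit smaller.

Thank you for the feedback. We will adjust the figure sizes in the online version later. In the meantime, you can read the PDF version: https://github.com/datawhalechina/easy-rl/releases/tag/v1.0.0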

qiwang067 commented 3 years ago

> This is so well organized, thank you.

Thank you for the kind words. Glad it is helpful to you.

minoshiro1 commented 3 years ago

Great, genuinely useful material.

qiwang067 commented 2 years ago

> Great, genuinely useful material.

Thank you for the kind words. Glad it is helpful to you.

lancescrazy commented 2 years ago

What causes the error in the first experiment, NameError: name 'load_agent' is not defined? Thanks in advance.

qiwang067 commented 2 years ago

> What causes the error in the first experiment, NameError: name 'load_agent' is not defined? Thanks in advance.

import gym 
env = gym.make("Taxi-v3") 
observation = env.reset() 
agent = load_agent() 
for step in range(100):
    action = agent(observation) 
    observation, reward, done, info = env.step(action)

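Hello, the code above is only an example. Its purpose is to show readers the skeleton of a reinforcement learning implementation; it is not complete code. The load_agent function is never defined, which is why the error occurs.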
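For readers who want the skeleton above to run, here is a minimal sketch of what a stand-in `load_agent` could look like. It is not the book's implementation: the name `load_agent` comes from the example, but its parameters (`num_actions`, `seed`) and the random policy are assumptions made purely for illustration. The default of 6 matches the number of discrete actions in Taxi-v3.

```python
import random


def load_agent(num_actions=6, seed=None):
    """Hypothetical stand-in for the undefined load_agent().

    Returns a callable that ignores the observation and picks a random
    action index in [0, num_actions). It learns nothing.
    """
    rng = random.Random(seed)

    def agent(observation):
        # A trained agent would use the observation; this placeholder does not.
        return rng.randrange(num_actions)

    return agent


agent = load_agent(num_actions=6, seed=0)
action = agent(None)  # any observation works, since it is ignored
```

With Gym installed, `agent = load_agent(env.action_space.n)` would plug into the loop from the example.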

Strawberry47 commented 2 years ago

A great summary! Super helpful!! Thank you, author!!! ღ( ´･ᴗ･` ) sending love

qiwang067 commented 2 years ago

> A great summary! Super helpful!! Thank you, author!!! ღ( ´･ᴗ･` ) sending love

Glad it is helpful to you XD

pencilccc commented 2 years ago

Very helpful for a complete beginner. Thank you, author.

HiQiang commented 2 years ago

"After 2012, we had convolutional neural networks" (this sentence does not seem quite rigorous). Reference: Y. LeCun, L. Bottou, Y. Bengio and P. Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998, http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

qiwang067 commented 2 years ago

> Very helpful for a complete beginner. Thank you, author.

Glad it is helpful to you~

qiwang067 commented 2 years ago

> "After 2012, we had convolutional neural networks" (this sentence does not seem quite rigorous). Reference: Y. LeCun, L. Bottou, Y. Bengio and P. Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998, http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

Thank you for the feedback. The sentence "After 2012, we had convolutional neural networks" is indeed not rigorous. In 2012, Krizhevsky et al. [1] proposed AlexNet, which won the ImageNet classification competition and quickly drew wide attention to convolutional neural networks.

The original text has been revised.

According to the book Deep Learning (known in Chinese as the "flower book"), convolutional neural networks can be traced back to a 1989 paper [2].

Refs:
[1] A. Krizhevsky, I. Sutskever, G. E. Hinton. Imagenet classification with deep convolutional neural networks[C]. Advances in Neural Information Processing Systems (NeurIPS), Lake Tahoe, Nevada, USA, 2012, 1097-1105.
[2] Y. LeCun, et al. Generalization and network design strategies[J]. Connectionism in Perspective, 1989, 19(1): 143-155.

cknnsshxs commented 2 years ago

A great compilation and summary, very helpful for learning reinforcement learning. Thank you for the time and effort you put in :heart::heart::heart:

qiwang067 commented 2 years ago

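> A great compilation and summary, very helpful for learning reinforcement learning. Thank you for the time and effort you put in ❤️❤️❤️

Thank you for the kind words. Glad it is helpful to you~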

lancescrazy commented 2 years ago

I'd like to ask: what tool was used to build this page?

qiwang067 commented 2 years ago

> I'd like to ask: what tool was used to build this page?

docsify

xx529 commented 2 years ago

Thank you, author.

huangshoucheng commented 2 years ago

Great, nice.

stellar749 commented 2 years ago

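In the description of the relationship between state and observation, could this sentence be misleading: "the agent also has an internal function to update the state"? My understanding is that only the environment has an internal state-update function, while the agent has an internal policy function. If the environment is only partially observable, some components of the state the agent observes are uncertain, which is why POMDP problems use a probability distribution to represent the state.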

qiwang067 commented 2 years ago

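> In the description of the relationship between state and observation, could this sentence be misleading: "the agent also has an internal function to update the state"? My understanding is that only the environment has an internal state-update function, while the agent has an internal policy function. If the environment is only partially observable, some components of the state the agent observes are uncertain, which is why POMDP problems use a probability distribution to represent the state.

Thank you for the feedback. I suggest rereading the relevant passage: the state-update function inside the agent is not the policy function.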

Money8888 commented 2 years ago

A question:

position, velocity = observation

This statement raises an error about a mismatched number of values to unpack. Is something wrong with my gym version?

yshuise commented 2 years ago

The book Reinforcement Learning is a book about inventing and creating formulas.

yshuise commented 2 years ago

sutton

qiwang067 commented 2 years ago

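> A question:
> position, velocity = observation
> This statement raises an error about a mismatched number of values to unpack. Is something wrong with my gym version?

@Money8888 Thank you for the feedback. It runs without problems on my side. The Gym version I use is 0.18.0, and the complete code is: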

import gym
import numpy as np

class BespokeAgent:
    def __init__(self, env):
        pass

    def decide(self, observation): # choose an action
        position, velocity = observation
        lb = min(-0.09 * (position + 0.25) ** 2 + 0.03,
                0.3 * (position + 0.9) ** 4 - 0.008)
        ub = -0.07 * (position + 0.38) ** 2 + 0.07
        if lb < velocity < ub:
            action = 2
        else:
            action = 0
        return action # return the action

    def learn(self, *args): # learn
        pass

def play_montecarlo(env, agent, render=False, train=False):
    episode_reward = 0. # total reward of the episode, initialized to 0
    observation = env.reset() # reset the environment and start a new episode
    while True: # loop until the episode ends
        if render: # whether to render
            env.render() # show the GUI window; it can be closed with env.close()
        action = agent.decide(observation)
        next_observation, reward, done, _ = env.step(action) # take the action
        episode_reward += reward # accumulate the episode reward
        if train: # whether to train the agent
            agent.learn(observation, action, reward, done) # learn
        if done: # episode finished, leave the loop
            break
        observation = next_observation
    return episode_reward # return the total reward of the episode

env = gym.make('MountainCar-v0')
agent = BespokeAgent(env)
print('observation space = {}'.format(env.observation_space))
print('action space = {}'.format(env.action_space))
print('observation range = {} ~ {}'.format(env.observation_space.low,
        env.observation_space.high))
print('number of actions = {}'.format(env.action_space.n))
episode_rewards = [play_montecarlo(env, agent) for _ in range(100)]
print('average episode reward = {}'.format(np.mean(episode_rewards)))

kbbbhhy commented 2 years ago

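> @Money8888 A question:
> position, velocity = observation
> This statement raises an error about a mismatched number of values to unpack. Is something wrong with my gym version?

I ran into this too, but it turned out to be my own mistake: in gym.make('MountainCar-v0') I had typed CartPole-v0 as the environment name. It seems MountainCar is required for this code.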
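The mismatch described here comes from the observation shape: a MountainCar observation has two components (position, velocity), while a CartPole observation has four, so the two-name unpacking fails. A minimal sketch of a guard that makes the failure explicit follows; the helper name `split_mountain_car_observation` is hypothetical and not part of Gym.

```python
def split_mountain_car_observation(observation):
    """Unpack a 2-component observation into (position, velocity).

    Hypothetical helper: raises a descriptive error when the observation
    comes from an environment with a different observation size.
    """
    if len(observation) != 2:
        raise ValueError(
            "expected 2 values (position, velocity), got {}; "
            "is the environment really MountainCar-v0?".format(len(observation))
        )
    position, velocity = observation
    return float(position), float(velocity)


# A MountainCar-like observation unpacks cleanly.
position, velocity = split_mountain_car_observation([-0.5, 0.0])
```

Passing a four-component observation, as CartPole produces, raises the `ValueError` with a message that points at the environment name.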

jzhangCSER01 commented 1 year ago

In 1.2.3 Sequential Decision Making, one of the formulas should be $H_t = o_1, a_1, r_1, \ldots, o_t, a_t, r_t$, right?

qiwang067 commented 1 year ago

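> In 1.2.3 Sequential Decision Making, one of the formulas should be $H_t = o_1, a_1, r_1, \ldots, o_t, a_t, r_t$, right?

@jzhangCSER01 Thank you for the feedback 👍. The formula in 1.2.3 Sequential Decision Making should indeed be $H_t = o_1, a_1, r_1, \ldots, o_t, a_t, r_t$.

The error has been corrected.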

jzhangCSER01 commented 1 year ago

In 1.7.2, the line lb = min(-0.09 * (position + 0.25) ** 2 + 0.03, 0.3 * (position + 0.9) ** 4 - 0.008) raises ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). The position obtained from position, velocity = observation is an ndarray, so min() fails. How can this be fixed?

I also ran the code you provide below and got the same error. My gym version is 0.26.2 and my Python version is 3.10.1.

13071558875 commented 1 year ago

I'm stuck on this problem too: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). The next step, if lb < velocity < ub:, fails as well; here velocity is an empty dict.
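Both symptoms fit what is reported elsewhere in this thread: in Gym 0.26, `env.reset()` returns `(observation, info)` rather than just the observation. Unpacking that pair with `position, velocity = ...` puts the whole observation array into `position` and the info dict into `velocity`. The sketch below normalizes either return shape; `reset_observation` is a hypothetical helper, and detecting the new shape by "2-tuple whose second item is a dict" is a heuristic assumption, not an official API.

```python
def reset_observation(reset_result):
    """Return only the observation from an env.reset() result.

    Hypothetical helper. Handles the older return shape (observation only)
    and the newer one ((observation, info)), assuming info is a dict.
    """
    if (isinstance(reset_result, tuple)
            and len(reset_result) == 2
            and isinstance(reset_result[1], dict)):
        return reset_result[0]
    return reset_result


# Newer shape: (observation, info). Plain lists stand in for arrays here.
new_style = ([-0.5, 0.0], {})
# Older shape: the observation by itself.
old_style = [-0.5, 0.0]

position, velocity = reset_observation(new_style)
```

Unpacking `new_style` directly would instead give `position = [-0.5, 0.0]` and `velocity = {}`, which matches the empty dict reported above.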

13071558875 commented 1 year ago

import gym
import numpy as np

class BespokeAgent:

    def __init__(self, env):
        pass

    def decide(self, observation):  # choose an action
        position, velocity = observation
        lb = min(-0.09 * (position + 0.25)**2 + 0.03,
                 0.3 * (position + 0.9)**4 - 0.008)
        ub = -0.07 * (position + 0.38)**2 + 0.07
        if lb < velocity < ub:
            action = 2
        else:
            action = 0
        return action  # return the action

    def learn(self, *args):  # learn
        pass

def play_montecarlo(env, agent, render=False, train=False):
    episode_reward = 0.  # total reward of the episode, initialized to 0
    observation, info = env.reset()  # reset the environment and start a new episode
    while True:  # loop until the episode ends
        # if render:  # whether to render
        #     env.render()  # show the GUI window; it can be closed with env.close()
        action = agent.decide(observation)
        state, reward, terminated, truncated, info = env.step(action)  # take the action
        episode_reward += reward  # accumulate the episode reward
        if train:  # whether to train the agent
            agent.learn(observation, action, reward, terminated,
                        truncated)  # learn
        if terminated or truncated:  # episode finished, leave the loop
            env.close()
            break
        observation = state
    return episode_reward  # return the total reward of the episode

env = gym.make('MountainCar-v0', render_mode='human')
agent = BespokeAgent(env)
print('observation space = {}'.format(env.observation_space))
print('action space = {}'.format(env.action_space))
print('observation range = {} ~ {}'.format(env.observation_space.low,
                              env.observation_space.high))
print('number of actions = {}'.format(env.action_space.n))
episode_rewards = [play_montecarlo(env, agent) for _ in range(100)]
print('average episode reward = {}'.format(np.mean(episode_rewards)))

This runs on the latest version of gym.

13071558875 commented 1 year ago

lb = min(-0.09 * (position + 0.25)**2 + 0.03,0.3 * (position + 0.9)**4 - 0.008)
ub = -0.07 * (position + 0.38)**2 + 0.07

Could someone explain this step?

Gaben21 commented 1 year ago

Hello, why can't the gymlibrary website be opened? Also, the line observation, reward, done, info = env.step(action) raises ValueError: too many values to unpack (expected 4). Is this caused by a gym version update? My version is 0.26.

Gaben21 commented 1 year ago

> @13071558875
> lb = min(-0.09 * (position + 0.25)**2 + 0.03,0.3 * (position + 0.9)**4 - 0.008)
> ub = -0.07 * (position + 0.38)**2 + 0.07
> Could someone explain this step?

Does the step observation, reward, done, info = env.step(action) raise an error for you? I'm using the latest version of gym. Also, can you open the official documentation?

jzhangCSER01 commented 1 year ago

> @13071558875
> lb = min(-0.09 * (position + 0.25)**2 + 0.03,0.3 * (position + 0.9)**4 - 0.008)
> ub = -0.07 * (position + 0.38)**2 + 0.07
> Could someone explain this step?

I think they are the upper and lower bounds of the car's position.
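One detail worth checking against the posted code: `lb` and `ub` are computed from the position, but they are compared against the velocity (`if lb < velocity < ub`). They act as a position-dependent velocity band. The sketch below isolates that rule as a plain function so it can be tried without Gym; the function name `decide` mirrors the method in the thread, and the reading of action 2 as "push right" and action 0 as "push left" is taken from the MountainCar action convention, not from the book.

```python
def decide(position, velocity):
    """Rule from the thread's BespokeAgent.decide, as a standalone function.

    lb and ub depend on position and bound the velocity: inside the band
    the agent returns action 2, otherwise action 0.
    """
    lb = min(-0.09 * (position + 0.25) ** 2 + 0.03,
             0.3 * (position + 0.9) ** 4 - 0.008)
    ub = -0.07 * (position + 0.38) ** 2 + 0.07
    if lb < velocity < ub:
        return 2
    return 0


# At rest near the valley bottom the velocity lies inside the band.
action_at_rest = decide(-0.5, 0.0)
# Moving left at the same position the velocity falls below lb.
action_moving_left = decide(-0.5, -0.01)
```

At position -0.5 the band is roughly (-0.00032, 0.069), so a car at rest gets action 2 and a car moving left at -0.01 gets action 0.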

jzhangCSER01 commented 1 year ago

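> @Gaben21
> > @13071558875
> > lb = min(-0.09 * (position + 0.25)**2 + 0.03,0.3 * (position + 0.9)**4 - 0.008)
> > ub = -0.07 * (position + 0.38)**2 + 0.07
> > Could someone explain this step?
>
> Does the step observation, reward, done, info = env.step(action) raise an error for you? I'm using the latest version of gym. Also, can you open the official documentation?

It does raise an error. Just receive the result in a single variable and then read the values by index:

results = env.step(action)  # submits the action; the argument is the specific action
print(results[0])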

Gaben21 commented 1 year ago

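I hope the author can update the code. Since the Gym version update, the library differs from the code in the text, and the code no longer works as written.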

13071558875 commented 1 year ago

@Gaben21 done has been deprecated and replaced by terminated, truncated. See the official documentation: https://www.gymlibrary.dev/api/core/
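Code written against the older four-value `step()` result can be kept working with a small adapter. The following is a sketch only: `step_compat` is a hypothetical name, and collapsing `terminated or truncated` into a single `done` flag is a simplification that throws away the distinction the newer API introduces.

```python
def step_compat(step_result):
    """Normalize an env.step() result to (observation, reward, done, info).

    Hypothetical helper. Accepts the 4-value form (obs, reward, done, info)
    and the 5-value form (obs, reward, terminated, truncated, info).
    """
    if len(step_result) == 5:
        observation, reward, terminated, truncated, info = step_result
        return observation, reward, terminated or truncated, info
    observation, reward, done, info = step_result
    return observation, reward, done, info


# Tuples standing in for real step() results.
new_result = ([-0.5, 0.0], -1.0, False, True, {})
old_result = ([-0.5, 0.0], -1.0, False, {})

observation, reward, done, info = step_compat(new_result)
```

In a loop this would be used as `observation, reward, done, info = step_compat(env.step(action))`.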

13071558875 commented 1 year ago

@jzhangCSER01

Transition Dynamics:

Given an action, the mountain car follows the following transition dynamics:
*velocity<sub>t+1</sub> = velocity<sub>t</sub> + (action - 1) * force - cos(3 * position<sub>t</sub>) * gravity*
*position<sub>t+1</sub> = position<sub>t</sub> + velocity<sub>t+1</sub>*
where force = 0.001 and gravity = 0.0025. The collisions at either end are inelastic with the velocity set to 0
upon collision with the wall. The position is clipped to the range `[-1.2, 0.6]` and
velocity is clipped to the range `[-0.07, 0.07]`.

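This is how the official documentation defines the dynamics. It is essentially the kinetic energy that comes from the change in height, and lb and ub are then computed from the formula. This is my own understanding; the book does not explain this part. Quite a lot of the fundamentals are better learned alongside other books, and some passages read like an awkward translation.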
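The quoted transition dynamics can be written as a few lines of plain Python. This is a sketch based only on the passage quoted above, not on the Gym source: the function name `mountain_car_step` is hypothetical, and clipping the velocity before updating the position is an assumption about ordering that the quoted text leaves open.

```python
import math

FORCE = 0.001
GRAVITY = 0.0025
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07


def mountain_car_step(position, velocity, action):
    """One step of the quoted dynamics; action is 0, 1, or 2."""
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))   # clip velocity
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))  # clip position
    if position in (MIN_POSITION, MAX_POSITION):
        velocity = 0.0  # inelastic collision with a wall
    return position, velocity


# Start at rest at position -0.5 and apply action 2.
next_position, next_velocity = mountain_car_step(-0.5, 0.0, 2)
```

The `(action - 1)` term maps actions 0, 1, 2 to a force of -0.001, 0, +0.001, and the cosine term is the pull of the slope at the current position.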

jzhangCSER01 commented 1 year ago

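> @13071558875 @Gaben21 done has been deprecated and replaced by terminated, truncated. See the official documentation: https://www.gymlibrary.dev/api/core/

There is also this step: observation, info = env.reset() # reset the environment and start a new episode. According to the official documentation, gym.Env.reset() returns observation, info.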

qiwang067 commented 1 year ago

@jzhangCSER01 @Gaben21 @13071558875 Thank you all for the feedback 👍

Gym 0.26.0 and later are not compatible with the earlier code, so we install a Gym version older than 0.26.0, for example 0.25.2.

pip install gym==0.25.2

In addition, the link to the official Gym documentation has been updated.
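To tell which side of the 0.26.0 boundary an installed version falls on, a version string can be compared numerically. The sketch below is illustrative only: `uses_new_api` is a hypothetical helper, it looks at the major and minor numbers alone, and it assumes a plain `X.Y.Z` version string.

```python
def uses_new_api(gym_version):
    """Return True for Gym versions at or above 0.26.

    Hypothetical helper; compares (major, minor) as integers so that
    "0.9.0" is not mistaken for a newer version than "0.26.0".
    """
    major, minor = (int(part) for part in gym_version.split(".")[:2])
    return (major, minor) >= (0, 26)


pinned = uses_new_api("0.25.2")    # the version suggested above
reported = uses_new_api("0.26.2")  # the version reported earlier in the thread
```

With Gym installed, the string could come from `gym.__version__`.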

BackMountainDevil commented 1 year ago

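Original text: "You can click this link to take a look at these environments", from the Chapter 1 gym experiment.

bug: the link returns 404. Judging from the URL suffixes on the official site, the actual link should be https://www.gymlibrary.dev/environments/classic_control/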

qiwang067 commented 1 year ago

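> Original text: "You can click this link to take a look at these environments", from the Chapter 1 gym experiment.
>
> bug: the link returns 404. Judging from the URL suffixes on the official site, the actual link should be https://www.gymlibrary.dev/environments/classic_control/

@BackMountainDevil Thank you for the correction :+1:. The correct link is indeed https://www.gymlibrary.dev/environments/classic_control/ . The document has been updated and the error fixed.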

andyisokay commented 1 year ago

Two updates:

It seems import gymnasium as gym should be used now,
and step() no longer returns done:

observation, reward, terminated, truncated, info = env.step(action)

Nos-Talion commented 1 year ago

Is it necessary to install spinningup in order to run the code? I have been stuck on the TensorFlow environment for a long time.

qiwang067 commented 1 year ago

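> Two updates:
>
> It seems import gymnasium as gym should be used now,
>
> and step() no longer returns done:
>
> observation, reward, terminated, truncated, info = env.step(action)

Thank you for the feedback. Gym 0.26.0 and later are not compatible with the earlier code, which is already noted in the text.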

qiwang067 commented 1 year ago

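> Is it necessary to install spinningup in order to run the code? I have been stuck on the TensorFlow environment for a long time.

No, spinningup does not need to be installed.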

datawhalechina / easy-rl

/chapter1/chapter1 #34