Open beifeng1937 opened 1 year ago
There is also a bug in the `train` function of Sarsa.ipynb: the line `action = agent.sample(state)` inside the `while True` loop should be deleted, and the action should instead be sampled once before the loop and then carried over from `next_action`. The corrected code is:
```python
def train(cfg, env, agent):
    print('Start training!')
    print(f'Env: {cfg.env_name}, Algorithm: {cfg.algo_name}, Device: {cfg.device}')
    rewards = []  # record episode rewards
    for i_ep in range(cfg.train_eps):
        ep_reward = 0  # reward accumulated in this episode
        state = env.reset()  # reset the environment, i.e. start a new episode
        action = agent.sample(state)
        while True:
            # action = agent.sample(state)  <- should be deleted
            next_state, reward, done, _ = env.step(action)  # one interaction with the environment
            next_action = agent.sample(next_state)
            agent.update(state, action, reward, next_state, next_action, done)  # algorithm update
            state = next_state  # update the state
            action = next_action
            ep_reward += reward
            if done:
                break
        rewards.append(ep_reward)
        print(f"Episode: {i_ep+1}/{cfg.train_eps}, Reward: {ep_reward:.1f}, Epsilon: {agent.epsilon}")
    print('Finish training!')
    return {"rewards": rewards}
```
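The fix matters because SARSA is on-policy: the action actually executed at the next step must be the same `next_action` that was just used in the update target. A minimal runnable sketch of the corrected loop, using a toy corridor environment and a tabular agent I made up for illustration (`LineEnv`, `SarsaAgent`, and all hyperparameters are assumptions, not from the notebook):

```python
import random

class LineEnv:
    """Toy corridor: states 0..4, reward 1.0 on reaching state 4."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):                      # a=0: move left, a=1: move right
        self.s = max(0, min(4, self.s + (1 if a == 1 else -1)))
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done, {}

class SarsaAgent:
    def __init__(self, n_states=5, n_actions=2, eps=0.3, lr=0.1, gamma=0.9):
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.eps, self.lr, self.gamma = eps, lr, gamma
        self.n_actions = n_actions
    def sample(self, s):                    # epsilon-greedy action selection
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        q = self.Q[s]
        return q.index(max(q))
    def update(self, s, a, r, s2, a2, done):
        # SARSA target uses the action a2 that WILL actually be taken next
        target = r + (0.0 if done else self.gamma * self.Q[s2][a2])
        self.Q[s][a] += self.lr * (target - self.Q[s][a])

random.seed(0)
env, agent = LineEnv(), SarsaAgent()
for _ in range(500):
    state = env.reset()
    action = agent.sample(state)            # sample ONCE, before the loop
    while True:
        next_state, reward, done, _ = env.step(action)
        next_action = agent.sample(next_state)
        agent.update(state, action, reward, next_state, next_action, done)
        state, action = next_state, next_action  # reuse next_action: on-policy
        if done:
            break
print(agent.Q)
```

If `agent.sample(state)` were called again at the top of the loop, the executed action could differ from the `next_action` used in the update, which silently turns the method into something that is neither SARSA nor Q-learning.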
There is a similar issue in the `train` function of MonteCarlo.ipynb: the line `agent.update(one_ep_transition)  # update the agent` should be moved outside the for-loop, so the agent is updated once per episode, after the whole trajectory `one_ep_transition` has been collected.
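The reason is that Monte Carlo methods need the complete episode return before they can update, so `agent.update(one_ep_transition)` belongs after the step loop ends. A sketch with a hypothetical every-visit MC agent and the same toy corridor (all names and hyperparameters here are illustrative, not the notebook's actual implementation):

```python
import random
from collections import defaultdict

class LineEnv:
    """Toy corridor: states 0..4, reward 1.0 on reaching state 4."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):                      # a=0: move left, a=1: move right
        self.s = max(0, min(4, self.s + (1 if a == 1 else -1)))
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done, {}

class MCAgent:
    def __init__(self, n_actions=2, eps=0.3, gamma=0.9):
        self.Q = defaultdict(float)         # (state, action) -> mean return
        self.counts = defaultdict(int)
        self.eps, self.gamma, self.n_actions = eps, gamma, n_actions
    def sample(self, s):                    # epsilon-greedy action selection
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        qs = [self.Q[(s, a)] for a in range(self.n_actions)]
        return qs.index(max(qs))
    def update(self, one_ep_transition):
        """Called ONCE per episode, with the full (state, action, reward) list."""
        G = 0.0
        for s, a, r in reversed(one_ep_transition):
            G = self.gamma * G + r          # discounted return from this step on
            self.counts[(s, a)] += 1
            # incremental mean over all visits (every-visit MC)
            self.Q[(s, a)] += (G - self.Q[(s, a)]) / self.counts[(s, a)]

random.seed(1)
env, agent = LineEnv(), MCAgent()
for _ in range(200):
    one_ep_transition = []
    state = env.reset()
    while True:
        action = agent.sample(state)
        next_state, reward, done, _ = env.step(action)
        one_ep_transition.append((state, action, reward))
        state = next_state
        if done:
            break
    agent.update(one_ep_transition)         # OUTSIDE the step loop: once per episode
```

Calling `update` inside the step loop would feed the agent partial trajectories whose returns are not yet known, which is why the indentation matters here.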