-
Deliverable: a plot.py, similar to what we had in the assignments, that takes a log directory generated by PPO and produces a learning curve.
This is going to involve transition from our in-class…
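A minimal sketch of what such a plot.py could look like. The log format is an assumption (a `progress.csv` with `step` and `episode_return` columns — both names hypothetical, not taken from the assignment code); the matplotlib import is deferred into `main` so the parsing helpers work headless:

```python
# Hypothetical plot.py sketch; log file name and column names are assumptions.
import csv
import os
import sys


def load_curve(log_dir, fname="progress.csv"):
    """Read (step, episode_return) pairs from a CSV log in log_dir."""
    steps, returns = [], []
    with open(os.path.join(log_dir, fname)) as f:
        for row in csv.DictReader(f):
            steps.append(int(float(row["step"])))
            returns.append(float(row["episode_return"]))
    return steps, returns


def smooth(values, window=10):
    """Trailing moving average to de-noise per-episode returns."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return out


def main(log_dir):
    import matplotlib.pyplot as plt  # deferred so parsing is testable headless
    steps, returns = load_curve(log_dir)
    plt.plot(steps, smooth(returns))
    plt.xlabel("environment steps")
    plt.ylabel("episode return")
    plt.savefig(os.path.join(log_dir, "learning_curve.png"))


if __name__ == "__main__":
    main(sys.argv[1])
```

Run as `python plot.py <log_dir>`; it writes `learning_curve.png` into the same directory.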
-
According to the DQN nature paper and [PPO1 implementation](https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/ppo1/cnn_policy.py#L30), [this line](https://git…
-
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[25], line 3
1 from elegantrl.tr…
-
### What happened + What you expected to happen
After training multi-agent PPO with the new API stack, following [how-to-use-the-new-api-stack](https://docs.ray.io/en/latest/rllib/rllib-n…
-
Dear author:
I ran this command:
python scripts/play.py --task=humanoid_ppo --run_name v1
But it doesn't seem to be running correctly.
Can you tell me how to solve this problem?
-
### Proposal
Currently, there are only 2 datasets for [discrete](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Discrete)-action envs:
- [Fourrooms](https://minari.farama.…
-
Hello there,
First, I'd like to express my appreciation for your excellent work on this project.
While experimenting with PPO/RW using this repository, I consistently encounter Out of Memory (OOM) e…
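One common source of OOM in PPO updates is pushing the entire rollout through the network in a single forward/backward pass; iterating over fixed-size minibatches bounds peak memory instead. A small sketch of the index-chunking idea, independent of this repository's actual code (function name is hypothetical):

```python
import random


def minibatch_indices(batch_size, minibatch_size, shuffle=True, seed=None):
    """Yield index lists covering a rollout of `batch_size` samples in
    chunks of at most `minibatch_size`, so only one chunk of activations
    is resident in accelerator memory at a time."""
    idx = list(range(batch_size))
    if shuffle:
        random.Random(seed).shuffle(idx)
    for start in range(0, batch_size, minibatch_size):
        yield idx[start:start + minibatch_size]
```

Each PPO epoch then loops over these chunks, computing the loss and stepping the optimizer per chunk rather than once over the full batch.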
-
Hi, what is the best way to implement action constraints in a PPOAgent?
For a `QPolicy` I can use `observation_and_action_constraint_splitter`. Is there something equivalent for PPO policies?
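I'm not aware of a built-in splitter for PPO policies; a common workaround is action masking: set the logits of invalid actions to a large negative value before sampling, so constrained actions get essentially zero probability. A pure-Python sketch of that idea (not tf-agents API, function names are mine):

```python
import math

NEG_INF = -1e9  # large negative logit ~ zero probability after softmax


def mask_logits(logits, valid_mask):
    """Replace logits of invalid actions with a large negative value."""
    return [l if m else NEG_INF for l, m in zip(logits, valid_mask)]


def masked_probs(logits, valid_mask):
    """Softmax over masked logits; invalid actions get ~0 probability."""
    masked = mask_logits(logits, valid_mask)
    mx = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - mx) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]
```

In a real agent the mask would come from the observation and be applied inside the policy's distribution head, both at sampling time and when computing log-probabilities for the PPO loss.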
-
I'm trying to implement a PPO agent to play with LunarLander-v2 with tf_agents library like it was in [this tutorial](https://pylessons.com/LunarLander-v2-PPO/) ([_github repo_](https://github.com/pyt…
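A core piece of any such PPO agent is the advantage estimation step (GAE), which can be sketched in plain Python; the `gamma`/`lam` values below are the usual defaults, not taken from that tutorial's repo:

```python
def gae_advantages(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.
    values[t] is V(s_t); last_value bootstraps the state after the rollout."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        # TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        # exponentially weighted sum of residuals
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

The value-function targets are then `advantages[t] + values[t]`, and the advantages are typically normalized per batch before the PPO loss.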
-
Hello Morvan (莫烦), while running the simple_ppo algorithm I select an action from the current state with a = self.sess.run(self.sample_op, {self.tfs: s})[0], but the selected action comes out as nan. How should I modify the code so that nan values no longer appear while it runs?
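nan actions from a Gaussian PPO policy usually mean the network is emitting a nan mean or a degenerate sigma, often after the loss explodes or the log-std grows unbounded. A defensive sketch of the sampling step, independent of Morvan's code (function name and bounds are assumptions; the clamp range is a common convention):

```python
import math
import random

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # common clamp range, an assumption


def safe_sample(mu, log_std, rng=random):
    """Sample a ~ N(mu, sigma) with a clamped log-std and nan checks."""
    if math.isnan(mu) or math.isnan(log_std):
        raise ValueError("policy network emitted nan; lower the learning "
                         "rate or normalize observations/advantages")
    log_std = min(max(log_std, LOG_STD_MIN), LOG_STD_MAX)
    sigma = math.exp(log_std)
    return mu + sigma * rng.gauss(0.0, 1.0)
```

Beyond clamping, the usual fixes are a smaller learning rate, observation and advantage normalization, and gradient clipping, so the network never produces nan in the first place.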