JuliaReinforcementLearning / ReinforcementLearningZoo.jl

https://juliareinforcementlearning.org/
MIT License

Add policy gradient #80

Closed norci closed 3 years ago

norci commented 4 years ago

Doc:

  1. https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf

Implementations:

  1. https://github.com/PaddlePaddle/PARL/blob/develop/parl/algorithms/torch/policy_gradient.py

  2. https://github.com/thu-ml/tianshou/blob/master/tianshou/policy/modelfree/pg.py

norci commented 4 years ago

See also https://arxiv.org/abs/1808.09940: "We also conduct intensive experiments in China Stock market and show that PG is more desirable in financial market than DDPG and PPO, although both of them are more advanced."

I think PG is useful in some circumstances, even though it's simpler than DDPG.

I tried to implement PG in this framework, but it is too complex for me.

findmyway commented 4 years ago

I've implemented it before, but that was a long time ago. I'll take a look at it this weekend.

findmyway commented 4 years ago

I'm sorry, but I may not have enough time to work on this issue until next month due to some personal issues.

I'll explain the implementation details here in case you'd like to try this yourself:

First, we may need an ElasticCompact**Trajectory-like structure for efficiency. (Added in RLCore.)
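To illustrate the kind of structure meant here, below is a minimal Python sketch of a growable episode buffer. It is based on an assumed reading of the name, not on RLCore's actual type: "elastic" is taken to mean the buffer grows as steps are pushed, and "compact" is taken to mean that states are stored once, so the next state of step t is simply the state of step t + 1. The class `EpisodeTrajectory` and its methods `start`, `push`, and `transition` are hypothetical names.

```python
class EpisodeTrajectory:
    """Hypothetical growable episode buffer (an illustration, not the RLCore type)."""

    def __init__(self):
        self.states = []   # holds T + 1 entries once T transitions are stored
        self.actions = []  # holds T entries
        self.rewards = []  # holds T entries

    def start(self, state):
        """Begin a new episode from its initial state, discarding the old one."""
        self.states, self.actions, self.rewards = [state], [], []

    def push(self, action, reward, next_state):
        """Append one transition; the buffer grows without a fixed capacity."""
        self.actions.append(action)
        self.rewards.append(reward)
        self.states.append(next_state)

    def __len__(self):
        """Number of stored transitions."""
        return len(self.actions)

    def transition(self, t):
        """Return (s_t, a_t, r_t, s_{t+1}); s_{t+1} is read from the shared state list."""
        return self.states[t], self.actions[t], self.rewards[t], self.states[t + 1]


traj = EpisodeTrajectory()
traj.start("s0")
traj.push(1, 0.5, "s1")
traj.push(0, 1.0, "s2")
```

Storing each state only once keeps memory use close to half of what separate state and next-state arrays would need, which is presumably the efficiency concern.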

The rest is simple: we update the policy only at the end of an episode, randomly selecting different batches and updating the inner approximator on each one. The PPO implementation follows a similar pattern and is worth reading.
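The update described here could look roughly like the following Python sketch. It is not the code from this repository: it assumes a tabular softmax policy as a stand-in for the inner approximator, uses the plain discounted return as the weight on each log-probability gradient, and all function names (`discounted_returns`, `minibatches`, `update_policy`) are hypothetical.

```python
import math
import random


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of a finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def minibatches(n, batch_size, rng):
    """Shuffle the indices 0..n-1 and split them into batches."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, n, batch_size)]


def update_policy(theta, states, actions, returns, lr, batch_size, rng):
    """One pass over the episode in random batches.

    theta[s][a] are the logits of a tabular softmax policy (assumed stand-in
    for the inner approximator). Each batch takes one gradient-ascent step on
    the batch mean of G_t * log pi(a_t | s_t).
    """
    for batch in minibatches(len(states), batch_size, rng):
        grad = {}
        for i in batch:
            s, a, g = states[i], actions[i], returns[i]
            probs = softmax(theta[s])
            for b, p in enumerate(probs):
                # d log pi(a|s) / d theta[s][b] = 1{a == b} - pi(b|s)
                dlogp = (1.0 if b == a else 0.0) - p
                grad[(s, b)] = grad.get((s, b), 0.0) + g * dlogp / len(batch)
        for (s, b), value in grad.items():
            theta[s][b] += lr * value


# Called once, after the episode has ended:
rng = random.Random(0)
theta = [[0.0, 0.0]]                         # one state, two actions
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, 0.5)   # [1.75, 1.5, 1.0]
update_policy(theta, [0, 0, 0], [1, 1, 1], returns, lr=0.1, batch_size=2, rng=rng)
```

Because action 1 was taken at every step and all returns are positive, its logit rises relative to action 0 after the update.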

Ideally, the REINFORCE algorithm should also support tabular cases (see https://github.com/JuliaReinforcementLearning/ReinforcementLearningAnIntroduction.jl/blob/master/src/extensions/policies/reinforce_policy.jl ), but we can leave that for another PR.

By the way, if you need an implementation similar to the one in openai/spinningup, you will need to dispatch the update! method on the type of the inner approximator.
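In Julia this would be ordinary multiple dispatch on update!. As a rough Python analogue, the sketch below uses `functools.singledispatch` to pick the weighting of the policy gradient from the approximator's type. It assumes that the Spinning Up-style variant is one with a learned value baseline. The classes `PolicyOnly` and `ActorCritic` and the function `gradient_weights` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from functools import singledispatch


@dataclass
class PolicyOnly:
    """Hypothetical approximator that holds only a policy."""
    name: str = "policy"


@dataclass
class ActorCritic:
    """Hypothetical approximator with a policy plus state-value estimates."""
    values: dict = field(default_factory=dict)


@singledispatch
def gradient_weights(approximator, states, returns):
    """Fallback for approximator types that have no registered method."""
    raise NotImplementedError(type(approximator).__name__)


@gradient_weights.register
def _(approximator: PolicyOnly, states, returns):
    # Plain REINFORCE: weight each log-probability gradient by the return.
    return list(returns)


@gradient_weights.register
def _(approximator: ActorCritic, states, returns):
    # With a baseline: weight by return minus the estimated state value.
    return [g - approximator.values.get(s, 0.0) for s, g in zip(states, returns)]


plain = gradient_weights(PolicyOnly(), [0, 1], [2.0, 1.0])
with_baseline = gradient_weights(ActorCritic({0: 0.5}), [0, 1], [2.0, 1.0])
```

The rest of the update loop stays the same for both cases. Only the method chosen for the approximator type differs.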

norci commented 4 years ago

Thanks for your guidance. I'd like to implement this algorithm; this project is awesome and I like it. But I'm a beginner in Julia and RL, so please give me some time.

findmyway commented 4 years ago

Cool! No hurry.

I just tagged a new release, RLCore@v0.4.3. You should now have all the necessary components to implement it.