MrSyee / pg-is-all-you-need

Policy Gradient is all you need! A step-by-step tutorial for well-known PG methods.

Add contents on README.md #12

Closed MrSyee closed 5 years ago

MrSyee commented 5 years ago

@Curt-Park @mclearning2 Thank you very much for your hard work. In this PR I added contents to README.md, removed REINFORCE, and added the contributors' profiles to README.md.

Thank you!

MrSyee commented 5 years ago

@all-contributors add MrSyee for code, documentation

allcontributors[bot] commented 5 years ago

@MrSyee

I've put up a pull request to add @MrSyee! :tada:

MrSyee commented 5 years ago

@all-contributors add @Curt-Park for code, documentation @all-contributors add @mclearning2 for code, documentation

allcontributors[bot] commented 5 years ago

@MrSyee

I could not determine your intention.

Basic usage: @all-contributors please add @jakebolam for code, doc and infra

For other usages see the documentation

MrSyee commented 5 years ago

@all-contributors add @Curt-Park for code, documentation

allcontributors[bot] commented 5 years ago

@MrSyee

I've put up a pull request to add @Curt-Park! :tada:

MrSyee commented 5 years ago

@all-contributors add @mclearning2 for code, documentation

allcontributors[bot] commented 5 years ago

@MrSyee

I've put up a pull request to add @mclearning2! :tada:

Curt-Park commented 5 years ago
  1. The cell location for setting the random seed differs across notebooks; PPO defines the seed right below the imports. We should match them up (a rough sketch of such a cell follows this list).
  2. Reword the description: "This is a step-by-step PG algorithm tutorials from A2C to SAC. In addtion, it contains PG algorithms using human demonstrations (like DDPGfD, BC) for treating real applications with a sparse reward." => "This is a step-by-step tutorial for Policy Gradient algorithms from A2C to SAC, including learning acceleration methods using demonstrations for treating real applications with sparse rewards."
  3. Is this needed in References?
    • R. Sutton and A. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018.
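
(Re item 1) A minimal sketch of the kind of seed cell that could sit directly below the imports in every notebook; the seed value and the exact set of libraries seeded are assumptions here, not necessarily what the notebooks currently contain:

    import random
    import numpy as np
    import torch

    seed = 777  # example value only
    random.seed(seed)        # seed Python's built-in RNG
    np.random.seed(seed)     # seed NumPy's global RNG
    torch.manual_seed(seed)  # seed PyTorch's RNG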
Curt-Park commented 5 years ago

Contents start from 1, but References start from 0.

ps. I think next Monday is a good day to release.

Curt-Park commented 5 years ago

When running on the CuDNN backend, two further options must be set for reproducibility:

    import torch

    if torch.backends.cudnn.enabled:
        # disable the CuDNN auto-tuner and force deterministic kernels
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True

See https://pytorch.org/docs/stable/notes/randomness.html#cudnn
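
Worth noting: these two flags trade some speed for reproducibility, since CuDNN can no longer auto-tune convolution algorithms or pick potentially faster non-deterministic kernels.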

MrSyee commented 5 years ago

> Contents start from 1, but References start from 0.
>
> ps. I think next Monday is a good day to release.

@Curt-Park Ok, I'll release it then. Btw, I intended references 1-7 to match implementations 1-7, with reference 0 as a basic background reference. Is it weird?

Curt-Park commented 5 years ago

@MrSyee We should consider whether people can generally understand that notation without an explanation; at least I couldn't. Have we used reference 0 for any implementation that will be released? I remember we added it for REINFORCE, but REINFORCE will not be released.

MrSyee commented 5 years ago

@Curt-Park I remember, but I thought reference 0 was needed for more than just REINFORCE, since it covers PG methods including REINFORCE and Actor-Critic. Nevertheless, I don't mind removing it if it feels weird.

Curt-Park commented 5 years ago

I don't think so, because we referred to the A3C paper for A2C.

Plus, we don't need to add reference 1 either, and we should add TD3's reference.

MrSyee commented 5 years ago

Oh, my mistake. I'll add the TD3 reference.