mhtb32 opened this issue 4 years ago
Contributions are absolutely welcome, and guidelines are unfortunately lacking. I'll try to address this. The main steps would be:

- Implement the `act()` method, which takes an observation and returns an action, or the `plan()` method, which returns a sequence of actions.
- Implement the `record()` method, which allows the agent to record a transition and update an internal model.
- Define a `config` field. You should define default values by overriding the `default_config()` method. These values can then be overridden with a JSON config file, used to run `experiments.py`.
- You can have a look at the implementation of DQN for guidance (which is split between the `abstract.py` and `pytorch.py` files for historical and deprecated reasons); a rough skeleton is also sketched below.
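For concreteness, here is a minimal sketch of what such an agent might look like. Only the method names (`act`, `plan`, `record`, `default_config`) and the `config` field come from the description above; the class name, base class, constructor, and exact signatures are illustrative assumptions and may differ from the project's actual interfaces.

```python
class MyAgent:
    """Illustrative skeleton only; not the project's actual agent base class."""

    @classmethod
    def default_config(cls):
        # Default hyperparameters; these can be overridden by a JSON config
        # file when running experiments.py.
        return {"learning_rate": 1e-3, "gamma": 0.99}

    def __init__(self, env, config=None):
        self.env = env
        self.config = self.default_config()
        if config:
            self.config.update(config)

    def act(self, observation):
        # Take an observation, return a single action.
        raise NotImplementedError

    def plan(self, observation):
        # Optionally return a sequence of actions instead of a single one.
        return [self.act(observation)]

    def record(self, state, action, next_state, reward, done, info):
        # Store the transition and/or update an internal model.
        raise NotImplementedError
```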
Policy gradient algorithms are definitely in the scope of this project; I wanted to implement a few myself (see #4) but never found the time. You're welcome to try it, and I'll provide any support you require.
You should also know that there are two ways to train agents, defined in the Evaluation class. The default one, `run_episodes`, is the following:

```python
for episode in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        action = agent.act(state)
        next_state, reward, done, info = env.step(action)
        agent.record(state, action, next_state, reward, done, info)
        state = next_state
```
Alternatively, the `run_batched_episodes` method allows running a batch of sample-collection jobs in parallel before updating the model. Something like:

```python
for episode in range(num_episodes):  # episodes are collected in parallel
    state = env.reset()
    done = False
    while not done:
        action = agent.act(state)
        next_state, reward, done, info = env.step(action)
        agent.record(state, action, next_state, reward, done, info)
        state = next_state
agent.update()  # single model update after the batch of episodes
```
This is probably relevant for policy gradient algorithms.
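As an illustration, a policy-gradient agent could buffer transitions in `record()` and perform a single gradient step in `update()` once the batch has been collected. The REINFORCE-style sketch below uses assumed names (the policy network, optimizer, and constructor arguments are illustrative, not this project's actual API):

```python
import torch

class ReinforceAgent:
    """Illustrative only: buffers transitions in record(), updates in update()."""

    def __init__(self, policy_network, learning_rate=1e-3, gamma=0.99):
        self.policy = policy_network  # maps states to action logits
        self.optimizer = torch.optim.Adam(self.policy.parameters(), lr=learning_rate)
        self.gamma = gamma
        self.transitions = []  # filled by record() during sample collection

    def act(self, state):
        logits = self.policy(torch.as_tensor(state, dtype=torch.float32))
        return torch.distributions.Categorical(logits=logits).sample().item()

    def record(self, state, action, next_state, reward, done, info):
        self.transitions.append((state, action, reward, done))

    def update(self):
        # Compute discounted returns backwards, restarting at episode boundaries.
        returns, g = [], 0.0
        for _, _, reward, done in reversed(self.transitions):
            g = reward + self.gamma * (0.0 if done else g)
            returns.insert(0, g)

        states = torch.as_tensor([t[0] for t in self.transitions], dtype=torch.float32)
        actions = torch.as_tensor([t[1] for t in self.transitions])
        returns = torch.as_tensor(returns, dtype=torch.float32)

        # REINFORCE objective: maximize expected return, i.e. minimize its negative.
        log_probs = torch.distributions.Categorical(logits=self.policy(states)).log_prob(actions)
        loss = -(log_probs * returns).mean()

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.transitions.clear()  # ready for the next batch
```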
Thanks for your explanation. I'll keep this issue open so it can act as a temporary contribution guide for others.
I wanted to know if contributions are welcome here, and if so, how to contribute. I mean, is there any guideline for how we should implement agents?
In fact, I wanted to implement agents like DDPG, SAC, and TD3. Are these in the scope of this project?