mhtb32 opened this issue 4 years ago
Contributions are absolutely welcome, and guidelines are unfortunately lacking. I'll try to address this. The main steps would be:

- Implement the `act()` method, which takes an observation and returns an action, or the `plan()` method, which returns a sequence of actions.
- Implement the `record()` method, which allows the agent to record a transition and update an internal model.
- Define a `config` field. You should define default values by overriding the `default_config()` method. These values can then be overridden with a JSON config file, used to run `experiments.py`.
- You can have a look at the implementation of DQN for guidance (which is split between the `abstract.py` and `pytorch.py` files for historical and deprecated reasons); a rough skeleton is also sketched below.
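For concreteness, here is a minimal sketch of what such an agent might look like. Only the method names (`act`, `plan`, `record`, `default_config`) and the `config` field come from the description above; the class name, base class, constructor, and exact signatures are illustrative assumptions and may differ from the project's actual interfaces.

```python
class MyAgent:
    """Illustrative skeleton only; not the project's actual agent base class."""

    @classmethod
    def default_config(cls):
        # Default hyperparameters; these can be overridden by a JSON config
        # file when running experiments.py.
        return {"learning_rate": 1e-3, "gamma": 0.99}

    def __init__(self, env, config=None):
        self.env = env
        self.config = self.default_config()
        if config:
            self.config.update(config)

    def act(self, observation):
        # Take an observation, return a single action.
        raise NotImplementedError

    def plan(self, observation):
        # Optionally return a sequence of actions instead of a single one.
        return [self.act(observation)]

    def record(self, state, action, next_state, reward, done, info):
        # Store the transition and/or update an internal model.
        raise NotImplementedError
```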
Policy gradient algorithms are definitely in the scope of this project; I wanted to implement a few myself (see #4) but never found the time. You're welcome to try it, and I'll provide any support you require.
You should also know that there are two ways to train agents, defined in the Evaluation class. The default one, `run_episodes`, is the following:

```python
for episode in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        action = agent.act(state)
        next_state, reward, done, info = env.step(action)
        agent.record(state, action, next_state, reward, done, info)
        state = next_state
```
Alternatively, the `run_batched_episodes` method allows running a batch of sample-collection jobs in parallel before updating the model. Something like:

```python
for episode in range(num_episodes):  # episodes are collected in parallel
    state = env.reset()
    done = False
    while not done:
        action = agent.act(state)
        next_state, reward, done, info = env.step(action)
        agent.record(state, action, next_state, reward, done, info)
        state = next_state
agent.update()  # single model update after the batch of episodes
```
This is probably relevant for policy gradient algorithms.
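As an illustration, a policy-gradient agent could buffer transitions in `record()` and perform a single gradient step in `update()` once the batch has been collected. The REINFORCE-style sketch below uses assumed names (the policy network, optimizer, and constructor arguments are illustrative, not this project's actual API):

```python
import torch

class ReinforceAgent:
    """Illustrative only: buffers transitions in record(), updates in update()."""

    def __init__(self, policy_network, learning_rate=1e-3, gamma=0.99):
        self.policy = policy_network  # maps states to action logits
        self.optimizer = torch.optim.Adam(self.policy.parameters(), lr=learning_rate)
        self.gamma = gamma
        self.transitions = []  # filled by record() during sample collection

    def act(self, state):
        logits = self.policy(torch.as_tensor(state, dtype=torch.float32))
        return torch.distributions.Categorical(logits=logits).sample().item()

    def record(self, state, action, next_state, reward, done, info):
        self.transitions.append((state, action, reward, done))

    def update(self):
        # Compute discounted returns backwards, restarting at episode boundaries.
        returns, g = [], 0.0
        for _, _, reward, done in reversed(self.transitions):
            g = reward + self.gamma * (0.0 if done else g)
            returns.insert(0, g)

        states = torch.as_tensor([t[0] for t in self.transitions], dtype=torch.float32)
        actions = torch.as_tensor([t[1] for t in self.transitions])
        returns = torch.as_tensor(returns, dtype=torch.float32)

        # REINFORCE objective: maximize expected return, i.e. minimize its negative.
        log_probs = torch.distributions.Categorical(logits=self.policy(states)).log_prob(actions)
        loss = -(log_probs * returns).mean()

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.transitions.clear()  # ready for the next batch
```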
Thanks for your explanation. I'll keep this issue open so it can act as a temporary contribution guide for others.
I wanted to know if contributions are welcome here, and if so, how to contribute. I mean, is there any guideline for how we should implement agents?
In fact, I wanted to implement agents like DDPG, SAC, and TD3. Are these in the scope of this project?