LLNL / Abmarl

Agent Based Modeling and Reinforcement Learning

Get the q learning algorithm from the dssi class branch #284

Open rusu24edward opened 2 years ago

rusu24edward commented 2 years ago

Q-learning updates the policy during a trajectory, after each step. The current trainer framework abstracts episode generation and provides no way to train during the episode.
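For reference, the per-step update that has to run inside the episode can be sketched as below. This is the standard tabular Q-learning rule and is not taken from the dssi class branch; the function name `q_update`, the NumPy table layout, and the default `alpha` and `gamma` values are assumptions made for illustration.

```python
import numpy as np

def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error.

    Hypothetical helper: q_table is a 2D array indexed as [state, action].
    """
    # Terminal transitions do not bootstrap from the next state.
    if done:
        target = reward
    else:
        target = reward + gamma * np.max(q_table[next_state])
    td_error = target - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return td_error
```

Because each update needs the next state, it can only run once the following observation is available, which is why the trainer needs some hook inside the episode loop.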

  1. We can modify the framework so that it calls log_reward with the return. Then the Q-learning trainer can override this function and compute_action so that it stores the whole SARS tuple (state, action, reward, next state). The Q-value update (the backup step) can then be implemented in either the log_return or the compute_action function.
  2. We can have the Q-learning trainer override the episode-generation function and do the training while the episode is generated.

The first approach is more modular, but I think we'll want to explore both.
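A minimal sketch of the first approach, under stated assumptions: the class `QLearningHooks`, its `reset` method, the `generate_episode` loop, and the `chain_step` toy environment are all hypothetical and are not Abmarl's actual trainer API. Only the hook names compute_action and log_reward come from the discussion above, and their signatures here are guesses.

```python
import numpy as np

class QLearningHooks:
    """Hypothetical policy that learns through two hooks called by the episode loop."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q_table = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Forget partial transitions so episodes are never linked together.
        self._last = None      # (state, action) of the most recent step
        self._pending = None   # (state, action, reward) waiting for the next state

    def compute_action(self, state):
        # The new state completes the pending (s, a, r) into a full SARS tuple.
        if self._pending is not None:
            self._update(*self._pending, next_state=state, done=False)
            self._pending = None
        if self.rng.random() < self.epsilon:
            action = int(self.rng.integers(self.q_table.shape[1]))
        else:
            action = int(np.argmax(self.q_table[state]))
        self._last = (state, action)
        return action

    def log_reward(self, reward, done=False):
        state, action = self._last
        if done:
            # Terminal step: no next state is needed, so update immediately.
            self._update(state, action, reward, next_state=None, done=True)
        else:
            self._pending = (state, action, reward)

    def _update(self, state, action, reward, next_state, done):
        target = reward if done else reward + self.gamma * np.max(self.q_table[next_state])
        self.q_table[state, action] += self.alpha * (target - self.q_table[state, action])


def generate_episode(policy, step_fn, start_state, horizon=50):
    """Generic loop that knows nothing about Q-learning; it only calls the hooks."""
    policy.reset()
    state = start_state
    for _ in range(horizon):
        action = policy.compute_action(state)
        state, reward, done = step_fn(state, action)
        policy.log_reward(reward, done)
        if done:
            break


def chain_step(state, action):
    """Toy 4-state chain: action 1 moves right, action 0 moves left, goal is state 3."""
    next_state = min(state + 1, 3) if action == 1 else max(state - 1, 0)
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done
```

In this sketch the episode loop stays generic, which is the modularity argument for the first approach. The second approach would instead fold the update into `generate_episode` itself.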