Feature

Revisiting Recurrent Reinforcement Learning with Memory Monoids provides a method for combining recurrent models with standard, nonrecurrent RL losses. This should add support for recurrent models such as S5, LRU, FFM, and the Linear Transformer in stoix. Note that models like the LSTM or GRU would be intractable under this paradigm, because their state updates are not associative, and would require significantly more effort to integrate into stoix.
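The associativity requirement can be illustrated with a toy update. Because the operator is associative, states for a long sequence can be combined under any bracketing, which is what lets a parallel scan replace a step-by-step RNN loop. A minimal sketch (illustrative only, not Stoix code):

```python
from functools import reduce

# Toy associative update: affine maps h -> a*h + b, represented as pairs
# (a, b). Linear-recurrence models (S5, LRU, FFM, Linear Transformer) have
# updates of this general shape; LSTM/GRU updates are not associative,
# which is why they do not fit this paradigm.
def update(f, g):
    a1, b1 = f
    a2, b2 = g
    # Composition: h -> a2 * (a1 * h + b1) + b2
    return a1 * a2, a2 * b1 + b2

seq = [(2, 1), (3, -1), (1, 4), (2, 5)]

# Sequential, left-to-right fold...
left = reduce(update, seq)

# ...equals combining the two halves independently (parallelisable).
halves = update(reduce(update, seq[:2]), reduce(update, seq[2:]))

assert left == halves  # any bracketing gives the same composed map
```

In a real implementation the fold over chunks would be performed by a parallel scan (e.g. jax.lax.associative_scan) rather than a sequential reduce.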
Proposal
Implement an abstract base class for recurrent models, with map_to_recurrent_state, map_from_recurrent_state, parallel_recurrent_update, initial_recurrent_state, and identity_element methods.
Implement a general-purpose episodic reset operator using the initial_recurrent_state and identity_element methods.
Implement one or more of S5, LRU, and FFM on top of the base class.
Create a make_recurrent_loss function that wraps a non-recurrent loss function.
This function will scan over one long, contiguous sequence of observations from the replay buffer, produce Markov states, and feed those Markov states into the wrapped non-recurrent loss function.
Demonstrate a sequence model + DQN on one or two POPGym tasks.
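The steps above can be sketched in plain Python. The interface names follow the proposal; everything else, including the toy SumMonoid, the shape of the reset operator, and the use of a sequential accumulate in place of a parallel scan, is an illustrative assumption, not Stoix code:

```python
from abc import ABC, abstractmethod
from itertools import accumulate
from typing import Any, Callable, List

# Hypothetical sketch of the proposed base class. A real Stoix
# implementation would operate on JAX arrays and replace the sequential
# accumulate below with jax.lax.associative_scan.
class MemoryMonoid(ABC):
    @abstractmethod
    def initial_recurrent_state(self) -> Any:
        """State at the start of an episode."""

    @abstractmethod
    def identity_element(self) -> Any:
        """Identity e of the update: update(e, s) == s."""

    @abstractmethod
    def map_to_recurrent_state(self, x: Any) -> Any:
        """Lift an encoded observation into monoid space."""

    @abstractmethod
    def parallel_recurrent_update(self, a: Any, b: Any) -> Any:
        """Associative combine; associativity enables a parallel scan."""

    @abstractmethod
    def map_from_recurrent_state(self, state: Any) -> Any:
        """Project a recurrent state to a Markov state."""


class SumMonoid(MemoryMonoid):
    """Toy model: the recurrent state is a running sum of inputs."""

    def initial_recurrent_state(self) -> float:
        return 0.0

    def identity_element(self) -> float:
        return 0.0

    def map_to_recurrent_state(self, x: float) -> float:
        return x

    def parallel_recurrent_update(self, a: float, b: float) -> float:
        return a + b

    def map_from_recurrent_state(self, state: float) -> float:
        return state


def make_reset_update(m: MemoryMonoid):
    """Episodic reset operator (one standard construction): lift the update
    to (state, started) pairs so a start-of-episode flag on the right
    operand discards everything to its left. The lifted operator remains
    associative, so the sequence is still scannable."""
    def reset_update(a, b):
        s_a, started_a = a
        s_b, started_b = b
        if started_b:  # episode boundary: drop the left state
            return s_b, True
        return m.parallel_recurrent_update(s_a, s_b), started_a
    return reset_update


def make_recurrent_loss(m: MemoryMonoid, loss_fn: Callable[[List[Any]], float]):
    """Wrap a nonrecurrent loss: scan one contiguous buffer sequence into
    Markov states, then hand those states to the wrapped loss."""
    def recurrent_loss(obs: List[Any], starts: List[bool]) -> float:
        lifted = []
        for o, start in zip(obs, starts):
            s = m.map_to_recurrent_state(o)
            if start:
                # Fold the initial state into episode-start elements.
                s = m.parallel_recurrent_update(m.initial_recurrent_state(), s)
            lifted.append((s, start))
        pairs = accumulate(lifted, make_reset_update(m))
        markov = [m.map_from_recurrent_state(s) for s, _ in pairs]
        return loss_fn(markov)
    return recurrent_loss


# Example: two episodes of length two. The per-step Markov states are
# 1, 3, then (after the reset) 3, 7, so a sum "loss" gives 14.
m = SumMonoid()
loss = make_recurrent_loss(m, sum)
assert loss([1.0, 2.0, 3.0, 4.0], [True, False, True, False]) == 14.0
```

A real wrapped loss would take Markov states plus actions, rewards, and targets (matching the signature of the existing nonrecurrent loss), but the scan-then-apply structure is the same.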
Testing
Show that it can solve a few POPGym tasks
Benchmarking (Optional)
Definition of done
We have an example that can solve a few POPGym tasks
Mandatory checklist before making a PR
[ ] The success criteria laid down in “Definition of done” are met.
[ ] Code is documented - docstrings for methods and classes, static types for arguments.
[ ] Code is tested - unit, integration and/or functional tests are added.
[ ] Documentation is updated - README, CONTRIBUTING, or other documentation.
[ ] All functional tests are green.
[ ] Link experiment/benchmarking after implementation (optional).
Links / references / screenshots