Feature

Revisiting Recurrent Reinforcement Learning with Memory Monoids provides a method for combining recurrent models with standard, nonrecurrent RL losses. This should add support for recurrent models such as S5, LRU, FFM, and the Linear Transformer in stoix. Note that models like the LSTM or GRU would be intractable under this paradigm, because their state updates are not associative, and would require significantly more effort to integrate into stoix.
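The associativity requirement can be illustrated with a toy update. Because the operator is associative, states for a long sequence can be combined under any bracketing, which is what lets a parallel scan replace a step-by-step RNN loop. A minimal sketch (illustrative only, not Stoix code):

```python
from functools import reduce

# Toy associative update: affine maps h -> a*h + b, represented as pairs
# (a, b). Linear-recurrence models (S5, LRU, FFM, Linear Transformer) have
# updates of this general shape; LSTM/GRU updates are not associative,
# which is why they do not fit this paradigm.
def update(f, g):
    a1, b1 = f
    a2, b2 = g
    # Composition: h -> a2 * (a1 * h + b1) + b2
    return a1 * a2, a2 * b1 + b2

seq = [(2, 1), (3, -1), (1, 4), (2, 5)]

# Sequential, left-to-right fold...
left = reduce(update, seq)

# ...equals combining the two halves independently (parallelisable).
halves = update(reduce(update, seq[:2]), reduce(update, seq[2:]))

assert left == halves  # any bracketing gives the same composed map
```

In a real implementation the fold over chunks would be performed by a parallel scan (e.g. jax.lax.associative_scan) rather than a sequential reduce.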
Proposal
Implement an abstract base class for recurrent models, with map_to_recurrent_state, map_from_recurrent_state, parallel_recurrent_update, initial_recurrent_state, and identity_element methods.
Implement a general-purpose episodic reset operator using the initial_recurrent_state and identity_element methods.
Implement one or more of S5, LRU, and FFM on top of the base class.
Create a make_recurrent_loss function that wraps a non-recurrent loss function.
This function will scan over one long, contiguous sequence of observations from the replay buffer, produce Markov states, and feed those Markov states into the wrapped non-recurrent loss function.
Demonstrate a sequence model + DQN on one or two POPGym tasks.
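The steps above can be sketched in plain Python. The interface names follow the proposal; everything else, including the toy SumMonoid, the shape of the reset operator, and the use of a sequential accumulate in place of a parallel scan, is an illustrative assumption, not Stoix code:

```python
from abc import ABC, abstractmethod
from itertools import accumulate
from typing import Any, Callable, List

# Hypothetical sketch of the proposed base class. A real Stoix
# implementation would operate on JAX arrays and replace the sequential
# accumulate below with jax.lax.associative_scan.
class MemoryMonoid(ABC):
    @abstractmethod
    def initial_recurrent_state(self) -> Any:
        """State at the start of an episode."""

    @abstractmethod
    def identity_element(self) -> Any:
        """Identity e of the update: update(e, s) == s."""

    @abstractmethod
    def map_to_recurrent_state(self, x: Any) -> Any:
        """Lift an encoded observation into monoid space."""

    @abstractmethod
    def parallel_recurrent_update(self, a: Any, b: Any) -> Any:
        """Associative combine; associativity enables a parallel scan."""

    @abstractmethod
    def map_from_recurrent_state(self, state: Any) -> Any:
        """Project a recurrent state to a Markov state."""


class SumMonoid(MemoryMonoid):
    """Toy model: the recurrent state is a running sum of inputs."""

    def initial_recurrent_state(self) -> float:
        return 0.0

    def identity_element(self) -> float:
        return 0.0

    def map_to_recurrent_state(self, x: float) -> float:
        return x

    def parallel_recurrent_update(self, a: float, b: float) -> float:
        return a + b

    def map_from_recurrent_state(self, state: float) -> float:
        return state


def make_reset_update(m: MemoryMonoid):
    """Episodic reset operator (one standard construction): lift the update
    to (state, started) pairs so a start-of-episode flag on the right
    operand discards everything to its left. The lifted operator remains
    associative, so the sequence is still scannable."""
    def reset_update(a, b):
        s_a, started_a = a
        s_b, started_b = b
        if started_b:  # episode boundary: drop the left state
            return s_b, True
        return m.parallel_recurrent_update(s_a, s_b), started_a
    return reset_update


def make_recurrent_loss(m: MemoryMonoid, loss_fn: Callable[[List[Any]], float]):
    """Wrap a nonrecurrent loss: scan one contiguous buffer sequence into
    Markov states, then hand those states to the wrapped loss."""
    def recurrent_loss(obs: List[Any], starts: List[bool]) -> float:
        lifted = []
        for o, start in zip(obs, starts):
            s = m.map_to_recurrent_state(o)
            if start:
                # Fold the initial state into episode-start elements.
                s = m.parallel_recurrent_update(m.initial_recurrent_state(), s)
            lifted.append((s, start))
        pairs = accumulate(lifted, make_reset_update(m))
        markov = [m.map_from_recurrent_state(s) for s, _ in pairs]
        return loss_fn(markov)
    return recurrent_loss


# Example: two episodes of length two. The per-step Markov states are
# 1, 3, then (after the reset) 3, 7, so a sum "loss" gives 14.
m = SumMonoid()
loss = make_recurrent_loss(m, sum)
assert loss([1.0, 2.0, 3.0, 4.0], [True, False, True, False]) == 14.0
```

A real wrapped loss would take Markov states plus actions, rewards, and targets (matching the signature of the existing nonrecurrent loss), but the scan-then-apply structure is the same.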
Testing
Show that it can solve a few POPGym tasks
Benchmarking (Optional)
Definition of done
We have an example that can solve a few POPGym tasks
Mandatory checklist before making a PR
[ ] The success criteria laid down in “Definition of done” are met.
[ ] Code is documented - docstrings for methods and classes, static types for arguments.
[ ] Code is tested - unit, integration and/or functional tests are added.
[ ] Documentation is updated - README, CONTRIBUTING, or other documentation.
[ ] All functional tests are green.
[ ] Link experiment/benchmarking after implementation (optional).
Links / references / screenshots