
Feature: Efficient Experience Replay Sampling with Prioritized Experience Replay #24


Currently, our MoveModule class uses a basic experience replay mechanism where transitions are sampled uniformly from the memory buffer. While this approach has been effective, it does not account for the varying significance of experiences, which may lead to less efficient learning.
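
The MoveModule internals are not shown here, but the uniform scheme presumably amounts to something like the sketch below (the buffer and function names are illustrative, not the actual code):

```python
import random
from collections import deque

# Hypothetical stand-in for the current replay buffer: every stored
# transition has the same chance of being sampled, regardless of how
# surprising it was when it occurred.
memory = deque(maxlen=10_000)  # (state, action, reward, next_state, done) tuples

def sample_uniform(batch_size: int):
    # random.sample draws without replacement, uniformly over the buffer.
    return random.sample(memory, batch_size)
```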

Feature Request:

Implement Prioritized Experience Replay to optimize the sampling of experiences. By prioritizing transitions with higher temporal-difference (TD) errors, we can focus training on experiences that are more impactful, leading to faster and potentially more stable learning.
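
For concreteness, the proportional scheme from Schaul et al. keeps one priority per transition and samples according to

$$
p_i = |\delta_i| + \epsilon, \qquad
P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}, \qquad
w_i = \left( \frac{1}{N} \cdot \frac{1}{P(i)} \right)^{\beta}
$$

where $\delta_i$ is the TD error of transition $i$, $\epsilon$ is a small constant that keeps zero-error transitions sampleable, $\alpha = 0$ recovers uniform sampling, and the importance-sampling weights $w_i$ (normalized by their maximum) scale each sampled loss to correct the bias introduced by non-uniform sampling.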

Proposed Solution:

  1. TD Error Calculation: After each learning step, compute the TD error for the sampled transitions and store the updated error alongside each transition in the buffer (newly added transitions can start at the current maximum priority until their error is known).
  2. Prioritized Sampling: Derive a selection probability for each experience from its stored TD error, so that experiences with larger errors are more likely to be sampled.
  3. Proportional or Rank-Based Prioritization: Consider implementing either proportional prioritization (experiences are weighted by the magnitude of their TD error) or rank-based prioritization (experiences are ranked by error and sampled according to rank).
  4. Adjustable Hyperparameters: Expose parameters such as alpha (the degree of prioritization) and beta (the amount of importance-sampling correction) to fine-tune the sampling distribution; a minimal sketch combining these pieces follows this list.
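
A minimal, self-contained sketch of the proportional variant is below. It is not wired into MoveModule (the class and method names are placeholders, not existing project APIs), and it uses a plain NumPy array for clarity; a production version would typically back the priorities with a sum-tree for O(log N) sampling and anneal beta toward 1 over training.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-5):
        self.capacity = capacity
        self.alpha = alpha          # degree of prioritization (0 = uniform)
        self.eps = eps              # keeps zero-error transitions sampleable
        self.transitions = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0                # next write index (circular buffer)
        self.size = 0               # number of stored transitions

    def push(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities[: self.size].max() if self.size > 0 else 1.0
        self.transitions[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size: int, beta: float = 0.4):
        prios = self.priorities[: self.size]
        probs = prios / prios.sum()
        indices = np.random.choice(self.size, batch_size, p=probs)
        batch = [self.transitions[i] for i in indices]
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # normalizing by the max weight only ever scales updates down.
        weights = (self.size * probs[indices]) ** (-beta)
        weights /= weights.max()
        return batch, indices, weights

    def update_priorities(self, indices, td_errors):
        # Called after the learning step with the freshly computed TD errors.
        self.priorities[indices] = (np.abs(td_errors) + self.eps) ** self.alpha
```

In the training step, the flow would be: sample a batch with `sample(batch_size, beta)`, scale each sampled transition's loss by its importance weight, backpropagate, and then call `update_priorities(indices, new_td_errors)`.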

Benefits:

  - Learning concentrates on transitions with large TD errors, i.e. the ones the current value estimates explain worst, which should speed up convergence.
  - Rare but informative experiences are revisited more often instead of being drowned out by common, low-error transitions.

Potential Challenges:

  - Non-uniform sampling biases the updates, so the importance-sampling weights controlled by beta are needed to keep learning stable.
  - Maintaining and sampling from per-transition priorities adds overhead; large buffers typically need a sum-tree or similar structure to stay efficient.
  - Two additional hyperparameters (alpha, beta) must be tuned, with beta commonly annealed toward 1 over the course of training.

Additional Context:

Prioritized Experience Replay was introduced by Schaul et al. in the paper "Prioritized Experience Replay" (ICLR 2016), which demonstrates its benefits when combined with Deep Q-Networks.


Acceptance Criteria: