Feature/trajectory buffer with priority

Summary

Add a buffer in which each item is a Trajectory and whose sample method returns chunks of trajectories. If a weight is provided for each sample, sampling uses those weights to select both the trajectory from which a chunk is extracted and the starting position of the chunk.
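
To make the sampling behaviour concrete, here is a minimal sketch of weighted chunk sampling. It is not the code from this PR: the function name `sample_chunk`, its parameters, and the choice to pick a trajectory in proportion to the total weight of its valid start positions are all assumptions made for illustration.

```python
import numpy as np


def sample_chunk(trajectories, chunk_len, weights=None, rng=None):
    """Sample one chunk of `chunk_len` consecutive steps (illustrative only).

    trajectories: list of arrays with time on axis 0.
    weights: optional list of 1-D arrays, one non-negative weight per step.
    Returns (trajectory index, start position, chunk).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Only trajectories that can hold a full chunk are candidates.
    valid = [i for i, t in enumerate(trajectories) if len(t) >= chunk_len]
    if not valid:
        raise ValueError("no trajectory is long enough for the requested chunk")

    if weights is None:
        # Uniform case: pick a trajectory, then a start position, at random.
        traj_idx = int(rng.choice(valid))
        n_starts = len(trajectories[traj_idx]) - chunk_len + 1
        start = int(rng.integers(n_starts))
    else:
        # Keep only the weights of positions where a full chunk still fits.
        start_w = [
            np.asarray(weights[i], dtype=float)[: len(trajectories[i]) - chunk_len + 1]
            for i in valid
        ]
        totals = np.array([w.sum() for w in start_w])
        if totals.sum() <= 0:
            raise ValueError("all weights on valid start positions are zero")

        # Assumption: trajectory probability is proportional to its total weight.
        k = int(rng.choice(len(valid), p=totals / totals.sum()))
        traj_idx = valid[k]
        # Start position probability is proportional to the per-step weight.
        start = int(rng.choice(len(start_w[k]), p=start_w[k] / start_w[k].sum()))

    chunk = trajectories[traj_idx][start : start + chunk_len]
    return traj_idx, start, chunk
```

With weights concentrated on a single step, the sketch always returns the chunk starting at that step; with no weights, every candidate trajectory and every valid start position within it is equally likely.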

Type of Change

New feature (non-breaking change that adds functionality)

Checklist

Please confirm that the following tasks have been completed:

[x] I have tested my changes locally and they work as expected. (Please describe the tests you performed.)
[x] I have added unit tests for my changes, or updated existing tests if necessary.
[x] I have updated the documentation, if applicable.
[x] I have installed pre-commit and run locally for my code changes.

Additional Information (Optional)

This buffer is intended for later use with MuZero. We should check whether the EpisodeBuffer and the TrajectoryBuffer can be merged into a single buffer.

Eclectic-Sheep / sheeprl

Feature/trajectory buffer with priority #86
