Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric
https://eclecticsheep.ai
Apache License 2.0

Feature/trajectory buffer with priority #86

Closed DavideTr8 closed 11 months ago

DavideTr8 commented 12 months ago

Summary

Add a buffer in which each item is a trajectory and whose sample method returns chunks of trajectories. If per-sample weights are provided, sampling uses them both to select the trajectories from which chunks are extracted and to choose the starting position of each chunk.
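To make the intended sampling behaviour concrete, here is a minimal sketch of the idea, not the code in this PR. The class name `ToyTrajectoryBuffer`, its `add`/`sample` signatures, and the rule that a trajectory is picked in proportion to the summed weights of its valid start positions are all assumptions made for illustration; the actual buffer may weight things differently.

```python
import numpy as np


class ToyTrajectoryBuffer:
    """Hypothetical stand-in for illustration only, not sheeprl's buffer."""

    def __init__(self, seed=None):
        self._trajs = []    # one dict of arrays per trajectory
        self._weights = []  # one 1-D array of per-step weights per trajectory
        self._rng = np.random.default_rng(seed)

    def add(self, trajectory, weights=None):
        length = len(next(iter(trajectory.values())))
        if any(len(v) != length for v in trajectory.values()):
            raise ValueError("all keys must share the same number of steps")
        # Assumption: missing weights mean uniform sampling over steps.
        w = np.ones(length) if weights is None else np.asarray(weights, dtype=float)
        if w.shape != (length,) or np.any(w < 0):
            raise ValueError("weights must be non-negative, one per step")
        self._trajs.append({k: np.asarray(v) for k, v in trajectory.items()})
        self._weights.append(w)

    def sample(self, batch_size, chunk_len):
        # Weights of the valid start positions of every trajectory
        # (empty when the trajectory is shorter than chunk_len).
        start_w = [w[: max(len(w) - chunk_len + 1, 0)] for w in self._weights]
        mass = np.array([sw.sum() for sw in start_w])
        if mass.sum() <= 0:
            raise ValueError("no trajectory can provide a chunk of this length")
        # Step 1: pick trajectories in proportion to their total start weight.
        traj_ids = self._rng.choice(len(self._trajs), size=batch_size, p=mass / mass.sum())
        chunks = []
        for i in traj_ids:
            sw = start_w[i]
            # Step 2: pick the chunk start in proportion to the per-step weights.
            start = self._rng.choice(len(sw), p=sw / sw.sum())
            chunks.append({k: v[start : start + chunk_len] for k, v in self._trajs[i].items()})
        # Stack the chunks into arrays of shape (batch_size, chunk_len, ...).
        return {k: np.stack([c[k] for c in chunks]) for k in chunks[0]}


buf = ToyTrajectoryBuffer(seed=0)
# All the weight sits on step 3, so every chunk must start there.
buf.add({"obs": np.arange(10)}, weights=[0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
# Too short for chunk_len=4, so it is never selected.
buf.add({"obs": np.arange(100, 103)})
batch = buf.sample(batch_size=5, chunk_len=4)
```

In this toy setup the short trajectory gets zero probability and the only start position with non-zero weight is step 3, so the batch has shape `(5, 4)` and every row covers steps 3 to 6.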

Additional Information (Optional)

To be used later with MuZero. We should check whether the EpisodeBuffer and the TrajectoryBuffer can be merged into a single buffer.

belerico commented 11 months ago

Since this is almost identical to the EpisodeBuffer we have already implemented, maybe we can discuss a way to merge the two! :sheep: