DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

Prioritized Experience Replay for DQN #1242

Open · vnvdev opened this issue 1 year ago

vnvdev commented 1 year ago

🚀 Feature

Prioritized Experience Replay for DQN

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response


qgallouedec commented 1 year ago

It's planned, contributions are welcome 🙂

araffin commented 1 year ago

See https://github.com/DLR-RM/stable-baselines3/issues/622

AlexPasqua commented 1 year ago

@araffin @qgallouedec Hello, is there any news on prioritized experience replay, or are you still waiting for contributions?

araffin commented 1 year ago

or are you still waiting for contributions?

We are welcoming contributions =) I guess adapting https://github.com/Howuhh/prioritized_experience_replay from @Howuhh would be a good contribution.

emrul commented 1 year ago

Hi @araffin - just looking at this. How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).

araffin commented 1 year ago

How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).

Not sure, I need to take a deeper look, but probably one tree for all envs if possible, or whatever is cleaner/fast enough. We might need to do something similar to: https://github.com/DLR-RM/stable-baselines3/pull/704

mkhlyzov commented 1 year ago

Hi @araffin - just looking at this. How would you go about it in relation to the vectorised replay buffer that SB3 uses: have one segment tree hold priorities across all envs or have a segment tree per environment? I had a cursory look at how Tianshou does it and it appears to be a segment tree per environment (at least at first glance).

I think it might matter depending on the replacement strategy. Do you overwrite the latest observation or the one with the lowest priority? What happens if the VecEnv holds different environments? E.g. LunarLander with different gravity / wind parameters. If one environment is significantly more difficult compared to the others, then wouldn't a joint buffer be skewed toward it? "Hard overall" observations vs "hard for each on average" observations. It's more of a theoretical question though.

AlexPasqua commented 1 year ago

We are welcoming contributions =) I guess adapting https://github.com/Howuhh/prioritized_experience_replay from @Howuhh would be a good contribution.

Hello @araffin, since I've recently used and contributed to @Howuhh's PER implementation, and since I'm also familiar with SB3 (having contributed before), I could work on adapting it for this library! (and maybe @Howuhh wants to join as well?)

Howuhh commented 1 year ago

@AlexPasqua even though I think it's very important, I'm unfortunately busy integrating Minari into CORL at the moment, so I'm unlikely to find the time to do it. But I'd be glad if my implementation is useful!

AlexPasqua commented 1 year ago

@AlexPasqua even though I think it's very important, I'm unfortunately busy integrating Minari into CORL at the moment, so I'm unlikely to find the time to do it. But I'd be glad if my implementation is useful!

Alright, no problem, I'll do it myself :)