[Feature Request] Random Network Distillation with PPO (RND-PPO)

DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

MIT License


🚀 Feature

It would be interesting to integrate Random Network Distillation (RND, https://arxiv.org/abs/1810.12894) so that it can be used with PPO.
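To make the request concrete, here is a minimal sketch of the core RND idea: a fixed, randomly initialised target network, a predictor trained to match it, and the prediction error used as an intrinsic reward. `RNDModule`, its layer sizes, and the learning rate are hypothetical names and values chosen for illustration; none of this is an existing stable-baselines3 API, and how it would be wired into PPO is left open.

```python
import torch
import torch.nn as nn


def _mlp(obs_dim: int, embed_dim: int) -> nn.Sequential:
    # Small MLP; the sizes are arbitrary illustration values (assumption).
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))


class RNDModule(nn.Module):
    """Hypothetical helper, NOT part of stable-baselines3."""

    def __init__(self, obs_dim: int, embed_dim: int = 64, lr: float = 1e-3):
        super().__init__()
        # Target network: randomly initialised and never trained.
        self.target = _mlp(obs_dim, embed_dim)
        for param in self.target.parameters():
            param.requires_grad_(False)
        # Predictor network: trained to reproduce the target's output.
        self.predictor = _mlp(obs_dim, embed_dim)
        self.optimizer = torch.optim.Adam(self.predictor.parameters(), lr=lr)

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-observation prediction error: large for rarely seen observations.
        with torch.no_grad():
            return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=1)

    def update(self, obs: torch.Tensor) -> float:
        # One gradient step on the predictor only.
        loss = (self.predictor(obs) - self.target(obs)).pow(2).mean()
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()


# Toy check: the bonus on repeatedly visited observations should shrink.
torch.manual_seed(0)
rnd = RNDModule(obs_dim=4)
seen = torch.randn(256, 4)
before = rnd.intrinsic_reward(seen).mean().item()
for _ in range(200):
    rnd.update(seen)
after = rnd.intrinsic_reward(seen).mean().item()
print(f"mean intrinsic reward: {before:.4f} -> {after:.4f}")
```

The sketch leaves out pieces the paper describes, such as observation and intrinsic-reward normalisation and the separate value heads for extrinsic and intrinsic returns, so a real integration would need more than this.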

Motivation

RND implementations on GitHub are scarce, and many of them are old and impractical to use. Having this feature in stable-baselines3 could benefit research in many fields.

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

[X] I have checked that there is no similar issue in the repo
[ ] If I'm requesting a new feature, I have proposed alternatives
