RLE-Foundation / RLeXplore

RLeXplore provides stable baselines of exploration methods in reinforcement learning, such as intrinsic curiosity module (ICM), random network distillation (RND) and rewarding impact-driven exploration (RIDE).
https://docs.rllte.dev/
MIT License
333 stars 16 forks

RIDE Code issues #5

Open yZ265519 opened 1 year ago

yZ265519 commented 1 year ago

I noticed that the RIDE code contains no network training step: it only uses two randomly initialized networks directly as encoders. This seems inconsistent with the original paper. Why is that?

yuanmingqi commented 1 year ago

The use of a random, fixed encoder is inspired by:

Seo Y, Chen L, Shin J, et al. State entropy maximization with random encoders for efficient exploration. International Conference on Machine Learning. PMLR, 2021: 9443-9454.

  1. The key insight of RIDE is to use the difference between the representations of two consecutive states to encourage exploration, and a fixed encoder keeps those representations fixed throughout training;
  2. A random, fixed encoder provides a stable reward space;
  3. It is more efficient and easier to train.

That said, you can follow the original implementation or write your own, depending on your task.
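To make the points above concrete, here is a minimal sketch of an impact-style reward computed with a random, fixed encoder. It is not the code from this repository: the names `RandomFixedEncoder` and `impact_rewards`, the single linear layer with `tanh`, and the dimensions are all assumptions chosen for illustration, and the episodic state-count normalization used in the original RIDE paper is left out.

```python
import numpy as np


class RandomFixedEncoder:
    """Hypothetical encoder: weights are drawn once and never updated."""

    def __init__(self, obs_dim: int, latent_dim: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random projection, scaled so embeddings stay in a reasonable range.
        self.w = rng.normal(0.0, 1.0 / np.sqrt(obs_dim), size=(obs_dim, latent_dim))

    def __call__(self, obs: np.ndarray) -> np.ndarray:
        # No optimizer and no gradient step: the mapping is the same at every call.
        return np.tanh(obs @ self.w)


def impact_rewards(encoder, obs: np.ndarray, next_obs: np.ndarray) -> np.ndarray:
    """L2 distance between embeddings of consecutive states, one value per transition."""
    diff = encoder(next_obs) - encoder(obs)
    return np.linalg.norm(diff, axis=-1)


# Toy batch of 5 transitions with 8-dimensional observations (made-up data).
rng = np.random.default_rng(1)
obs = rng.normal(size=(5, 8))
next_obs = obs.copy()
next_obs[1:] += rng.normal(size=(4, 8))  # the first transition leaves the state unchanged

encoder = RandomFixedEncoder(obs_dim=8)
rewards = impact_rewards(encoder, obs, next_obs)
```

Because the encoder is never trained, the same pair of states always yields the same reward, which is one way to read the "stable reward space" argument. A transition that does not change the state gets zero reward.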

yZ265519 commented 1 year ago

I understand now. Thank you very much for your guidance!

yuanmingqi commented 4 months ago

Hello! We've published a big update that provides more reasonable implementations of these intrinsic rewards.

If you have any other questions, please don't hesitate to ask here.

@yZ265519