Open yZ265519 opened 1 year ago
I noticed that there is no network-training step in the RIDE code: the states are simply encoded by two random networks, which seems inconsistent with the original paper. Why is that?
The utilization of a random and fixed encoder is inspired by:
Seo Y, Chen L, Shin J, et al. State entropy maximization with random encoders for efficient exploration[C]//International Conference on Machine Learning. PMLR, 2021: 9443-9454.
- The key insight of RIDE is to use the difference between two consecutive state embeddings to encourage exploration, and a fixed encoder provides consistent representations;
- A random and fixed encoder provides a stable reward space;
- It is more efficient and easier to train.
A minimal sketch of this idea is given below. Anyway, you can follow the original implementation or create a new one, depending on your task.
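For illustration, here is a minimal sketch of the idea (this is not the actual code in this repo; the class and function names are made up): a randomly initialized encoder is frozen, and the RIDE-style intrinsic reward is the L2 distance between the embeddings of two consecutive states.

```python
import torch
import torch.nn as nn

class RandomEncoder(nn.Module):
    """Randomly initialized MLP encoder whose weights are never updated."""
    def __init__(self, obs_dim: int, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Freeze the parameters so the representation (and hence the
        # reward space) stays fixed throughout training.
        for p in self.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def ride_intrinsic_reward(encoder: RandomEncoder,
                          obs: torch.Tensor,
                          next_obs: torch.Tensor) -> torch.Tensor:
    """Intrinsic reward = L2 distance between consecutive state embeddings."""
    return torch.norm(encoder(next_obs) - encoder(obs), p=2, dim=-1)

# Usage with a batch of 32 vector observations of dimension 8:
encoder = RandomEncoder(obs_dim=8)
obs, next_obs = torch.randn(32, 8), torch.randn(32, 8)
r_int = ride_intrinsic_reward(encoder, obs, next_obs)  # shape: (32,)
```

Because the encoder is never trained, the embedding of a given state never drifts, so the reward for revisiting the same transition stays comparable across training.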
I see, thank you very much for your guidance!
@yZ265519 Hello! We've published a big update that provides more reasonable implementations of these intrinsic rewards.
If you have any other questions, please don't hesitate to ask here.