-
## Fix the model test for `soft_actor_critic.py`
1. Set up the environment according to [Run a model under torch_xla2](https://github.com/pytorch/xla/blob/master/experimental/torch_xla2/docs/support_a_new_model…
-
Our current baseline RL algorithm is DQN (more accurately, DDQN). This algorithm uses epsilon-greedy policies so that it retains at least some chance of fully exploring the environment in question. Using epsi…
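For context, a minimal sketch of the epsilon-greedy action selection mentioned above, as typically used with DQN/DDQN (the names `q_network` and `num_actions` are illustrative placeholders, not from this codebase):

```python
import random
import torch

def epsilon_greedy_action(q_network, state, epsilon, num_actions):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(num_actions)      # explore
    with torch.no_grad():
        q_values = q_network(state.unsqueeze(0))  # shape: (1, num_actions)
    return int(q_values.argmax(dim=1).item())     # exploit
```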
-
## Overview
Implement Soft Actor-Critic
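A minimal sketch of the core SAC objectives such an implementation would involve (all names are illustrative assumptions: `actor.sample` is a stochastic policy returning an action and its log-probability, and `critic1`/`critic2` are twin Q-networks):

```python
import torch
import torch.nn.functional as F

def sac_losses(batch, actor, critic1, critic2,
               target_critic1, target_critic2, alpha, gamma=0.99):
    state, action, reward, next_state, done = batch

    # Soft Bellman target: min of the twin target critics minus the entropy term.
    with torch.no_grad():
        next_action, next_log_prob = actor.sample(next_state)
        target_q = torch.min(target_critic1(next_state, next_action),
                             target_critic2(next_state, next_action))
        target = reward + gamma * (1.0 - done) * (target_q - alpha * next_log_prob)

    critic_loss = (F.mse_loss(critic1(state, action), target)
                   + F.mse_loss(critic2(state, action), target))

    # Actor maximizes the entropy-regularized Q-value of a reparameterized sample.
    new_action, log_prob = actor.sample(state)
    q_new = torch.min(critic1(state, new_action), critic2(state, new_action))
    actor_loss = (alpha * log_prob - q_new).mean()

    return critic_loss, actor_loss
```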
-
# [Reinforcement Learning] Soft Actor-Critic Paper Review - 재야의 숨은 초보 (Hidden Beginner)
[Reinforcement Learning] Soft Actor-Critic Paper Review
[https://hiddenbeginner.github.io/rl/2022/11/06/sac.html](https://hiddenbeginner.github.io/rl/2022/11/06/sac.html)
-
Hi @AlexKuhnle, sorry to bother you; I would like to implement the SAC algorithm, and I'm wondering if you have any suggestions for that.
In particular, I have some doubts about the following:…
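The specific doubts are elided above; as general context, one point that commonly trips up SAC implementations is the log-probability correction for the tanh-squashed Gaussian policy. A minimal sketch, with all names illustrative:

```python
import torch
from torch.distributions import Normal

def sample_squashed_gaussian(mean, log_std):
    """Reparameterized sample from tanh(Normal), with the corrected log-prob."""
    std = log_std.exp()
    dist = Normal(mean, std)
    pre_tanh = dist.rsample()            # reparameterization trick
    action = torch.tanh(pre_tanh)
    # Change-of-variables correction: subtract log|d tanh(u)/du| per dimension.
    log_prob = dist.log_prob(pre_tanh) - torch.log(1 - action.pow(2) + 1e-6)
    return action, log_prob.sum(dim=-1)
```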
-
# Actor-Critic Algorithms #
- Authors: Vijay R. Konda, John N. Tsitsiklis
- Origin: https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf
- Related:
- PyTorch4 tutorial of: actor critic…
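As a companion to the paper, a minimal one-step actor-critic update in PyTorch (illustrative only; the paper itself analyzes actor-critic with linear function approximation and TD(λ), not deep networks):

```python
import torch

def actor_critic_step(value_fn, optimizer, action_log_prob,
                      state, reward, next_state, done, gamma=0.99):
    """One-step TD actor-critic: the critic's TD error scales the policy-gradient term."""
    with torch.no_grad():
        target = reward + gamma * (1.0 - done) * value_fn(next_state)
    value = value_fn(state)
    td_error = target - value

    critic_loss = td_error.pow(2).mean()
    # Detach the TD error so the actor loss does not backprop through the critic.
    actor_loss = -(action_log_prob * td_error.detach()).mean()

    optimizer.zero_grad()
    (critic_loss + actor_loss).backward()
    optimizer.step()
```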
-
@Kismuz,
I believe I have encountered a framework (A3C) limitation.
While training a few of my recent models, I noticed strange behavior: for the first part of training everything seems to work fi…
-
In `obj_alpha = (self.alpha_log * (self.target_entropy - log_prob).detach()).mean()`, when `alpha_log = 0`, alpha will be 1 forever.
The correct way is `obj_alpha = (self.alpha * (self.target_entropy - log…
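A minimal standalone sketch of the two temperature-loss variants being contrasted here, assuming the common parameterization `alpha = alpha_log.exp()` (values are placeholders):

```python
import torch

alpha_log = torch.zeros(1, requires_grad=True)  # learnable log-temperature
target_entropy = -1.0                           # e.g. -dim(action_space)
log_prob = torch.tensor([-0.5, -1.5])           # placeholder action log-probs

# Variant 1 (as quoted): gradient w.r.t. alpha_log is
# (target_entropy - log_prob).mean(), independent of the current alpha.
obj_alpha_v1 = (alpha_log * (target_entropy - log_prob).detach()).mean()

# Variant 2 (proposed fix): multiply by alpha = exp(alpha_log) instead,
# so the gradient is scaled by the current temperature value.
alpha = alpha_log.exp()
obj_alpha_v2 = (alpha * (target_entropy - log_prob).detach()).mean()
```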
-
Thanks for sharing your code; it's great to be able to go through the implementation.
Maybe I'm misunderstanding this, but it seems that if you intend `self.cpc_optimizer` to only optimise `W`, then
…
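For reference, a minimal sketch of an optimizer scoped to a single parameter, assuming `W` is the bilinear matrix used in a CPC-style score (shapes and names are illustrative). An optimizer constructed this way will only step `W`, but `backward()` still writes gradients into any other parameters in the graph unless their outputs are detached before computing the contrastive loss:

```python
import torch

feature_dim = 50  # illustrative latent dimension
W = torch.nn.Parameter(torch.rand(feature_dim, feature_dim))

# Passing only [W] means .step() updates W alone...
cpc_optimizer = torch.optim.Adam([W], lr=1e-3)

# ...but loss.backward() still accumulates .grad on the encoder's parameters
# unless the encoder outputs are detached before the CPC loss is computed.
```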