-
I followed the [complete example](https://github.com/eric-mitchell/direct-preference-optimization#a-complete-example) in the README and got the error: `torch.multiprocessing.spawn.ProcessRaisedExcepti…
-
I'm training tinyllama with 8 A40s.
Everything goes very smoothly until I try to increase the micro batch size for a better computation-to-communication ratio.
I followed the official tutorial of lit …
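For context, this trade-off usually comes down to how many gradient-accumulation steps each optimizer step needs. A minimal sketch, assuming a fixed effective batch size; the variable names and values are illustrative, not taken from the lit tutorial:

```python
world_size = 8           # e.g. 8 x A40
global_batch_size = 128  # samples per optimizer step (assumed)
micro_batch_size = 4     # samples per GPU per forward/backward pass

# Each optimizer step accumulates gradients over this many micro batches:
assert global_batch_size % (micro_batch_size * world_size) == 0
grad_accum_steps = global_batch_size // (micro_batch_size * world_size)

# A larger micro_batch_size means fewer accumulation steps, hence fewer
# gradient syncs per sample processed (a better computation-to-communication
# ratio), at the cost of more activation memory per GPU.
print(grad_accum_steps)  # -> 4
```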
-
Hi Ilya,
First of all, thanks for sharing your code; it has been very useful to me lately. This is more of a question than an issue:
When you update the recurrent policy, how many steps ar…
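The question above is cut off, but if it concerns how many timesteps gradients are propagated through, a common pattern is truncated BPTT. A minimal, self-contained sketch; the window size and all names are illustrative, not taken from this repo:

```python
import torch
import torch.nn as nn

bptt_steps = 20  # assumed truncation window
policy = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
head = nn.Linear(32, 4)            # action logits

obs_seq = torch.randn(16, 100, 8)  # (batch, time, obs_dim)
hidden = None

for t0 in range(0, obs_seq.size(1), bptt_steps):
    chunk = obs_seq[:, t0:t0 + bptt_steps]
    out, hidden = policy(chunk, hidden)
    # Detach the hidden state so gradients stop at the chunk boundary:
    # each update then backpropagates through at most bptt_steps steps.
    hidden = tuple(h.detach() for h in hidden)
    logits = head(out)             # fed into the policy loss from here
```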
-
This is a question/feature request for policy-gradient-based methods (e.g. A2C). Is it possible to specify a prior for the policy before training?
For instance, if I have 3 possible discrete Actio…
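One way this can be done outside any particular library (a sketch, not an A2C API): initialize the policy head so that the initial softmax output equals the desired prior over the 3 actions. The prior values below are assumed for illustration.

```python
import torch
import torch.nn as nn

prior = torch.tensor([0.7, 0.2, 0.1])  # assumed prior over 3 discrete actions

policy_head = nn.Linear(64, 3)
with torch.no_grad():
    policy_head.weight.zero_()                # kill the input's initial effect
    policy_head.bias.copy_(torch.log(prior))  # softmax(log p) == p

# Before any training, the policy matches the prior for every input:
probs = torch.softmax(policy_head(torch.randn(1, 64)), dim=-1)
print(probs)  # ~ [0.7, 0.2, 0.1]
```

A softer alternative is to keep the default initialization and add a KL penalty toward the prior to the loss, which biases the policy rather than pinning its starting point.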
-
### Question
As far as I know, the gym vector environment auto-resets a sub-env when that env is done. I wonder if there is a way to reset it manually, because I want to exploit the vecenv feature in i…
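One workaround, assuming `gym.vector.SyncVectorEnv` (not the async variant): the sync vector env keeps the wrapped environments in its `envs` attribute, so a single sub-env can be reset by hand, bypassing the auto-reset bookkeeping. A sketch:

```python
import gym

envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(4)]
)
obs = envs.reset()

# Manually reset only sub-env 2 (newer gym versions return (obs, info)):
obs_2 = envs.envs[2].reset()
```

Note that the vectorized step loop is unaware of this manual reset, so the caller has to splice `obs_2` into its own observation batch.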
-
# Description
For my thesis project, I'm applying a novel Polyak-averaging approach to various reinforcement learning algorithms; the approach uses natural-gradient descent in order to estimate the…
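For reference, the standard Polyak soft update that such an approach presumably builds on looks like the sketch below; it is shown only as the baseline operation, not as the thesis's natural-gradient variant.

```python
import torch

def polyak_update(target_net, online_net, tau=0.005):
    """In-place soft update: target <- tau * online + (1 - tau) * target."""
    with torch.no_grad():
        for t, o in zip(target_net.parameters(), online_net.parameters()):
            t.mul_(1.0 - tau).add_(tau * o)
```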
-
Hi, I'm not sure whether this computes the gradient of the action-value with respect to the actions:
```python
policy_loss = -self.critic([
    to_tensor(state_batch),
    self.actor(to_tensor…
```
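Assuming the actor's output reaches the critic without being detached, it does: the action tensor is produced inside the same autograd graph that feeds the critic, so backpropagating through `-Q(s, a)` yields dQ/da at the critic's action input and then da/dθ by the chain rule. A self-contained sketch with stand-in networks (not the repo's actual classes):

```python
import torch
import torch.nn as nn

actor = nn.Linear(4, 2)        # state (4) -> action (2)
critic = nn.Linear(4 + 2, 1)   # Q(state, action)

state = torch.randn(32, 4)
action = actor(state)                           # differentiable w.r.t. actor
q = critic(torch.cat([state, action], dim=-1))  # Q(s, a)

policy_loss = -q.mean()
policy_loss.backward()                          # chain rule: dQ/da * da/dtheta

print(actor.weight.grad is not None)            # True: gradient reached the actor
```

If the action were detached (e.g. sampled and rebuilt as a fresh tensor), the gradient would not flow back and the actor would never update.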
-
Can you please add a `requirements.txt` or similar?
I'm trying to run this, and Python throws an error because it can't find the `policy_gradient` library. I can't find any similarly named library…
-
Hello Professor Liu, I am a beginner in multi-agent reinforcement learning. While trying to run your released xuance framework with the tensorflow+gpu option, I hit the following error:
```
Traceback (most recent call last):
  File "C:\Users\50\Desktop\MADRLTEST\main.py", line 5, in
    is_test=False)
  File "F:\a…
```
-
https://blog.oliverxu.cn/2020/08/27/%E4%BD%BF%E7%94%A8PPO%E8%AE%BE%E8%AE%A1%E7%BA%BF%E6%80%A7%E7%B3%BB%E7%BB%9F%E6%8E%A7%E5%88%B6%E5%99%A8/
The paper "Policy Iteration Adaptive Dynamic Programming Algorith…