-
I followed the [complete example](https://github.com/eric-mitchell/direct-preference-optimization#a-complete-example) in the README and got the error: `torch.multiprocessing.spawn.ProcessRaisedExcepti…
-
I'm training tinyllama with 8 A40s.
Everything goes very smoothly until I try to increase the micro batch size for a better computation-to-communication ratio.
I followed the official tutorial of lit …
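For context, this trade-off usually comes down to how many gradient-accumulation steps each optimizer step needs. A minimal sketch, assuming a fixed effective batch size; the variable names and values are illustrative, not taken from the lit tutorial:

```python
world_size = 8           # e.g. 8 x A40
global_batch_size = 128  # samples per optimizer step (assumed)
micro_batch_size = 4     # samples per GPU per forward/backward pass

# Each optimizer step accumulates gradients over this many micro batches:
assert global_batch_size % (micro_batch_size * world_size) == 0
grad_accum_steps = global_batch_size // (micro_batch_size * world_size)

# A larger micro_batch_size means fewer accumulation steps, hence fewer
# gradient syncs per sample processed (a better computation-to-communication
# ratio), at the cost of more activation memory per GPU.
print(grad_accum_steps)  # -> 4
```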
-
Hi Ilya,
First of all, thanks for sharing your code; it has been very useful to me lately. This is more of a question than an issue:
When you update the recurrent policy, how many steps ar…
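The question above is cut off, but if it concerns how many timesteps gradients are propagated through, a common pattern is truncated BPTT. A minimal, self-contained sketch; the window size and all names are illustrative, not taken from this repo:

```python
import torch
import torch.nn as nn

bptt_steps = 20  # assumed truncation window
policy = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
head = nn.Linear(32, 4)            # action logits

obs_seq = torch.randn(16, 100, 8)  # (batch, time, obs_dim)
hidden = None

for t0 in range(0, obs_seq.size(1), bptt_steps):
    chunk = obs_seq[:, t0:t0 + bptt_steps]
    out, hidden = policy(chunk, hidden)
    # Detach the hidden state so gradients stop at the chunk boundary:
    # each update then backpropagates through at most bptt_steps steps.
    hidden = tuple(h.detach() for h in hidden)
    logits = head(out)             # fed into the policy loss from here
```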
-
This is a question/feature request for policy-gradient-based methods (e.g. A2C). Is it possible to specify a prior for the policy before training?
For instance, if I have 3 possible discrete Actio…
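One way this can be done outside any particular library (a sketch, not an A2C API): initialize the policy head so that the initial softmax output equals the desired prior over the 3 actions. The prior values below are assumed for illustration.

```python
import torch
import torch.nn as nn

prior = torch.tensor([0.7, 0.2, 0.1])  # assumed prior over 3 discrete actions

policy_head = nn.Linear(64, 3)
with torch.no_grad():
    policy_head.weight.zero_()                # kill the input's initial effect
    policy_head.bias.copy_(torch.log(prior))  # softmax(log p) == p

# Before any training, the policy matches the prior for every input:
probs = torch.softmax(policy_head(torch.randn(1, 64)), dim=-1)
print(probs)  # ~ [0.7, 0.2, 0.1]
```

A softer alternative is to keep the default initialization and add a KL penalty toward the prior to the loss, which biases the policy rather than pinning its starting point.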
-
### Question
As far as I know, the gym vector environment auto-resets a sub-env when that env is done. I wonder if there is a way to reset it manually, because I want to exploit the vecenv feature in i…
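One workaround, assuming `gym.vector.SyncVectorEnv` (not the async variant): the sync vector env keeps the wrapped environments in its `envs` attribute, so a single sub-env can be reset by hand, bypassing the auto-reset bookkeeping. A sketch:

```python
import gym

envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(4)]
)
obs = envs.reset()

# Manually reset only sub-env 2 (newer gym versions return (obs, info)):
obs_2 = envs.envs[2].reset()
```

Note that the vectorized step loop is unaware of this manual reset, so the caller has to splice `obs_2` into its own observation batch.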
-
# Description
For my thesis project, I'm applying a novel Polyak-averaging approach to various reinforcement learning algorithms; the approach uses natural-gradient descent in order to estimate the…
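For reference, the standard Polyak soft update that such an approach presumably builds on looks like the sketch below; it is shown only as the baseline operation, not as the thesis's natural-gradient variant.

```python
import torch

def polyak_update(target_net, online_net, tau=0.005):
    """In-place soft update: target <- tau * online + (1 - tau) * target."""
    with torch.no_grad():
        for t, o in zip(target_net.parameters(), online_net.parameters()):
            t.mul_(1.0 - tau).add_(tau * o)
```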
-
Hi, I'm not sure whether this computes the gradient of the action-value with respect to the actions:
```python
policy_loss = -self.critic([
    to_tensor(state_batch),
    self.actor(to_tensor…
```
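Assuming the actor's output reaches the critic without being detached, it does: the action tensor is produced inside the same autograd graph that feeds the critic, so backpropagating through `-Q(s, a)` yields dQ/da at the critic's action input and then da/dθ by the chain rule. A self-contained sketch with stand-in networks (not the repo's actual classes):

```python
import torch
import torch.nn as nn

actor = nn.Linear(4, 2)        # state (4) -> action (2)
critic = nn.Linear(4 + 2, 1)   # Q(state, action)

state = torch.randn(32, 4)
action = actor(state)                           # differentiable w.r.t. actor
q = critic(torch.cat([state, action], dim=-1))  # Q(s, a)

policy_loss = -q.mean()
policy_loss.backward()                          # chain rule: dQ/da * da/dtheta

print(actor.weight.grad is not None)            # True: gradient reached the actor
```

If the action were detached (e.g. sampled and rebuilt as a fresh tensor), the gradient would not flow back and the actor would never update.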
-
Can you please add a `requirements.txt` or similar?
I'm trying to run this, and Python throws an error because it can't find the `policy_gradient` library. I can't find any similarly named library…
-
Hello Professor Liu, I am a beginner in multi-agent reinforcement learning. While trying to run your released xuance framework with the tensorflow+gpu option, I hit the following error:
```
Traceback (most recent call last):
  File "C:\Users\50\Desktop\MADRLTEST\main.py", line 5, in
    is_test=False)
  File "F:\a…
```
-
https://blog.oliverxu.cn/2020/08/27/%E4%BD%BF%E7%94%A8PPO%E8%AE%BE%E8%AE%A1%E7%BA%BF%E6%80%A7%E7%B3%BB%E7%BB%9F%E6%8E%A7%E5%88%B6%E5%99%A8/
The paper "Policy Iteration Adaptive Dynamic Programming Algorith…