-
As in the title. I spent a bit of time debugging it but haven't figured out the cause yet, e.g. when running
```
tune run --nproc_per_node 2 full_finetune_distributed --config llama2/7B_full fsdp_cpu_…
```
-
I am a little confused by the implementation of Spinning Up's SAC. In the Spinning Up tutorial, SAC runs gradient ascent on the policy to maximize (Q(a) + log(p(a))), but when I read the code, I find that the co…
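For reference, here is a lightly condensed sketch of the policy loss as it appears in Spinning Up's PyTorch SAC (the names `ac.pi`, `ac.q1`, `ac.q2`, and `alpha` follow that repo), which bears on the sign of the log-probability term:

```python
import torch

# Sketch of Spinning Up's SAC policy loss (PyTorch version).
# ac.pi returns a sampled action and its log-probability;
# ac.q1 / ac.q2 are the two critics used for clipped double-Q.
def compute_loss_pi(ac, obs, alpha):
    pi, logp_pi = ac.pi(obs)
    q1_pi = ac.q1(obs, pi)
    q2_pi = ac.q2(obs, pi)
    q_pi = torch.min(q1_pi, q2_pi)
    # Minimizing (alpha * log pi - Q) is gradient ascent on
    # Q - alpha * log pi: the log-probability enters with a
    # minus sign (entropy bonus), not a plus sign.
    return (alpha * logp_pi - q_pi).mean()
```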
-
Dear Mr. hongzi,
I am interested in your resource-scheduling method, but now I am stuck in your network class. I can't understand why you used the function below:
`loss = T.log(prob_act[T.arange(N), actions…
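In case it helps others, here is a sketch of what that indexing does, with NumPy standing in for Theano's `T`; the shapes and the returns-weighting are my assumptions, since the original line is truncated above:

```python
import numpy as np

N, num_actions = 4, 3
prob_act = np.random.dirichlet(np.ones(num_actions), size=N)  # (N, num_actions): pi(a | s_i)
actions = np.array([2, 0, 1, 2])                              # action taken at each step

# prob_act[np.arange(N), actions] gathers one entry per row:
# the probability of the action that was actually taken.
taken_probs = prob_act[np.arange(N), actions]                 # shape (N,)

# A REINFORCE-style completion: log-probs weighted by returns.
returns = np.array([1.0, 0.5, 0.2, -0.3])                     # hypothetical returns/advantages
loss = -(np.log(taken_probs) * returns).mean()
```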
-
## Abstract
#### Problem
- GANs have considerable success in generating real-valued data. **However, they have limitations when the goal is to generate sequences of discrete tokens** (see the sketch after this list):
1. the discre…
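A minimal sketch (my own, not the paper's code) of the core issue, namely that sampling discrete tokens is not differentiable, and of the REINFORCE-style workaround SeqGAN builds on, where the discriminator's score acts as a reward; `discriminator_reward` is a hypothetical placeholder:

```python
import torch

# Sampling token ids from a categorical distribution has no gradient,
# so the discriminator's signal cannot backpropagate into the generator.
logits = torch.randn(8, 1000, requires_grad=True)    # (batch, vocab) generator outputs
dist = torch.distributions.Categorical(logits=logits)
tokens = dist.sample()                               # integer ids -- no gradient path

def discriminator_reward(tokens):
    return torch.rand(tokens.shape[0])               # placeholder for D(sequence) in [0, 1]

# Score-function (REINFORCE) gradient: differentiate the log-probs
# instead of the samples, weighted by the discriminator's reward.
reward = discriminator_reward(tokens)
loss = -(dist.log_prob(tokens) * reward).mean()
loss.backward()
```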
-
- https://arxiv.org/abs/1609.05473
- 2017 AAAI
Generative Adversarial Nets (GANs), which train a generative model by using a discriminative model, are a new way of training generative models and have achieved considerable success in generating real-valued data.
However, they have limitations when the goal is to generate sequences of discrete tokens.
The main reason is…
-
https://arxiv.org/abs/1611.01626
-
Hi Brainxyz, I am a PhD candidate/visiting scholar majoring in Music Technology at Georgia Tech. Your project inspires me a lot; it is very interesting to investigate GAs in DRL. But I am new in th…
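For other readers wondering what applying a GA to DRL can look like, here is a toy sketch: mutation and elitist selection over flat policy-parameter vectors; `evaluate_return` is a hypothetical stand-in for a full environment rollout:

```python
import numpy as np

def evaluate_return(params: np.ndarray) -> float:
    # Hypothetical fitness: in practice, run the policy defined by
    # `params` for one episode and return the total reward.
    return -float(np.sum((params - 1.0) ** 2))

pop_size, n_params, sigma, n_elite = 20, 64, 0.1, 5
rng = np.random.default_rng(0)
population = rng.normal(size=(pop_size, n_params))

for generation in range(50):
    fitness = np.array([evaluate_return(p) for p in population])
    elite = population[np.argsort(fitness)[-n_elite:]]       # best performers
    # Next generation: mutated copies of randomly chosen elites,
    # with the elites themselves carried over unchanged (elitism).
    parents = elite[rng.integers(n_elite, size=pop_size)]
    population = parents + sigma * rng.normal(size=(pop_size, n_params))
    population[:n_elite] = elite
```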
-
[paper](https://arxiv.org/pdf/1707.06347)
## TL;DR
- **I read this because:** to build background knowledge
- **task:** RL
- **problem:** Q-learning is too unstable, and TRPO is relatively complex; we want a data-efficient and scalable arch… (see the sketch after this list)
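A minimal sketch of the paper's clipped surrogate objective (PPO-clip, Eq. 7), with `logp`, `logp_old`, and `adv` as assumed precomputed inputs:

```python
import torch

# PPO clipped surrogate loss, as a sketch.
# logp / logp_old: log pi(a|s) under the current / behavior policy.
# adv: advantage estimates; the paper uses clip_eps = 0.2.
def ppo_clip_loss(logp, logp_old, adv, clip_eps=0.2):
    ratio = torch.exp(logp - logp_old)                           # r_t(theta)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    # Maximize the pessimistic surrogate -> minimize its negative.
    return -torch.min(ratio * adv, clipped).mean()
```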
-
In the results of Chapter 14, Deterministic Policy Gradients, in the book,
why is the training so unstable and noisy?
-------------------
![screenshot](https://user-images.githubusercontent.com/475557…
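One way to see where the noise comes from (my sketch, not the book's exact code): the actor is trained to climb the critic's estimated Q-surface, and that surface is itself a moving, imperfect target, so the ascent direction jitters from update to update. Shapes and names below are assumptions:

```python
import torch
import torch.nn as nn

obs_dim, act_dim, batch = 3, 1, 32
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

states = torch.randn(batch, obs_dim)
# The actor's gradient is dQ/da at a = mu(s), chained through mu.
# Since Q is bootstrapped from noisy replayed transitions (and usually
# a target network), its gradient field keeps shifting -- one reason
# deterministic-policy training curves look noisy.
actor_loss = -critic(torch.cat([states, actor(states)], dim=1)).mean()
opt.zero_grad()
actor_loss.backward()
opt.step()
```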
-
Hi author, your code is just wonderful and it has helped me a lot in building a deep reinforcement learning system for my project. But I found a mistake in the following code, where you print out steps:
> …