-
As in the title. I spent a bit of time debugging it but haven't figured out the cause yet, e.g. when running
```
tune run --nproc_per_node 2 full_finetune_distributed --config llama2/7B_full fsdp_cpu_…
```
-
I am a little confused by the implementation of Spinning Up's SAC. In the Spinning Up tutorial, SAC runs gradient ascent on the policy to maximize (Q(a) + log(p(a))), but when I read the code, I find that the co…
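For reference, here is a lightly condensed sketch of the policy loss as it appears in Spinning Up's PyTorch SAC (the names `ac.pi`, `ac.q1`, `ac.q2`, and `alpha` follow that repo), which bears on the sign of the log-probability term:

```python
import torch

# Sketch of Spinning Up's SAC policy loss (PyTorch version).
# ac.pi returns a sampled action and its log-probability;
# ac.q1 / ac.q2 are the two critics used for clipped double-Q.
def compute_loss_pi(ac, obs, alpha):
    pi, logp_pi = ac.pi(obs)
    q1_pi = ac.q1(obs, pi)
    q2_pi = ac.q2(obs, pi)
    q_pi = torch.min(q1_pi, q2_pi)
    # Minimizing (alpha * log pi - Q) is gradient ascent on
    # Q - alpha * log pi: the log-probability enters with a
    # minus sign (entropy bonus), not a plus sign.
    return (alpha * logp_pi - q_pi).mean()
```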
-
Dear Mr. hongzi,
I am interested in your resource-scheduling method, but now I am stuck in your network class. I can't understand why you used the function below:
`loss = T.log(prob_act[T.arange(N), actions…
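In case it helps others, here is a sketch of what that indexing does, with NumPy standing in for Theano's `T`; the shapes and the returns-weighting are my assumptions, since the original line is truncated above:

```python
import numpy as np

N, num_actions = 4, 3
prob_act = np.random.dirichlet(np.ones(num_actions), size=N)  # (N, num_actions): pi(a | s_i)
actions = np.array([2, 0, 1, 2])                              # action taken at each step

# prob_act[np.arange(N), actions] gathers one entry per row:
# the probability of the action that was actually taken.
taken_probs = prob_act[np.arange(N), actions]                 # shape (N,)

# A REINFORCE-style completion: log-probs weighted by returns.
returns = np.array([1.0, 0.5, 0.2, -0.3])                     # hypothetical returns/advantages
loss = -(np.log(taken_probs) * returns).mean()
```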
-
## Abstract
#### Problem
- GANs have considerable success in generating real-valued data. **However, they have limitations when the goal is to generate sequences of discrete tokens** (see the sketch after this list):
1. the discre…
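A minimal sketch (my own, not the paper's code) of the core issue, namely that sampling discrete tokens is not differentiable, and of the REINFORCE-style workaround SeqGAN builds on, where the discriminator's score acts as a reward; `discriminator_reward` is a hypothetical placeholder:

```python
import torch

# Sampling token ids from a categorical distribution has no gradient,
# so the discriminator's signal cannot backpropagate into the generator.
logits = torch.randn(8, 1000, requires_grad=True)    # (batch, vocab) generator outputs
dist = torch.distributions.Categorical(logits=logits)
tokens = dist.sample()                               # integer ids -- no gradient path

def discriminator_reward(tokens):
    return torch.rand(tokens.shape[0])               # placeholder for D(sequence) in [0, 1]

# Score-function (REINFORCE) gradient: differentiate the log-probs
# instead of the samples, weighted by the discriminator's reward.
reward = discriminator_reward(tokens)
loss = -(dist.log_prob(tokens) * reward).mean()
loss.backward()
```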
-
- https://arxiv.org/abs/1609.05473
- 2017 AAAI
Generative Adversarial Nets (GANs), which train a generative model by using a discriminative model, are a new way of training generative models and have achieved considerable success in generating real-valued data.
However, they have limitations when the goal is to generate sequences of discrete tokens.
The main reason is…
-
https://arxiv.org/abs/1611.01626
-
Hi Brainxyz, I am a PhD candidate/visiting scholar majoring in Music Technology at Georgia Tech. Your project inspires me a lot; it is very interesting to investigate GAs in DRL. But I am new in th…
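For other readers wondering what applying a GA to DRL can look like, here is a toy sketch: mutation and elitist selection over flat policy-parameter vectors; `evaluate_return` is a hypothetical stand-in for a full environment rollout:

```python
import numpy as np

def evaluate_return(params: np.ndarray) -> float:
    # Hypothetical fitness: in practice, run the policy defined by
    # `params` for one episode and return the total reward.
    return -float(np.sum((params - 1.0) ** 2))

pop_size, n_params, sigma, n_elite = 20, 64, 0.1, 5
rng = np.random.default_rng(0)
population = rng.normal(size=(pop_size, n_params))

for generation in range(50):
    fitness = np.array([evaluate_return(p) for p in population])
    elite = population[np.argsort(fitness)[-n_elite:]]       # best performers
    # Next generation: mutated copies of randomly chosen elites,
    # with the elites themselves carried over unchanged (elitism).
    parents = elite[rng.integers(n_elite, size=pop_size)]
    population = parents + sigma * rng.normal(size=(pop_size, n_params))
    population[:n_elite] = elite
```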
-
[paper](https://arxiv.org/pdf/1707.06347)
## TL;DR
- **I read this because:** to build background knowledge
- **task:** RL
- **problem:** Q-learning is too unstable, and TRPO is relatively complex; we want a data-efficient and scalable arch… (see the sketch after this list)
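A minimal sketch of the paper's clipped surrogate objective (PPO-clip, Eq. 7), with `logp`, `logp_old`, and `adv` as assumed precomputed inputs:

```python
import torch

# PPO clipped surrogate loss, as a sketch.
# logp / logp_old: log pi(a|s) under the current / behavior policy.
# adv: advantage estimates; the paper uses clip_eps = 0.2.
def ppo_clip_loss(logp, logp_old, adv, clip_eps=0.2):
    ratio = torch.exp(logp - logp_old)                           # r_t(theta)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    # Maximize the pessimistic surrogate -> minimize its negative.
    return -torch.min(ratio * adv, clipped).mean()
```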
-
In the results of Chapter 14, Deterministic Policy Gradients, in the book,
why is the training so unstable and noisy?
-------------------
![screenshot](https://user-images.githubusercontent.com/475557…
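One way to see where the noise comes from (my sketch, not the book's exact code): the actor is trained to climb the critic's estimated Q-surface, and that surface is itself a moving, imperfect target, so the ascent direction jitters from update to update. Shapes and names below are assumptions:

```python
import torch
import torch.nn as nn

obs_dim, act_dim, batch = 3, 1, 32
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

states = torch.randn(batch, obs_dim)
# The actor's gradient is dQ/da at a = mu(s), chained through mu.
# Since Q is bootstrapped from noisy replayed transitions (and usually
# a target network), its gradient field keeps shifting -- one reason
# deterministic-policy training curves look noisy.
actor_loss = -critic(torch.cat([states, actor(states)], dim=1)).mean()
opt.zero_grad()
actor_loss.backward()
opt.step()
```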
-
Hi author, your code is just wonderful and it has helped me a lot in building a deep reinforcement learning system for my project. But I found a mistake in the following code, where you print out steps:
> …