-
Hi,
I have unit tests for `Convolution1DReparameterization`, `Convolution1DReparameterization`, etc. that basically look like this:
```
import tensorflow as tf

x = tf.ones(shape=[150, 1])
y = tf.ones(shape=[150])
…
```
-
Very helpful repo!
One question: in the `forward` function in `critic.py`, there may be an error:
In line `37`, the `Decoder` always takes in the same initial `dec_input` for each city in…
-
### 🚀 Feature
Hello,
in accordance with DLR-RM/stable-baselines3#1624, @SimRey and I would like to implement **Hybrid PPO** in this library.
[This](https://arxiv.org/pdf/1903.01344.pdf) is the pa…
-
Here is my situation:
1. Finished step 2 with the cohere/zhihu_query dataset. The final reward score is 5.07, the rejected score is 0.8, and the accuracy is 0.79, so step 2 seems successful.
2. when I atte…
-
Even though my local copy of the repository is up to date, I am encountering this error. The log is below. The last line of the log shows the command I ran with all the options.
Epoch: 0 | Step: 75 | PPO Epoch:…
-
I wanted to test an architecture for PPO2 where the actor and critic share the hidden layers, but the actor's output layer has a `tanh` activation function instead of the default linear one. If I spec…
-
The following already works with a 'gym':
https://github.com/pytorch/examples/blob/master/reinforcement_learning/actor_critic.py
-
Hi, thanks for open-sourcing your amazing work!
I have been trying to reproduce the RL fine-tuned results reported in the paper, but unfortunately, I am encountering some issues. Here is a brief o…
-
[Soft Actor-Critic for Discrete Action Settings](https://arxiv.org/abs/1910.07207v1)
-
#### Initial tests
- [x] Choose a Mario Kart 64 repository as a baseline for comparison
- [x] Test the repository and see how it works
- [x] Study the repository (collect the parameters used)…