-
Why does FairseqDecoder's get_normalized_probs use output[0] as the input to the softmax function? (It results in a dimension mismatch in my loss calculation.)
I was trying to implement my own model (LSTM based, …
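For context, the decoder's forward in fairseq conventionally returns a tuple `(logits, extra)`, which is why the method indexes `net_output[0]`. Below is a hedged, simplified sketch of what the base implementation does in spirit (the real method also handles adaptive softmax and ONNX tracing); it is illustrative, not the library source:

```python
# Hedged sketch of FairseqDecoder.get_normalized_probs, simplified.
# net_output is the tuple returned by the decoder's forward: (logits, extra),
# so net_output[0] holds raw logits of shape (batch, tgt_len, vocab_size).
import torch.nn.functional as F

def get_normalized_probs(net_output, log_probs):
    logits = net_output[0].float()
    if log_probs:
        return F.log_softmax(logits, dim=-1)
    return F.softmax(logits, dim=-1)
```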
-
I am new to Ape-X DQN. Thanks for your contribution; your code helped me understand this algorithm. Please also add the code for `gather_experience` from the Actor class.
```python
…
```
-
Hello, thank you so much for sharing this code structure! There is one thing I'm not quite sure about in your code.
https://github.com/Shivanshu-Gupta/Pytorch-Double-DQN/blob/1cff44d95d7881c6afc029b734508b1…
-
- [ ] I have marked all applicable categories:
+ [ ] exception-raising bug
+ [x] RL algorithm bug
+ [x] documentation request (i.e. "X is missing from the documentation.")
+ [ ] ne…
-
### Describe your feature request
Hello,
In its current form, action selection in PyTorch uses either `compute_actions_from_dict` or `compute_actions` (the latter creates an input dict). Both…
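For illustration, a hedged sketch of the two call paths described above. The method names are taken from the request and may differ across RLlib versions; `policy` is assumed to be an already-built RLlib torch Policy and `obs_batch` a NumPy batch of observations:

```python
# Hedged sketch only: method names follow the request text and may not match
# every RLlib version; `policy` is an assumed, already-constructed Policy.
import numpy as np

def act_both_ways(policy, obs_batch: np.ndarray):
    # Path 1: pass raw observations; the policy builds its input dict internally.
    out_a = policy.compute_actions(obs_batch)
    # Path 2: build and pass the input dict yourself.
    out_b = policy.compute_actions_from_dict({"obs": obs_batch})
    return out_a, out_b
```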
-
This is rather minor, but polyak averaging in DQN/SAC/TD3 could be done faster with far fewer intermediate tensors using `torch.addcmul_` https://pytorch.org/docs/stable/torch.html#torch.addcmul.
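For reference, a minimal sketch of an in-place polyak update that avoids the intermediate tensors a naive `tau * src + (1 - tau) * tgt` copy creates; it uses `mul_`/`add_` rather than `torch.addcmul_`, and the names `target_net`, `source_net`, and `tau` are illustrative:

```python
# Hedged sketch: in-place polyak (soft) update of target-network parameters,
#   target <- (1 - tau) * target + tau * source
# written without temporary tensors.
import torch

@torch.no_grad()
def polyak_update(target_net, source_net, tau=0.005):
    for tgt, src in zip(target_net.parameters(), source_net.parameters()):
        tgt.mul_(1.0 - tau).add_(src, alpha=tau)
```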
-
When using RLlib with evaluation turned on, RLlib tries to convert a PyTorch CUDA tensor to NumPy and fails with the exception "TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() …
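Outside of RLlib, the error itself is easy to reproduce and to fix in the way the message suggests; a minimal, RLlib-independent illustration:

```python
# Minimal illustration of the error and the fix the message suggests:
# move the tensor to host memory before calling .numpy().
import torch

if torch.cuda.is_available():
    t = torch.zeros(3, device="cuda")
    # t.numpy() raises: TypeError: can't convert cuda:0 device type tensor to numpy.
    arr = t.cpu().numpy()  # works: copy to CPU first, then convert
```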
-
I followed the instructions from here:
https://github.com/facebookresearch/Horizon/blob/master/docs/installation.md
to run the Docker image on Mac. However, when I run the example, I get the following …
-
I am trying to implement a deep Q-network using PyTorch. For sampling actions I am using the code shown in the snippet:
```python
def select_action(self, state):
    if random.uniform(0, 1) …
```
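For completeness, a generic ε-greedy selector of the kind the snippet starts to define; `self.epsilon`, `self.policy_net`, and `self.n_actions` are assumed, illustrative attributes rather than names taken from the question:

```python
# Hedged sketch of epsilon-greedy action selection for a DQN agent.
# self.epsilon, self.policy_net, and self.n_actions are assumed attributes;
# `state` is assumed to be a torch tensor of a single observation.
import random
import torch

def select_action(self, state):
    # Explore with probability epsilon, otherwise act greedily on Q-values.
    if random.uniform(0, 1) < self.epsilon:
        return random.randrange(self.n_actions)
    with torch.no_grad():
        q_values = self.policy_net(state.unsqueeze(0))  # shape: (1, n_actions)
        return int(q_values.argmax(dim=1).item())
```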
-
## 🚀 Feature
Implement dataloading functionality for reinforcement learning (state, action) pairs, with assigned policy scores, transition probabilities, and rewards.
Implement a set of gradient al…
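As a rough illustration of the first part of the request, a hedged sketch of a map-style PyTorch `Dataset` over such tuples; the field names, dtypes, and shapes are assumptions for illustration, not an existing API:

```python
# Hedged sketch: a map-style Dataset over RL transitions with policy scores,
# transition probabilities, and rewards. All names/shapes are illustrative.
import torch
from torch.utils.data import Dataset, DataLoader

class RLTransitionDataset(Dataset):
    def __init__(self, states, actions, policy_scores, transition_probs, rewards):
        self.states = torch.as_tensor(states, dtype=torch.float32)
        self.actions = torch.as_tensor(actions, dtype=torch.long)
        self.policy_scores = torch.as_tensor(policy_scores, dtype=torch.float32)
        self.transition_probs = torch.as_tensor(transition_probs, dtype=torch.float32)
        self.rewards = torch.as_tensor(rewards, dtype=torch.float32)

    def __len__(self):
        return len(self.states)

    def __getitem__(self, idx):
        return {
            "state": self.states[idx],
            "action": self.actions[idx],
            "policy_score": self.policy_scores[idx],
            "transition_prob": self.transition_probs[idx],
            "reward": self.rewards[idx],
        }

if __name__ == "__main__":
    # Example usage with synthetic data.
    n = 16
    ds = RLTransitionDataset(
        states=torch.randn(n, 4),
        actions=torch.randint(0, 2, (n,)),
        policy_scores=torch.rand(n),
        transition_probs=torch.rand(n),
        rewards=torch.randn(n),
    )
    loader = DataLoader(ds, batch_size=8, shuffle=True)
    batch = next(iter(loader))
```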