-
Hello, I think this place in your code is wrong:
`cvpo-safe-rl/safe_rl/policy/cvpo.py`, lines 454-457:
def critic_loss():
obs, act, reward, obs_next, done = to_tensor(data['obs']), to_tens…
-
## Bug description
There are two standard ways to deal with variable-horizon environments: 1) turn them into fixed-length environments, or 2) turn them into infinite-horizon environments.
…
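For context, a minimal sketch (PyTorch; the `terminated`/`truncated` flag names are hypothetical and not taken from the CVPO code) of what option 2 usually implies for a critic target: bootstrap through time-limit truncations and cut the return only on true terminations.

```python
import torch

def td_target(reward: torch.Tensor,
              q_next: torch.Tensor,
              terminated: torch.Tensor,  # true environment termination
              truncated: torch.Tensor,   # time-limit cutoff
              gamma: float = 0.99) -> torch.Tensor:
    # Infinite-horizon view: only a true termination zeroes the bootstrap.
    # `truncated` is deliberately unused: a time-limit cutoff should still
    # bootstrap from Q(s', a') rather than be treated as a real "done".
    return reward + gamma * (1.0 - terminated.float()) * q_next
```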
-
Let's say I want to replace the reward of the last batch in the buffer. I tried:
```
buffer[len(buffer)-1].rew = new_reward
```
But it turns out the change didn't apply at all; maybe I did it wron…
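If this is Tianshou's `ReplayBuffer` (an assumption; the excerpt doesn't name the library), the likely cause is that indexing the buffer builds a fresh `Batch` out of the internal arrays, so mutating the result never writes back. A minimal sketch of the usual fix, writing through the underlying array (`buffer` and `new_reward` are the names from the question):

```python
# Assumption: buffer is a tianshou.data.ReplayBuffer that already holds data.
last = len(buffer) - 1

# buffer[last] copies values out of the internal storage into a new Batch,
# so this assignment only mutates that temporary copy:
buffer[last].rew = new_reward  # silently has no effect on the stored data

# Write through the buffer's underlying numpy array instead:
buffer.rew[last] = new_reward
assert buffer[last].rew == new_reward
```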
-
Note: this issue was created automatically with the bugzilla2github tool.
Original bug ID: BZ#3225
From: @JasonGross
Reported version: 8.5
CC: @cpitclaudel, @forestjulien, @Matafou
See also: #455…
-
-
```
Epoch #1: 0%| | 1/5000 [00:00
```
-
- [X] I have marked all applicable categories:
+ [ ] exception-raising bug
+ [X] RL algorithm bug
+ [ ] documentation request (i.e. "X is missing from the documentation.")
+ [ ] ne…
-
**High Level Description**
I want to determine the cause of the early termination of the training process.
**Desired SMARTS version**
0.6.1
**Operating System**
Ubuntu 20.04.5 LTS
**Prob…
-
### ❓ Question
It confuses me a lot that statistics of the discounted returns are used to rescale a different quantity, the raw reward. That seems to be the default choice for PPO. Is there any intuition for interpreting th…
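For reference, a minimal sketch of the scheme the question seems to describe (the common PPO default, as popularized by OpenAI Baselines' `VecNormalize`): keep a running discounted return and divide each reward by that return's standard deviation. The class and variable names here are hypothetical.

```python
class RewardScaler:
    """Divide rewards by the running std of the discounted return."""

    def __init__(self, gamma: float = 0.99, eps: float = 1e-8):
        self.gamma, self.eps = gamma, eps
        self.ret = 0.0                                # running discounted return
        self.count, self.mean, self.m2 = 0, 0.0, 0.0  # Welford's statistics

    def __call__(self, reward: float, done: bool) -> float:
        # The statistics are tracked over the discounted *return* ...
        self.ret = self.gamma * self.ret + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        std = (self.m2 / max(self.count - 1, 1)) ** 0.5
        if done:
            self.ret = 0.0
        # ... but what gets rescaled is the *reward* itself, which is
        # exactly the mismatch the question asks about.
        return reward / (std + self.eps)
```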
-
![cmd-20220904-1550-rp9](https://user-images.githubusercontent.com/9929511/188303314-f50a5cb1-8ccf-476c-97e2-eae92d3a7ea2.png)