-
### ❓ Question
Hello.
I would like to ask: if I have a finite MDP where each episode has the same fixed horizon $T$, do I then have to choose a batch size of $n \times T$ during training? O…
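Whether the batch must be a multiple of $T$ depends on the algorithm; a minimal sketch of the case where it naturally is — collecting whole fixed-length episodes before an update (all names here are illustrative, not from any particular library):

```python
# Sketch: with a fixed horizon T, collecting n whole episodes per update
# gives a batch of exactly n * T transitions. Off-policy methods that
# sample transitions independently (e.g. DQN, DDPG) do not need this.
T = 100          # fixed episode horizon (from the question)
n_episodes = 8   # assumption: episodes collected per update
batch = []
for _ in range(n_episodes):
    episode = [f"transition_{t}" for t in range(T)]  # placeholder transitions
    batch.extend(episode)
assert len(batch) == n_episodes * T  # batch size is n * T by construction
```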
-
### What happened + What you expected to happen
When the user needs to execute the DDPG algorithm, the current DDPGTrainer can only support the single-machine version of the algorithm. If the user needs t…
-
After an exception happens, for whatever reason, the `ep_rew_mean` and `ep_len_mean` are much higher than usual. Are we properly resetting the environment before restarting the training? Or is there a mor…
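One common cause is resuming training without resetting the environment, so the partial episode's statistics carry over. A hedged sketch with a hypothetical toy env (not the user's actual setup):

```python
# Toy environment to illustrate the pattern; the real env and stats keys
# (e.g. ep_rew_mean) would come from the actual training framework.
class DummyEnv:
    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # initial observation

    def step(self, action):
        self.steps += 1
        if self.steps > 3:
            raise RuntimeError("simulated failure")
        return 0, 1.0, False, {}

env = DummyEnv()
env.reset()
try:
    for _ in range(10):
        env.step(0)
except RuntimeError:
    obs = env.reset()  # without this, training resumes mid-episode
assert env.steps == 0  # episode state is fresh before restarting
```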
-
I do understand backpropagation in policy gradient networks, but I am not sure how your code works with Keras's auto-differentiation.
That is, how you transform it into a supervised learning problem.
…
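The usual trick (a hedged guess at what the code in question does, shown here in plain NumPy rather than Keras) is to treat the sampled actions as classification labels and weight the cross-entropy loss by the returns, so autodiff of that "supervised" loss reproduces the policy-gradient update:

```python
import numpy as np

def weighted_crossentropy(probs, actions, returns):
    """Return-weighted cross-entropy: the REINFORCE surrogate loss.

    probs: (N, A) action probabilities, actions: (N,) sampled action
    indices (the "labels"), returns: (N,) episode returns (the weights).
    """
    picked = probs[np.arange(len(actions)), actions]
    return -np.mean(returns * np.log(picked))

probs = np.array([[0.7, 0.3], [0.2, 0.8]])
actions = np.array([0, 1])
returns = np.array([1.0, 2.0])
loss = weighted_crossentropy(probs, actions, returns)
```

In Keras specifically, the same effect is often achieved by passing the returns as `sample_weight` to a cross-entropy loss, but the exact mechanism depends on the repository in question.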
-
I'm running several nested bars. The inner bars may be run many times for each step of the outer bars. When an inner bar completes, it "stays around", and a new version of the bar pops up underneath.
…
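Assuming these are tqdm-style bars (the library is not named in the excerpt), the usual fix is `leave=False` on the inner bar, which clears it on completion instead of letting it stay on screen:

```python
import time

from tqdm import tqdm  # assumption: the bars in question are tqdm bars

# leave=False makes a completed inner bar erase itself, so a fresh inner
# bar does not pile up underneath the previous one on each outer step.
count = 0
for outer in tqdm(range(3), desc="outer"):
    for inner in tqdm(range(4), desc="inner", leave=False):
        count += 1
        time.sleep(0.001)
```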
-
Hello,
Is there any benefit to having a vanilla REINFORCE algorithm for people trying to learn the concepts? REINFORCE with Baseline includes a value function approximator which has a lot of simila…
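The difference between the two variants is just the weight on the score function, which a few lines make concrete (notation assumed; `G` are sampled returns, `V` a learned baseline):

```python
import numpy as np

# Vanilla REINFORCE scales grad log pi(a|s) by the return G_t; the
# baseline variant uses the advantage G_t - V(s_t), which reduces the
# variance of the gradient estimate without changing its expectation.
G = np.array([10.0, 12.0, 8.0])   # sample returns
V = np.array([9.0, 11.0, 10.0])   # baseline values (e.g. from a critic)

vanilla_weight = G                # REINFORCE
baseline_weight = G - V           # REINFORCE with baseline
```

On these toy numbers the advantage weights have lower variance than the raw returns, which is the baseline's whole purpose.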
-
I am referring to the gradient derivation [here](https://huggingface.co/learn/deep-rl-course/unit4/pg-theorem#optional-the-policy-gradient-theorem).
The paragraph where the instructor claimed "we c…
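The step the excerpt is most likely pointing at is the log-derivative trick; a hedged reconstruction of that part of the derivation (notation assumed from the linked course, with $P(\tau;\theta)$ the trajectory probability and $R(\tau)$ the return):

```latex
% Log-derivative trick: since \nabla_\theta \log f = \nabla_\theta f / f,
\nabla_\theta P(\tau;\theta)
  = P(\tau;\theta)\,\nabla_\theta \log P(\tau;\theta)
% which turns the gradient of the objective into an expectation we can
% estimate by sampling trajectories:
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim P(\tau;\theta)}
    \left[ \nabla_\theta \log P(\tau;\theta)\, R(\tau) \right]
```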
-
I tried to reproduce the [complete example](https://github.com/eric-mitchell/direct-preference-optimization/blob/main/README.md#a-complete-example) on a Hyperstack cloud machine (A100-80G-PCIe, OS Ima…
-
I am getting the following error
`(CompileError) deps/axon/lib/axon/loop.ex:469: the do-block in while must return tensors with the same shape, type, and names as the initial arguments.`
While t…
-
When running agent.py, there was an error that I didn't manage to debug.
Could you give me some advice? Thank you.
Traceback (most recent call last):
  File "agent.py", line 159, in <module>
    main()
  Fi…