-
### ❓ Question
Hello.
I would like to ask: if I have a finite MDP where each episode has the same fixed horizon $T$, do I then have to choose a batch size of $n \times T$ during training? O…
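Whether the batch must be a multiple of $T$ depends on the algorithm; a minimal sketch of the case where it naturally is — collecting whole fixed-length episodes before an update (all names here are illustrative, not from any particular library):

```python
# Sketch: with a fixed horizon T, collecting n whole episodes per update
# gives a batch of exactly n * T transitions. Off-policy methods that
# sample transitions independently (e.g. DQN, DDPG) do not need this.
T = 100          # fixed episode horizon (from the question)
n_episodes = 8   # assumption: episodes collected per update
batch = []
for _ in range(n_episodes):
    episode = [f"transition_{t}" for t in range(T)]  # placeholder transitions
    batch.extend(episode)
assert len(batch) == n_episodes * T  # batch size is n * T by construction
```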
-
### What happened + What you expected to happen
When the user needs to execute the DDPG algorithm, the current DDPGTrainer can only support the single-machine version of the algorithm. If the user needs t…
-
After an exception happens, for whatever reason, the `ep_rew_mean` and `ep_len_mean` are much higher than usual. Are we properly resetting the environment before restarting the training? Or is there a mor…
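One common cause is resuming training without resetting the environment, so the partial episode's statistics carry over. A hedged sketch with a hypothetical toy env (not the user's actual setup):

```python
# Toy environment to illustrate the pattern; the real env and stats keys
# (e.g. ep_rew_mean) would come from the actual training framework.
class DummyEnv:
    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # initial observation

    def step(self, action):
        self.steps += 1
        if self.steps > 3:
            raise RuntimeError("simulated failure")
        return 0, 1.0, False, {}

env = DummyEnv()
env.reset()
try:
    for _ in range(10):
        env.step(0)
except RuntimeError:
    obs = env.reset()  # without this, training resumes mid-episode
assert env.steps == 0  # episode state is fresh before restarting
```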
-
I do understand backpropagation in policy gradient networks, but I am not sure how your code works with Keras's auto-differentiation.
That is, how you transform it into a supervised learning problem.
…
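The usual trick (a hedged guess at what the code in question does, shown here in plain NumPy rather than Keras) is to treat the sampled actions as classification labels and weight the cross-entropy loss by the returns, so autodiff of that "supervised" loss reproduces the policy-gradient update:

```python
import numpy as np

def weighted_crossentropy(probs, actions, returns):
    """Return-weighted cross-entropy: the REINFORCE surrogate loss.

    probs: (N, A) action probabilities, actions: (N,) sampled action
    indices (the "labels"), returns: (N,) episode returns (the weights).
    """
    picked = probs[np.arange(len(actions)), actions]
    return -np.mean(returns * np.log(picked))

probs = np.array([[0.7, 0.3], [0.2, 0.8]])
actions = np.array([0, 1])
returns = np.array([1.0, 2.0])
loss = weighted_crossentropy(probs, actions, returns)
```

In Keras specifically, the same effect is often achieved by passing the returns as `sample_weight` to a cross-entropy loss, but the exact mechanism depends on the repository in question.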
-
I'm running several nested bars. The inner bars may be run many times for each step of the outer bars. When an inner bar completes, it "stays around", and a new version of the bar pops up underneath.
…
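Assuming these are tqdm-style bars (the library is not named in the excerpt), the usual fix is `leave=False` on the inner bar, which clears it on completion instead of letting it stay on screen:

```python
import time

from tqdm import tqdm  # assumption: the bars in question are tqdm bars

# leave=False makes a completed inner bar erase itself, so a fresh inner
# bar does not pile up underneath the previous one on each outer step.
count = 0
for outer in tqdm(range(3), desc="outer"):
    for inner in tqdm(range(4), desc="inner", leave=False):
        count += 1
        time.sleep(0.001)
```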
-
Hello,
Is there any benefit to having a vanilla REINFORCE algorithm for people trying to learn the concepts? REINFORCE with Baseline includes a value function approximator which has a lot of simila…
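The difference between the two variants is just the weight on the score function, which a few lines make concrete (notation assumed; `G` are sampled returns, `V` a learned baseline):

```python
import numpy as np

# Vanilla REINFORCE scales grad log pi(a|s) by the return G_t; the
# baseline variant uses the advantage G_t - V(s_t), which reduces the
# variance of the gradient estimate without changing its expectation.
G = np.array([10.0, 12.0, 8.0])   # sample returns
V = np.array([9.0, 11.0, 10.0])   # baseline values (e.g. from a critic)

vanilla_weight = G                # REINFORCE
baseline_weight = G - V           # REINFORCE with baseline
```

On these toy numbers the advantage weights have lower variance than the raw returns, which is the baseline's whole purpose.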
-
I am referring to the gradient derivation [here](https://huggingface.co/learn/deep-rl-course/unit4/pg-theorem#optional-the-policy-gradient-theorem).
The paragraph where the instructor claimed "we c…
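The step the excerpt is most likely pointing at is the log-derivative trick; a hedged reconstruction of that part of the derivation (notation assumed from the linked course, with $P(\tau;\theta)$ the trajectory probability and $R(\tau)$ the return):

```latex
% Log-derivative trick: since \nabla_\theta \log f = \nabla_\theta f / f,
\nabla_\theta P(\tau;\theta)
  = P(\tau;\theta)\,\nabla_\theta \log P(\tau;\theta)
% which turns the gradient of the objective into an expectation we can
% estimate by sampling trajectories:
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim P(\tau;\theta)}
    \left[ \nabla_\theta \log P(\tau;\theta)\, R(\tau) \right]
```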
-
I tried to reproduce the [complete example](https://github.com/eric-mitchell/direct-preference-optimization/blob/main/README.md#a-complete-example) on a Hyperstack cloud machine (A100-80G-PCIe, OS Ima…
-
I am getting the following error
`(CompileError) deps/axon/lib/axon/loop.ex:469: the do-block in while must return tensors with the same shape, type, and names as the initial arguments.`
While t…
-
When running agent.py, there was an error that I didn't manage to debug.
Could you give me some advice? Thank you.
Traceback (most recent call last):
  File "agent.py", line 159, in <module>
    main()
  Fi…