policy-gradient Search Results

1000+ results
for policy-gradient

NVIDIA/Megatron-LM #937

[BUG] Get an AttributeError when trying to finetune llama3-8B…

**Describe the bug** I am trying to finetune the `llama3-8B` model on multiple nodes, but I get an AttributeError after the mcore-format checkpoint finishes loading and dataset building starts. The error is below: …

nakroy updated 2 months ago
5
denizyuret/Knet.jl #564

scheduler interface

For reference, the PyTorch interface:
```Python
optimizer = optim.SGD(net.parameters(), lr=opt.learning_rate, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step…
```

ekinakyurek updated 4 years ago
1
official-stockfish/fishtest #535

SPSA improvements [RFC]

Issue opened to collect info about possible future SPSA improvements. ### SPSA references SPSA is a fairly simple algorithm to be used for local optimization (not global optimization). The wiki h…

ppigazzini updated 1 year ago
427
minh-swinburne/cos30045-project #10

Explore Visualization Ideas and Techniques

- Brainstorm different types of charts and visualizations. - Research examples of interactive visualizations for inspiration.

minh-swinburne updated 2 weeks ago
5
ethz-asl/ethzasl_ptam #91

Error compiling

Hi there! I tried to compile ethzasl_ptam but got some errors in the process. I followed the steps given in #39 and #48, but without success. I'm trying to build it on a Raspberry P…

SergioGarG updated 7 years ago
1
lightvector/KataGo #194

katago board sizes

This is not an issue but a question about how katago handles different board sizes. Please feel free to move it or direct me to where to post the question if it can't stay here. It seems that in t…

dshawul updated 4 years ago
7
bitsandbytes-foundation/bitsandbytes #1232

"Only Tensors of floating point and complex dtype can requir…

### System Info Python 3.11.5 torch 2.3.0 transformers 4.41.1 accelerate 0.30.1 …

artkpv updated 1 week ago
6
JuliaGaussianProcesses/KernelFunctions.jl #504

Efficient AD for the simulation of gradients

## Introduction: How would you simulate gradients? If you want to simulate the gradient of a random function $Z$, it turns out that you simply need to take derivatives of the covariance function, as …

FelixBenning updated 1 year ago
6
dynamik1703/gym_longicontrol #3

Extract SHAP values from TQC, PPO trained model

Hi, I have a question about the code: why was all the training done using SAC (I mean inside the main.py file), while a DDPG session was initialized in LongiControl_SHAP.ipynb? Is it not possible to appl…

SimoMaestri updated 2 years ago
4
Stable-Baselines-Team/stable-baselines3-contrib #101

[Feature Request] MaskableRecurrentPPO

**Motivation** MaskablePPO is great for large discrete action spaces that have many invalid actions at each step, while RecurrentPPO is useful for giving the agent a memory of previous observations and…

CppMaster updated 3 months ago
15
