-
I think one of the main motives for LoRA is to reduce memory consumption; certainly that's my motive. I'm already using gradient checkpointing and Adafactor, so the main thing I want from LoRA is to red…
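To make the memory argument concrete, here is a minimal sketch of the LoRA idea in PyTorch (class and parameter names are mine, not from any particular library): the base weight is frozen, so it accumulates no gradient buffers and no optimizer state, and only the two small rank-r factors are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: freeze W, train only the rank-r factors A, B."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # frozen: no grads, no optimizer state
        # B starts at zero so the adapter is initially a no-op.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # W x + scale * B (A x): only A and B receive gradients.
        return self.base(x) + self.scale * ((x @ self.A.t()) @ self.B.t())
```

With Adafactor the optimizer-state saving is smaller than with Adam, since Adafactor already factorizes its second-moment estimates, but the gradient memory for the frozen weights disappears entirely.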
-
Hi,
I'm wondering about the reasoning behind [this constraint](https://github.com/ProbabilisticNumerics/probabilistic_line_search/blob/master/probls/tensorflow_interface/gradient_moment.py#L89-L91…
-
It seems that `check_grads` has incorrect scaling behaviour for large input tensors.
In the code of [check_jvp](https://github.com/google/jax/blob/master/jax/test_util.py#L204) (which is called by [c…
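A hypothetical repro sketch of the kind of spurious failure I mean (the function and sizes here are made up):

```python
import jax.numpy as jnp
from jax.test_util import check_grads

# The same scalar-valued function, checked at two input sizes.
f = lambda x: jnp.sum(jnp.sin(x))

check_grads(f, (jnp.ones(10),), order=1)       # fine
check_grads(f, (jnp.ones(100_000),), order=1)  # can fail spuriously if the
# tolerance does not grow with the finite-difference error, which scales
# with the size/norm of the inputs and tangents.
```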
-
```julia
using ClimaCore

# Floating-point type and vertical extent.
FT = Float64
zmin = FT(1.0)
zmax = FT(2.0)

# Domain limits in x, y, and z.
xlim = FT.((0.0, 10.0))
ylim = FT.((0.0, 1.0))
zlim = FT.((zmin, zmax))

# Mesh resolution: elements per direction and polynomial order.
nelements = (1, 1, 5)
npolynomial = 3
domain_x =…
-
This looks like a very useful project, and I like the name. One thing I did notice, though, is that it is using the older Perlin noise algorithm for its noise. Perlin noise is smooth, but it produces a lot of 45 and…
-
Hello, I'm really confused about the gradients of `alphas` in hard attention. The source code at line 1199 is:

```python
known_grads={alphas:opt_outs['masked_cost'][:,:,None]/10.*
             (alph…
```
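For what it's worth, my understanding of the mechanism (a minimal sketch, not the show-attend-tell code itself): Theano's `known_grads` lets you hand `theano.grad` a precomputed gradient for an intermediate variable, and backpropagation continues from there. The snippet above appears to use this to inject a REINFORCE-style surrogate gradient for `alphas`.

```python
import theano
import theano.tensor as T

x = T.vector('x')
z = T.tanh(x)

# Hand-specified gradient for z; with cost=None, theano.grad
# backpropagates from known_grads instead of from a cost node.
g_z = T.ones_like(z)  # hypothetical surrogate gradient
g_x = theano.grad(cost=None, wrt=x, known_grads={z: g_z})
f = theano.function([x], g_x)
```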
-
**What's the issue, what's expected?**:
I get an error when using MS-AMP to do LLM SFT.
The MS-AMP section of my DeepSpeed config:

```json
"msamp": {
    "enabled": true,
    "opt_level": "O1|O2|O3",  # all tried
    "use_te": false
}
```
…
-
#### The next comment explains what this issue aims at
When `thunder.jit` traces `optimizer._init_group` before this initialization, calls to the jitted `optimizer.step` will all reset the inner st…
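For background, a minimal sketch of the lazy initialization at play (plain PyTorch, no Thunder): per-parameter optimizer state is created inside `_init_group` on the first real `step`, so a trace captured before that point bakes in the "state is empty, initialize it" branch.

```python
import torch

p = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.Adam([p])

print(dict(opt.state))      # {} -- no per-parameter state yet

p.grad = torch.ones_like(p)
opt.step()                  # _init_group lazily creates the state here
print(opt.state[p].keys())  # dict_keys(['step', 'exp_avg', 'exp_avg_sq'])
```

If the jitted trace always re-executes that initialization branch, every call would wipe `exp_avg`/`exp_avg_sq`, which would match the reset symptom described above.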
-
Hi Ax Team,
I am trying to implement a Service API version of the safe optimization idea floated by @Balandat [here](https://github.com/pytorch/botorch/discussions/2240#discussioncomment-8701003);…
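To make the question concrete, here is the kind of Service API skeleton I have in mind (metric names and the constraint are hypothetical placeholders; this only expresses the safety requirement as an outcome constraint, not the safe-exploration logic from that discussion):

```python
from ax.service.ax_client import AxClient, ObjectiveProperties

ax_client = AxClient()
ax_client.create_experiment(
    name="safe_opt_sketch",
    parameters=[
        {"name": "x1", "type": "range", "bounds": [0.0, 1.0]},
    ],
    objectives={"objective": ObjectiveProperties(minimize=False)},
    # Hypothetical safety metric that candidates must keep non-negative.
    outcome_constraints=["safety >= 0.0"],
)

params, trial_index = ax_client.get_next_trial()
ax_client.complete_trial(
    trial_index=trial_index,
    raw_data={"objective": 1.0, "safety": 0.5},  # made-up measurements
)
```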
-
**Describe the bug**
As illustrated below, DeepSpeed's overlap buffer design presents a potential data race.
I have written a patch to fix it.
Could you kindly help diagnose and fix this issue?
…