-
I think one of the main motives for LoRA is to reduce memory consumption; certainly that's my motive. I'm already using gradient checkpointing and Adafactor, so the main thing I want from LoRA is to red…
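To make the memory argument concrete, here is a minimal sketch of the LoRA idea in PyTorch (class and parameter names are mine, not from any particular library): the base weight is frozen, so it accumulates no gradient buffers and no optimizer state, and only the two small rank-r factors are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: freeze W, train only the rank-r factors A, B."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # frozen: no grads, no optimizer state
        # B starts at zero so the adapter is initially a no-op.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # W x + scale * B (A x): only A and B receive gradients.
        return self.base(x) + self.scale * ((x @ self.A.t()) @ self.B.t())
```

With Adafactor the optimizer-state saving is smaller than with Adam, since Adafactor already factorizes its second-moment estimates, but the gradient memory for the frozen weights disappears entirely.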
-
Hi,
I'm wondering about the reasoning behind [this constraint](https://github.com/ProbabilisticNumerics/probabilistic_line_search/blob/master/probls/tensorflow_interface/gradient_moment.py#L89-L91…
-
It seems that `check_grads` has incorrect scaling behaviour for large input tensors.
In the code of [check_jvp](https://github.com/google/jax/blob/master/jax/test_util.py#L204) (which is called by [c…
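A hypothetical repro sketch of the kind of spurious failure I mean (the function and sizes here are made up):

```python
import jax.numpy as jnp
from jax.test_util import check_grads

# The same scalar-valued function, checked at two input sizes.
f = lambda x: jnp.sum(jnp.sin(x))

check_grads(f, (jnp.ones(10),), order=1)       # fine
check_grads(f, (jnp.ones(100_000),), order=1)  # can fail spuriously if the
# tolerance does not grow with the finite-difference error, which scales
# with the size/norm of the inputs and tangents.
```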
-
```julia
using ClimaCore

# Floating-point type and vertical extent.
FT = Float64
zmin = FT(1.0)
zmax = FT(2.0)

# Domain limits in x, y, and z.
xlim = FT.((0.0, 10.0))
ylim = FT.((0.0, 1.0))
zlim = FT.((zmin, zmax))

# Mesh resolution: elements per direction and polynomial order.
nelements = (1, 1, 5)
npolynomial = 3
domain_x =…
-
This looks like a very useful project, and I like the name. One thing I did notice, though, is that it is using the older Perlin noise algorithm for its noise. Perlin noise is smooth, but it produces a lot of 45 and…
-
Hello, I'm really confused about the gradients of `alphas` in hard attention. The source code at line 1199 is:

```python
known_grads={alphas:opt_outs['masked_cost'][:,:,None]/10.*
             (alph…
```
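For what it's worth, my understanding of the mechanism (a minimal sketch, not the show-attend-tell code itself): Theano's `known_grads` lets you hand `theano.grad` a precomputed gradient for an intermediate variable, and backpropagation continues from there. The snippet above appears to use this to inject a REINFORCE-style surrogate gradient for `alphas`.

```python
import theano
import theano.tensor as T

x = T.vector('x')
z = T.tanh(x)

# Hand-specified gradient for z; with cost=None, theano.grad
# backpropagates from known_grads instead of from a cost node.
g_z = T.ones_like(z)  # hypothetical surrogate gradient
g_x = theano.grad(cost=None, wrt=x, known_grads={z: g_z})
f = theano.function([x], g_x)
```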
-
**What's the issue, what's expected?**:
I get an error when using MS-AMP to do LLM SFT.
The MS-AMP section of my DeepSpeed config:

```json
"msamp": {
    "enabled": true,
    "opt_level": "O1|O2|O3",  # all tried
    "use_te": false
}
```
…
-
#### The next comment explains what this issue aims at
When `thunder.jit` traces `optimizer._init_group` before this initialization, calls to the jitted `optimizer.step` will all reset the inner st…
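For background, a minimal sketch of the lazy initialization at play (plain PyTorch, no Thunder): per-parameter optimizer state is created inside `_init_group` on the first real `step`, so a trace captured before that point bakes in the "state is empty, initialize it" branch.

```python
import torch

p = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.Adam([p])

print(dict(opt.state))      # {} -- no per-parameter state yet

p.grad = torch.ones_like(p)
opt.step()                  # _init_group lazily creates the state here
print(opt.state[p].keys())  # dict_keys(['step', 'exp_avg', 'exp_avg_sq'])
```

If the jitted trace always re-executes that initialization branch, every call would wipe `exp_avg`/`exp_avg_sq`, which would match the reset symptom described above.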
-
Hi Ax Team,
I am trying to implement a Service API version of the safe optimization idea floated by @Balandat [here](https://github.com/pytorch/botorch/discussions/2240#discussioncomment-8701003);…
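To make the question concrete, here is the kind of Service API skeleton I have in mind (metric names and the constraint are hypothetical placeholders; this only expresses the safety requirement as an outcome constraint, not the safe-exploration logic from that discussion):

```python
from ax.service.ax_client import AxClient, ObjectiveProperties

ax_client = AxClient()
ax_client.create_experiment(
    name="safe_opt_sketch",
    parameters=[
        {"name": "x1", "type": "range", "bounds": [0.0, 1.0]},
    ],
    objectives={"objective": ObjectiveProperties(minimize=False)},
    # Hypothetical safety metric that candidates must keep non-negative.
    outcome_constraints=["safety >= 0.0"],
)

params, trial_index = ax_client.get_next_trial()
ax_client.complete_trial(
    trial_index=trial_index,
    raw_data={"objective": 1.0, "safety": 0.5},  # made-up measurements
)
```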
-
**Describe the bug**
As illustrated below, DeepSpeed's overlap buffer design presents a potential data race.
I have written a patch to fix it.
Could you kindly help diagnose and fix this issue?
…