-
## 🐛 Bug
When training the models 'vicuna-7b-v1.5-16k', 'longchat-13b-16k', 'Mistral-7B-v0.2', 'falcon-180B', 'Llama-3-70B', and 'CodeLlama-34b-hf' with FSDP and FP8, we get KeyError: 'scaling_fwd'. This m…
-
### Describe the bug
Running training for GSA and RWKV occasionally results in NaN gradients: rare at the beginning, but becoming more frequent as training progresses.
I checked paramete…
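A common way to localize such NaNs (not shown in the original report) is to enable autograd anomaly detection and scan gradients after backward; a minimal sketch with a placeholder model rather than the reporter's GSA/RWKV setup:
```python
import torch

# Placeholder model/optimizer; the reporter's GSA/RWKV setup is not shown here.
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Makes backward raise at the op that produced a NaN (debug only; slows training).
torch.autograd.set_detect_anomaly(True)

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()
loss.backward()

# Check every parameter gradient for NaN/Inf before stepping the optimizer.
for name, p in model.named_parameters():
    if p.grad is not None and not torch.isfinite(p.grad).all():
        print(f"non-finite gradient in {name}")
optimizer.step()
```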
-
Hi, I noticed that before feeding the tensor into the network, you don't use torch.autograd.Variable to convert the tensor to a Variable. How does this work?
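For context: since PyTorch 0.4 the Variable API has been merged into Tensor, so autograd tracks any tensor created with requires_grad=True and no explicit wrapping is needed. A minimal sketch (a generic example, not this repository's code):
```python
import torch

# Since PyTorch 0.4, Variable is merged into Tensor: any tensor with
# requires_grad=True is tracked by autograd, no explicit wrapping needed.
x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()
y.backward()
print(x.grad)  # dy/dx = 2 for every element

# torch.autograd.Variable still exists for backward compatibility,
# but it simply returns a Tensor.
v = torch.autograd.Variable(torch.randn(3))
print(type(v))  # <class 'torch.Tensor'>
```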
-
I ran finetuning on my server and it gets an error after ~300 iterations.
My run command:
```
torchrun --nproc_per_node 2 \
-m FlagEmbedding.finetune.embedder.encoder_only.m3 \
--model_name…
-
I get this when running loss.backward(): RuntimeError: 0
-
What is use_autograd?
If I set it to either True or False, the result of constraint_fn() always comes back as a numpy type.
If the result is a numpy type, how does autograd work?
For auto…
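If this refers to HIPS autograd, it differentiates functions written against the autograd.numpy wrapper even though their return values print as ordinary numpy types: inside grad() the inputs are boxed so the operations can be traced. Since use_autograd and constraint_fn are not shown, this is only a generic sketch with a hypothetical constraint function:
```python
import autograd.numpy as np   # thin wrapper around numpy that records operations
from autograd import grad

# Hypothetical constraint function written with autograd.numpy.
# Its result prints as a plain numpy value, but grad() can still trace it.
def constraint_fn(x):
    return np.sum(x ** 2) - 1.0

constraint_grad = grad(constraint_fn)
x = np.array([0.5, 0.5, 0.5])
print(constraint_fn(x))      # plain numpy float
print(constraint_grad(x))    # gradient: 2 * x
```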
-
### 🐛 Describe the bug
compute-sanitizer reports many errors like this when running the CTC backward pass. The input to CTC (linked below) is not strictly log-probs, and the target lengths are generally ve…
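For reference (this is not the reporter's linked input), torch.nn.CTCLoss expects log-probabilities, typically produced with log_softmax, and target lengths that do not exceed the corresponding input lengths; a minimal sketch of a well-formed call:
```python
import torch
import torch.nn.functional as F

T, N, C = 50, 4, 20                         # time steps, batch, classes (0 = blank)
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = F.log_softmax(logits, dim=-1)   # strictly valid log-probs

targets = torch.randint(1, C, (N, 10))      # no blank labels in targets
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)  # <= input lengths

ctc = torch.nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```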
-
https://whatasmallship.github.io/2024/06/12/autograd-tutorial/
For tips on running notebooks in Google Colab, see https://pytorch.org/tutorials/beginner/colab. A Gentle Intro…
-
### Bug description
I am launching a script that trains a model; it works well when trained without DDP but with gradient checkpointing, or with DDP but without gradient checkpointing, using Fabric to…
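The DDP plus gradient-checkpointing combination is sensitive to how checkpointing is invoked; as a generic illustration (not the reporter's script and not a confirmed fix), the non-reentrant checkpoint variant is the one usually paired with DDP:
```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(32, 32), torch.nn.ReLU(), torch.nn.Linear(32, 32)
        )

    def forward(self, x):
        # Non-reentrant checkpointing recomputes activations during backward
        # and generally interacts better with DDP than the reentrant variant.
        return checkpoint(self.net, x, use_reentrant=False)

model = Block()
out = model(torch.randn(8, 32))
out.sum().backward()
```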
-
In https://github.com/HIPS/autograd/blob/195e8d839d93e2ffe7397f4058925a63fa0f7564/autograd/wrap_util.py#L36 there is a `return` statement in a `finally` block, which would swallow any in-flight except…
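For context, Python discards an in-flight exception when a `finally` block executes a `return`; a small standalone demonstration, independent of the linked wrap_util.py code:
```python
def swallow():
    try:
        raise ValueError("raised inside try")
    finally:
        # The return here replaces the propagating exception entirely,
        # so the caller never sees the ValueError.
        return "finally wins"

print(swallow())  # prints "finally wins"; no exception reaches the caller
```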