-
## 🐛 Bug
As the original paper (https://arxiv.org/pdf/1711.05101.pdf, green boxes) shows, the formula for applying weight decay to Adam should be
`\theta_t = (1 - \lambda) * \theta_{t - 1}` …
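For context, here is a minimal sketch (not the paper's or PyTorch's code; the helper name and plain-tensor interface are illustrative) of the decoupled update from the paper's Algorithm 2, where the decay shrinks the weights directly instead of being folded into the gradient:

```python
import torch

def adamw_step(param, grad, exp_avg, exp_avg_sq, step,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=1e-2):
    """One decoupled-weight-decay (AdamW) step in the style of Loshchilov & Hutter.

    The decay term is applied to the parameter itself, multiplicatively,
    independently of the Adam moment estimates.
    """
    # Decoupled decay: theta <- theta * (1 - lr * weight_decay).
    param.mul_(1 - lr * weight_decay)
    # Standard Adam moment updates on the undecayed gradient.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    bias_c1 = 1 - beta1 ** step
    bias_c2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias_c2).sqrt_().add_(eps)
    # theta <- theta - lr * m_hat / (sqrt(v_hat) + eps).
    param.addcdiv_(exp_avg / bias_c1, denom, value=-lr)
    return param
```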
-
When I set the `max_steps` property in `TrainingArguments` to a number N, I see in the training logs that it iterates until 2*N, which was not the case when doing `trainer.train()`. I will look further if th…
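A self-contained repro sketch under an assumed minimal setup (the toy model and dataset below are placeholders, not the reporter's code):

```python
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

class ToyDataset(Dataset):
    """Tiny synthetic dataset so the script runs standalone."""
    def __len__(self):
        return 1024
    def __getitem__(self, idx):
        x = torch.randn(8)
        return {"x": x, "labels": x.sum().unsqueeze(0)}

class ToyModel(torch.nn.Module):
    """Minimal regression model returning a dict with a loss."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 1)
    def forward(self, x, labels=None):
        out = self.linear(x)
        loss = torch.nn.functional.mse_loss(out, labels)
        return {"loss": loss, "logits": out}

args = TrainingArguments(
    output_dir="out",
    max_steps=100,      # expectation: training stops at global step 100
    logging_steps=10,
    report_to=[],
)
trainer = Trainer(model=ToyModel(), args=args, train_dataset=ToyDataset())
trainer.train()         # the report says the logs run to 2 * max_steps instead
```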
-
### System Info
- `transformers` version: 4.42.3
- Platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.…
-
Multipart support was never added to the stub: https://github.com/softwaremill/tapir/blob/abeb5d72e4e7a16c4da3830a59eb58862dfda69b/server/sttp-stub-server/src/main/scala/sttp/tapir/server/stub/SttpReq…
-
omegaconf.errors.ConfigAttributeError: Missing key AdamW
    full_key: AdamW
    object_type=dict
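For context, a minimal sketch (not from the issue; the config contents are illustrative) of how OmegaConf raises this error when the node being accessed lacks the key, plus one way to guard against it:

```python
from omegaconf import OmegaConf

# Illustrative config: the optimizer section defines "Adam" but not "AdamW".
cfg = OmegaConf.create({"optimizer": {"Adam": {"lr": 1e-3}}})

# Attribute-style access to a missing key raises ConfigAttributeError
# ("Missing key AdamW", with full_key reflecting the access path).
try:
    _ = cfg.optimizer.AdamW
except Exception as e:
    print(type(e).__name__, e)

# Safer: use .get() with a fallback instead of bare attribute access.
opt_cfg = cfg.optimizer.get("AdamW", cfg.optimizer.Adam)
print(OmegaConf.to_yaml(opt_cfg))
```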
-
### 🚀 The feature, motivation and pitch
I would like to benefit from the speed advantages of fused AdamW while doing CPU-only training, but this is not supported. It currently throws an error indicat…
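For reference, a sketch (not from the request) of the underlying PyTorch flag involved: `torch.optim.AdamW(..., fused=True)` is the fused kernel, and on builds where the fused path is CUDA-only it raises an error for CPU tensors rather than falling back to the unfused implementation; exact behavior depends on the PyTorch version:

```python
import torch

model = torch.nn.Linear(16, 16)  # CPU model, no CUDA involved

# fused=True requests the fused AdamW kernel; whether CPU tensors are
# accepted depends on the PyTorch build, so both outcomes are handled.
try:
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, fused=True)
    loss = model(torch.randn(4, 16)).sum()
    loss.backward()
    opt.step()
    print("fused AdamW ran on CPU")
except RuntimeError as e:
    print("fused AdamW rejected on CPU:", e)
```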
-
We used our own data to fine-tune the pretrained DTU model; our data uses the same format as DTU.
However, we get an error during training:
-- Process 0 terminated with the following error:
Traceback (mos…
-
`RuntimeError: params, grads, exp_avgs, and exp_avg_sqs must have same dtype, device, and layout`
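This message comes from PyTorch's multi-tensor (foreach/fused) AdamW path, which requires parameters, gradients, and both moment buffers to share dtype, device, and layout. Below is a small diagnostic sketch (the helper name is illustrative, not from the issue) that walks the optimizer state and reports any mismatch before stepping; a typical cause is casting or moving the model after the optimizer state was created or loaded:

```python
import torch

def find_state_mismatches(opt: torch.optim.Optimizer):
    """Report params whose AdamW state buffers disagree with them in
    dtype or device."""
    mismatches = []
    for group in opt.param_groups:
        for p in group["params"]:
            state = opt.state.get(p, {})
            for name in ("exp_avg", "exp_avg_sq"):
                buf = state.get(name)
                if buf is not None and (buf.dtype != p.dtype or buf.device != p.device):
                    mismatches.append((name, p.dtype, p.device, buf.dtype, buf.device))
    return mismatches

# Example: state built in fp32, model later cast to fp16 -> mismatch.
model = torch.nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
model(torch.randn(2, 8)).sum().backward()
opt.step()          # creates fp32 exp_avg / exp_avg_sq buffers
model.half()        # params become fp16, state buffers stay fp32
print(find_state_mismatches(opt))
```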
-
## Issue
Encountered a deadlock while running a JAX-based LLM training script on a TPU-v4-32 pod. I SSH'd into worker 0 and ran the script there directly, instead of using `--worker all --command "..."`…
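That launch pattern alone can explain the hang: on a multi-host pod every worker must run the same program, because cross-host collectives block until all hosts join. A minimal sketch (not the reporter's script) of the multi-host handshake that would wait forever if launched on only one worker:

```python
import jax

# On a TPU pod slice, jax.distributed.initialize() discovers the pod
# topology; the collectives below block until all hosts have joined,
# so running this on worker 0 alone deadlocks.
jax.distributed.initialize()

print(f"process {jax.process_index()} of {jax.process_count()}, "
      f"{jax.local_device_count()} local / {jax.device_count()} global devices")

# A simple cross-host reduction: hangs unless every worker executes it.
x = jax.numpy.ones((jax.local_device_count(),))
total = jax.pmap(lambda v: jax.lax.psum(v, "i"), axis_name="i")(x)
print(total)
```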
-
I tried to pass `--optim`, but nothing happens. How can I use optimizers such as adamw_8bit or adafactor in LISA?
They are not in `custom_optimizers` either.
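For comparison, a sketch of how these optimizers are selected through plain `transformers` (whether LISA's training loop honors this argument is exactly what is in question here; the identifiers below are the standard `transformers` optim names, and `adamw_bnb_8bit` additionally requires `bitsandbytes`):

```python
from transformers import TrainingArguments

# Standard HF route: the optim field picks the optimizer implementation.
args_adafactor = TrainingArguments(output_dir="out", optim="adafactor")
args_adamw8bit = TrainingArguments(output_dir="out", optim="adamw_bnb_8bit")
```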