-
### System Info
- `transformers` version: 4.43.0.dev0
- Platform: Linux-5.4.0-167-generic-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.23.4
- Safetensors version: …
-
Hello, thanks for your great work.
I need to use gradient accumulation on batches due to RAM constraints. The training loop involves iterating over two modalities. I am concerned about the implicatio…
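For reference, the pattern I have in mind looks roughly like the sketch below. This is a minimal, hypothetical PyTorch loop, not the actual training code from this issue; the toy model, the two loaders, and `accum_steps` are stand-ins.

```python
import torch
from torch import nn

# Toy two-modality setup, only to make the sketch runnable.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
image_loader = [torch.randn(4, 16) for _ in range(8)]  # placeholder "image" batches
text_loader = [torch.randn(4, 16) for _ in range(8)]   # placeholder "text" batches

accum_steps = 4  # micro-batches accumulated per optimizer step
optimizer.zero_grad()

for step, (img_batch, txt_batch) in enumerate(zip(image_loader, text_loader)):
    # One forward/backward per modality; dividing by accum_steps keeps the
    # accumulated gradient on the same scale as a single large-batch update.
    loss = (model(img_batch).mean() + model(txt_batch).mean()) / accum_steps
    loss.backward()  # gradients keep adding up in .grad across iterations

    if (step + 1) % accum_steps == 0:
        optimizer.step()       # apply the accumulated gradient
        optimizer.zero_grad()  # reset for the next accumulation window
```

With two loaders of different lengths, `zip` stops at the shorter one, which is one of the details to be careful about with this pattern.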
-
I tested continued training of Llama 3 across multiple machines with TP4, PP2, DP2. When gradient accumulation is enabled, training hangs. The experimental environment is 16×H800, torch 2.1.2+cu121.
checkpoints:…
-
### Description
We're currently experiencing an intermittent issue in our Kubernetes v1.25.7 Kops cluster. Over time, containerd accumulates `containerd-shim-runc-v2` processes until PID exhaustion o…
-
Dropout really seems to be the bane of equinox. This is a loose follow-up to #681 - effectively, I'm trying to fix a problem that cropped up a while ago when using `optax.MultiSteps` for gradient accumulatio…
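For anyone landing here, a bare-bones example of the accumulation part (a minimal sketch of `optax.MultiSteps` usage, deliberately leaving out the equinox module and dropout that the actual issue is about; the dict params and `k` are placeholders):

```python
import jax
import jax.numpy as jnp
import optax

k = 4  # number of gradient evaluations per real optimizer update
params = {"w": jnp.ones((3,))}

# MultiSteps wraps the inner optimizer: it accumulates gradients and emits
# zero updates until k gradients have been seen, then applies one adam step.
opt = optax.MultiSteps(optax.adam(1e-3), every_k_schedule=k)
opt_state = opt.init(params)

def loss_fn(p, x):
    return jnp.sum((p["w"] * x) ** 2)

for step in range(2 * k):
    x = jnp.full((3,), float(step))
    grads = jax.grad(loss_fn)(params, x)
    updates, opt_state = opt.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)  # no-op except every k-th step
```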
-
### Describe the bug
AdvancedSeasons 1.3.6
Spigot 1.21
ProtocolLib #721
During winter seasons, when it snows there is no snow accumulation except in biomes/heights that naturally get snow.
Sno…
-
There is this quote:
```
**Gradient accumulation** simulates a larger batch size than the
hardware can support and therefore does not provide any throughput
benefits. It should g…
```
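To spell out the arithmetic behind that quote: gradient accumulation only changes the effective batch size, `effective_batch = micro_batch_size × accumulation_steps × num_data_parallel_ranks`, while the per-sample compute stays the same. For example, a micro-batch of 4 with 8 accumulation steps on 2 data-parallel ranks optimizes as if the batch were 64, but still runs the same 16 forward/backward passes per optimizer step, hence no throughput gain.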
-
Hi,
I fine-tuned LLaMA-3-8b-Instruct on "llama3-ultrafeedback-armorm" with different gradient accumulation settings (the other hyperparameters are the same as in llama-3-8b-instruct-simpo-v2.yaml). For fine-t…
-
**Describe the bug**
I am trying to pretrain an [OLMo](https://github.com/allenai/OLMo) 1B model on 8 MI250 GPUs with the Docker image rocm/pytorch:latest (ROCm 6.1). I'm using a small subset of Dolma …
-
### ❓ The question
Hello authors, thank you very much for your inspiring work. I now have 8 A100s. If I want to continue pretraining the model from a certain checkpoint, can I set global_train_batch_size to the or…
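For what it's worth, my understanding of the batch-size bookkeeping (an assumption about how the trainer derives gradient accumulation from the config, not something confirmed in the question above) is roughly:

```python
# Hypothetical arithmetic; the variable names mirror common OLMo config keys,
# and the concrete values are made-up examples, not taken from the issue.
world_size = 8                      # 8 x A100
global_train_batch_size = 2048      # example value
device_train_microbatch_size = 8    # example value

# The global batch is split evenly across ranks; whatever doesn't fit into one
# micro-batch is covered by gradient accumulation.
device_train_batch_size = global_train_batch_size // world_size              # 256
grad_accum_steps = device_train_batch_size // device_train_microbatch_size   # 32
print(device_train_batch_size, grad_accum_steps)
```

Under that reading, keeping `global_train_batch_size` at its original value on fewer GPUs mainly costs more accumulation steps per optimizer update.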