-
### System Info
- `transformers` version: 4.43.0.dev0
- Platform: Linux-5.4.0-167-generic-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.23.4
- Safetensors version: …
-
Hello, thanks for your great work.
I need to use gradient accumulation on batches due to RAM constraints. The training loop involves iterating over two modalities. I am concerned about the implicatio…
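For reference, the pattern I have in mind looks roughly like the sketch below. This is a minimal, hypothetical PyTorch loop, not the actual training code from this issue; the toy model, the two loaders, and `accum_steps` are stand-ins.

```python
import torch
from torch import nn

# Toy two-modality setup, only to make the sketch runnable.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
image_loader = [torch.randn(4, 16) for _ in range(8)]  # placeholder "image" batches
text_loader = [torch.randn(4, 16) for _ in range(8)]   # placeholder "text" batches

accum_steps = 4  # micro-batches accumulated per optimizer step
optimizer.zero_grad()

for step, (img_batch, txt_batch) in enumerate(zip(image_loader, text_loader)):
    # One forward/backward per modality; dividing by accum_steps keeps the
    # accumulated gradient on the same scale as a single large-batch update.
    loss = (model(img_batch).mean() + model(txt_batch).mean()) / accum_steps
    loss.backward()  # gradients keep adding up in .grad across iterations

    if (step + 1) % accum_steps == 0:
        optimizer.step()       # apply the accumulated gradient
        optimizer.zero_grad()  # reset for the next accumulation window
```

With two loaders of different lengths, `zip` stops at the shorter one, which is one of the details to be careful about with this pattern.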
-
I tested continued training of Llama 3 across multiple machines with TP4, PP2, DP2. When gradient accumulation is enabled, training hangs. The experimental environment is 16×H800, torch 2.1.2+cu121.
checkpoints:…
-
### Description
We're currently experiencing an intermittent issue in our Kubernetes v1.25.7 Kops cluster. Over time, containerd accumulates `containerd-shim-runc-v2` processes until PID exhaustion o…
-
Dropout really seems to be the bane of equinox. This is a loose follow-up to #681 - effectively, I'm trying to fix a problem that cropped up a while ago when using `optax.MultiSteps` for gradient accumulatio…
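For anyone landing here, a bare-bones example of the accumulation part (a minimal sketch of `optax.MultiSteps` usage, deliberately leaving out the equinox module and dropout that the actual issue is about; the dict params and `k` are placeholders):

```python
import jax
import jax.numpy as jnp
import optax

k = 4  # number of gradient evaluations per real optimizer update
params = {"w": jnp.ones((3,))}

# MultiSteps wraps the inner optimizer: it accumulates gradients and emits
# zero updates until k gradients have been seen, then applies one adam step.
opt = optax.MultiSteps(optax.adam(1e-3), every_k_schedule=k)
opt_state = opt.init(params)

def loss_fn(p, x):
    return jnp.sum((p["w"] * x) ** 2)

for step in range(2 * k):
    x = jnp.full((3,), float(step))
    grads = jax.grad(loss_fn)(params, x)
    updates, opt_state = opt.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)  # no-op except every k-th step
```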
-
### Describe the bug
AdvancedSeasons 1.3.6
Spigot 1.21
ProtocolLib #721
During winter seasons, when it snows there is no snow accumulation except in biomes/heights that naturally get snow.
Sno…
-
There is this quote:
```
**Gradient accumulation** simulates a larger batch size than the
hardware can support and therefore does not provide any throughput
benefits. It should g…
```
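To spell out the arithmetic behind that quote: gradient accumulation only changes the effective batch size, `effective_batch = micro_batch_size × accumulation_steps × num_data_parallel_ranks`, while the per-sample compute stays the same. For example, a micro-batch of 4 with 8 accumulation steps on 2 data-parallel ranks optimizes as if the batch were 64, but still runs the same 16 forward/backward passes per optimizer step, hence no throughput gain.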
-
Hi,
I fine-tuned LLaMA-3-8b-Instruct on "llama3-ultrafeedback-armorm" with different gradient accumulation settings (the other hyperparameters are the same as in llama-3-8b-instruct-simpo-v2.yaml). For fine-t…
-
**Describe the bug**
I am trying to pretrain an [OLMo](https://github.com/allenai/OLMo) 1B model on 8 MI250 GPUs with the Docker image rocm/pytorch:latest (ROCm 6.1). I'm using a small subset of Dolma …
-
### ❓ The question
Hello authors, thank you very much for your inspiring work. I now have 8 A100s. If I want to continue pretraining the model from a certain checkpoint, can I set global_train_batch_size to the or…
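For what it's worth, my understanding of the batch-size bookkeeping (an assumption about how the trainer derives gradient accumulation from the config, not something confirmed in the question above) is roughly:

```python
# Hypothetical arithmetic; the variable names mirror common OLMo config keys,
# and the concrete values are made-up examples, not taken from the issue.
world_size = 8                      # 8 x A100
global_train_batch_size = 2048      # example value
device_train_microbatch_size = 8    # example value

# The global batch is split evenly across ranks; whatever doesn't fit into one
# micro-batch is covered by gradient accumulation.
device_train_batch_size = global_train_batch_size // world_size              # 256
grad_accum_steps = device_train_batch_size // device_train_microbatch_size   # 32
print(device_train_batch_size, grad_accum_steps)
```

Under that reading, keeping `global_train_batch_size` at its original value on fewer GPUs mainly costs more accumulation steps per optimizer update.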