-
Thanks for the great work!
I have some questions about the training configuration.
For the training batch size, I assume that we will collect rollout_batch_size = 1024 trajectories into the repl…
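To make my assumption concrete, here is a rough sketch of how I currently think the two batch sizes relate (the `train_batch_size` value below is just an example, not taken from the repo):

```python
# Sketch of my assumption only; names and values other than
# rollout_batch_size = 1024 are illustrative, not taken from the repo.
rollout_batch_size = 1024   # trajectories collected per rollout phase
train_batch_size = 128      # example minibatch size for gradient updates

# If the collected trajectories are consumed in minibatches of
# train_batch_size, each rollout phase yields this many update steps:
updates_per_rollout = rollout_batch_size // train_batch_size
print(updates_per_rollout)  # -> 8
```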
-
From the Discourse thread https://discourse.julialang.org/t/zygote-gradient-accumulation/55654:
I have a DenseNet-inspired architecture implemented in PyTorch and ported it to Julia. Sadly, I get out of memory …
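For reference, the pattern I am trying to reproduce on the Julia side is the usual PyTorch gradient-accumulation loop, roughly like the sketch below (model, data, and the accumulation factor are made up for illustration):

```python
import torch
from torch import nn

# Minimal sketch of PyTorch-style gradient accumulation; model, data and
# accum_steps are illustrative only.
model = nn.Linear(64, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
accum_steps = 4  # effective batch = accum_steps * micro-batch size

data = [(torch.randn(8, 64), torch.randn(8, 1)) for _ in range(16)]

optimizer.zero_grad()
for i, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()                            # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```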
-
Hello GFSLT-VLP,
Thank you for sharing your work. I tried reproducing the results as reported in your paper, specifically by using the VLP Pretrain V2 command and the GFSLT-VLP command on a single …
-
### System Info
NVIDIA A100 80GB GPU
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported task…
-
### Bug description
At the end of an epoch with `accumulate_grad_batches > 1`, the dataloader may run out of data before the required number of accumulation steps has been reached. The Lightning docs do not say what happens in this case. I…
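To make the edge case concrete, in plain PyTorch the situation looks like the sketch below (illustrative model, data, and numbers); the question is what Lightning does with the leftover batches:

```python
import torch
from torch import nn

# 10 batches with accumulation over 4 leaves 2 leftover batches at the end
# of the epoch; their gradients are computed but never stepped on here.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
accumulate_grad_batches = 4

batches = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(10)]

optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    (loss_fn(model(x), y) / accumulate_grad_batches).backward()
    if (i + 1) % accumulate_grad_batches == 0:
        optimizer.step()
        optimizer.zero_grad()
# Gradients from batches 9 and 10 are left in .grad without an optimizer step.
```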
-
In my training script, I set **per_device_train_batch_size = 4** in the `TrainingArguments`.
But the **train_batch_size** in the **trainer_state.json** of each checkpoint is **2**.
When I tried …
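For context, this is the relationship I expected between the two values (just the arithmetic I had in mind, not the Trainer's actual implementation; the device count and accumulation setting are assumptions):

```python
# The arithmetic I expected (illustrative; not the Trainer's actual code):
per_device_train_batch_size = 4
num_devices = 1                   # assumption: single-GPU run
gradient_accumulation_steps = 1   # assumption: no accumulation

expected_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
print(expected_train_batch_size)  # -> 4, yet trainer_state.json records 2
```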
-
### Describe the bug
Hi,
I have been working on training scripts for multiple models (T2I, IP2P) and found that the logic to calculate `step` and `epoch` while resuming training is different acr…
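As an illustration of the kind of bookkeeping I mean (not copied from any one script; variable names and numbers are made up), the resume logic generally computes something like:

```python
import math

# Illustrative resume bookkeeping; each training script computes these
# quantities slightly differently, which is what this issue is about.
num_batches_per_epoch = 625          # len(train_dataloader), made up
gradient_accumulation_steps = 2
global_step_from_checkpoint = 750    # e.g. parsed from "checkpoint-750"

num_update_steps_per_epoch = math.ceil(
    num_batches_per_epoch / gradient_accumulation_steps
)
first_epoch = global_step_from_checkpoint // num_update_steps_per_epoch
resume_step = global_step_from_checkpoint % num_update_steps_per_epoch
print(first_epoch, resume_step)      # -> 2 124
```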
-
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
llava/train/train_mem.py \
--model_name_or_path /path/to/checkpoint_llava_med \
--data_path /path/to/your_dental_dataset.jso…
-
Hello,
I saw your code on the internet, and it is very interesting. I cloned it and used it for my project. I tried changing batch_size from 1 to 4 and the backbone from vgg16 to resnet101, but I have a pr…
-
### 🐛 Describe the bug
I want to continue pretraining llama-7b with only 8 A100-80G GPUs, and I want to set the global batch size to 1024, but I can't find a gradient accumulation setting.
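For reference, the arithmetic I want to express is roughly the following (the per-GPU micro-batch size of 4 is just an example of what might fit in 80G):

```python
# Sketch of the relationship I want to configure (micro-batch of 4 is an example):
num_gpus = 8
micro_batch_per_gpu = 4            # whatever fits in 80G for llama-7b
target_global_batch_size = 1024

grad_accumulation_steps = target_global_batch_size // (num_gpus * micro_batch_per_gpu)
print(grad_accumulation_steps)     # -> 32
assert num_gpus * micro_batch_per_gpu * grad_accumulation_steps == 1024
```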
### Environment
_…