gradient-accumulation Search Results

1000+ results
for gradient-accumulation

Best match


zejunwang1/bloom_tuning #5

Hello, how do I enable offload_optimizer?

youzihaha updated 1 year ago
2
taoyang1122/adapt-image-models #24

Some questions about hyperparameters and experiment re…

When I train on the Diving-48 dataset with clip_base, I set videos_per_gpu=4 because of my machine's limits (V100 32G). I use 14 GPUs to train the model, so the batch size is 56, close to 64. But finally I got top…

007invictus updated 1 year ago
1
OpenMOSS/MOSS #262

OutOfMemoryError: CUDA out of memory.

Hardware: `RTX A5000(24GB) * 5` RAM: `210GB` Model: `moss-moon-003-base` Training fails with the error: ```bash OutOfMemoryError: CUDA out of memory. Tried to allocate 3.80 GiB (GPU 0; 23.69 GiB total capacity; 17.46 GiB a…

sk142857 updated 1 year ago
6
yxli2123/LoftQ #27

Error with shape

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM: size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32001, 4096]) from …

manlenzzz updated 6 months ago
2
coreylowman/dfdx #643

Memory Leakage in cudarc 0.9.x / dfdx 0.11.x

I've been using an older dfdx with cudarc 0.8.0 which has worked fine, and I recently upgraded to the latest version on github. I'm getting OOM errors, notably after many iterations, so I believe it's…

jafioti updated 1 year ago
16
OpenPecha/stt-wav2vec2 #5

STT0043: Training STT wav2vec model on GCP.

### Description We need to train the stt-wav2vec2 model on the new datasets we have gained, including the data from the newly introduced departments. ### Completion Criteria STT wav2vec2 model with better…

gangagyatso4364 updated 3 weeks ago
9
jasonppy/VoiceCraft #154

I finetuned voicecraft on commonvoice-french, here are some …

Hello, I fine-tuned VoiceCraft on the Common Voice French dataset. It's quite exciting since it's my first time working on an LLM and on a full audio model (not just spectrogram -> classificat…

zmy1116 updated 2 months ago
1
hpcaitech/ColossalAI #3403

[BUG]: Cannot use pipeline and gemini at the same time

### 🐛 Describe the bug I previously attempted to submit a similar issue in #3383, but some of my imprecise wording may have caused unnecessary misunderstandings, which could increase the cost of unders…

liuzeming-yuxi updated 1 year ago
1
thunlp/PEVL #11

Training on the VCR task

Hi, I noticed that when fine-tuning the ssp model on the VCR task, the performance drops a lot every 5000 steps in the first epoch. Before fine-tuning, the results for Q2A and QA2R are both more than 74% step…

huangsiyong updated 1 year ago
4
huggingface/diffusers #9546

Flux Controlnet Train Example, will run out of memory on val…

### Describe the bug On default settings provided in flux train example readme, with 10 validation images training will error out with out of memory error during validation. on A100 80GB ``` …

Night1099 updated 3 weeks ago
14

