-
When I train on the Diving-48 dataset with clip_base, I set videos_per_gps=4 because of my machine's memory limit (V100 32G). I use 14 GPUs to train the model, so the batch size is 56, close to 64. But finally I got top…
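The batch-size arithmetic in this report can be checked with a small sketch. It assumes plain data-parallel training in which every GPU processes the same number of clips per step; the helper name `effective_batch_size` and the `accum_steps` parameter are hypothetical illustrations, not part of the repository's code.

```python
def effective_batch_size(num_gpus: int, per_gpu: int, accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step under data-parallel training.

    accum_steps is a hypothetical gradient-accumulation factor (1 = none).
    """
    return num_gpus * per_gpu * accum_steps


# Values taken from the report: 14 GPUs, 4 videos per GPU.
batch = effective_batch_size(num_gpus=14, per_gpu=4)
print(batch)  # 56

# Ratio to the batch size of 64 mentioned in the report.
reference = 64
print(batch / reference)  # 0.875
```

With 4 videos per GPU, a batch of exactly 64 would need 16 GPUs, or some other combination of GPU count and accumulation steps whose product with 4 is 64.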
-
Hardware: `RTX A5000(24GB) * 5`
RAM: `210GB`
Model: `moss-moon-003-base`
Training fails with the following error:
```bash
OutOfMemoryError: CUDA out of memory. Tried to allocate 3.80 GiB (GPU 0; 23.69 GiB total capacity; 17.46 GiB a…
```
-
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32001, 4096]) from …
-
I've been using an older dfdx with cudarc 0.8.0, which has worked fine, and I recently upgraded to the latest version on GitHub. I'm getting OOM errors, notably after many iterations, so I believe it's…
-
### Description
We need to train the stt-wav2vec2 model on the new datasets we have acquired, in part because of the newly introduced departments' data.
### Completion Criteria
Stt wav2vec2 model with better…
-
Hello,
So I fine-tuned VoiceCraft on the French Common Voice dataset. It's quite exciting since it's my first time working on an LLM and on a full audio model (not just spectrogram -> classificat…
-
### 🐛 Describe the bug
I previously attempted to submit a similar issue in #3383, but some of my imprecise wording may have caused unnecessary misunderstandings, which could increase the cost of unders…
-
Hi, I notice that when fine-tuning the ssp model on the VCR task, the performance drops a lot every 5000 steps in the first epoch.
Before fine-tuning, the results for Q2A and QA2R are both above 74%.
step…
-
### Describe the bug
With the default settings provided in the Flux training example README and 10 validation images, training errors out with an out-of-memory error during validation on an A100 80GB.
```
…