-
Thanks for your great work.
I am trying to fine-tune the VideoLLaMA2 model with my own data. However, after fine-tuning, the model starts to repeatedly output the same content. Could you help me solv…
-
**Describe the bug**
The model response doesn't stop; it keeps generating text. I tried both `swift deploy` and `vllm`.
Training arguments:
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 \
USE_HF=1 \
CUDA_VISIBLE…
```
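Not a fix for the training side, but as a stopgap when serving: below is a minimal sketch of bounding the output with vLLM's `SamplingParams` (the checkpoint path, stop token id, and prompt are placeholders, not taken from the issue). The usual root cause of endless generation is that the fine-tuned model never emits its EOS token, so it is also worth checking that the chat template used for training and inference match.

```python
from vllm import LLM, SamplingParams

# Placeholder checkpoint path; point this at the fine-tuned model.
llm = LLM(model="/path/to/finetuned-checkpoint")

params = SamplingParams(
    temperature=0.7,
    repetition_penalty=1.1,  # mildly penalize repeated tokens
    max_tokens=512,          # hard cap so a response cannot run forever
    stop_token_ids=[2],      # the tokenizer's EOS id (2 is only an example)
)

outputs = llm.generate(["<your prompt here>"], params)
print(outputs[0].outputs[0].text)
```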
-
Here is the error:
File "/home/workspace/x-flux-main/src/flux/modules/layers.py", line 499, in __call__
output = attn.linear2(torch.cat((attn_1, attn.mlp_act(mlp)), 2))
torch.OutOfMemoryError…
-
### System Info
```Shell
- `Accelerate` version: 0.33.0
- Platform: Linux-5.15.133+-x86_64-with-glibc2.35
- `accelerate` bash location: /opt/conda/bin/accelerate
- Python version: 3.10.14
- Numpy…
```
-
### Describe the feature
Currently, according to Gemini's official description, we cannot do gradient accumulation manually. I hope the Colossal-AI team can add this feature to the project.
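For reference, this is the kind of manual accumulation loop that plain PyTorch allows and that, as I understand it, the Gemini plugin currently does not support; the model, data, and `accum_steps` below are placeholders, not Colossal-AI APIs.

```python
import torch
from torch import nn

# Placeholder model and data; only the accumulation pattern matters here.
model = nn.Linear(128, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 128), torch.randint(0, 10, (8,))) for _ in range(16)]
accum_steps = 4  # micro-batches per optimizer step

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y) / accum_steps  # scale so the accumulated gradient matches one big batch
    loss.backward()                            # gradients accumulate in param.grad across iterations
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```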
-
**Describe the bug**
![image](https://github.com/user-attachments/assets/bc125f23-b4e3-4786-a062-684944e42140)
**Additional context**
SIZE_FACTOR=8 MAX_PIXELS=602112 torchrun --nproc_per_node …
-
In gradient accumulation, we do not need to gather the gradients for the first N - 1 iterations. With PyTorch DDP, we can use `no_sync()` as follows. Is there an equivalent in Apex DDP?
http…
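Since the link above is truncated, here is roughly the `no_sync()` pattern I mean, sketched with standard PyTorch DDP (the model, data, and `accum_steps` are placeholders, and the process group is assumed to be initialized by `torchrun`); the gradient all-reduce fires only on the last micro-batch of each accumulation window.

```python
import contextlib
import torch
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Placeholder model/data on CPU (gloo backend); assumes init_process_group()
# has already been called, e.g. by launching with torchrun.
model = nn.Linear(128, 10)
ddp_model = DDP(model)
optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 128), torch.randint(0, 10, (8,))) for _ in range(16)]
accum_steps = 4

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    last = (step + 1) % accum_steps == 0
    # no_sync() suppresses the gradient all-reduce for the first N-1 micro-batches.
    ctx = contextlib.nullcontext() if last else ddp_model.no_sync()
    with ctx:
        loss = loss_fn(ddp_model(x), y) / accum_steps
        loss.backward()
    if last:
        optimizer.step()       # gradients are synchronized only on this iteration
        optimizer.zero_grad()
```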
-
### 🚀 The feature
It would be nice if gradient accumulation functionality could be added to the HuBERT recipe.
### Motivation, pitch
Using gradient accumulation can simulate a larger cluster / larg…
-
![W B Chart 3_27_2024, 9 55 33 AM](https://github.com/karpathy/nanoGPT/assets/153394752/400c926a-0443-4faa-b114-6a567420a988)
I am running on 2x 4090 and updated the GPU count to 2 instead of 8 in gradient_accum…
-
When I fine-tune Llama-2-7B:
```
# alpaca
torchrun --nproc_per_node=8 --master_port=29000 train.py \
--model_name_or_path .cache/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d…
```