-
Thanks for your great work.
I am trying to fine-tune the VideoLLaMA2 model with my own data. However, after fine-tuning, the model starts to repeatedly output the same content. Could you help me solv…
-
**Describe the bug**
The model response doesn't stop; it keeps generating text. I tried both `swift deploy` and `vllm`.
Training arguments:
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 \
USE_HF=1 \
CUDA_VISIBLE…
```
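Not a fix for the training side, but as a stopgap when serving: below is a minimal sketch of bounding the output with vLLM's `SamplingParams` (the checkpoint path, stop token id, and prompt are placeholders, not taken from the issue). The usual root cause of endless generation is that the fine-tuned model never emits its EOS token, so it is also worth checking that the chat template used for training and inference match.

```python
from vllm import LLM, SamplingParams

# Placeholder checkpoint path; point this at the fine-tuned model.
llm = LLM(model="/path/to/finetuned-checkpoint")

params = SamplingParams(
    temperature=0.7,
    repetition_penalty=1.1,  # mildly penalize repeated tokens
    max_tokens=512,          # hard cap so a response cannot run forever
    stop_token_ids=[2],      # the tokenizer's EOS id (2 is only an example)
)

outputs = llm.generate(["<your prompt here>"], params)
print(outputs[0].outputs[0].text)
```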
-
Here is the error:
File "/home/workspace/x-flux-main/src/flux/modules/layers.py", line 499, in __call__
output = attn.linear2(torch.cat((attn_1, attn.mlp_act(mlp)), 2))
torch.OutOfMemoryError…
-
### System Info
```Shell
- `Accelerate` version: 0.33.0
- Platform: Linux-5.15.133+-x86_64-with-glibc2.35
- `accelerate` bash location: /opt/conda/bin/accelerate
- Python version: 3.10.14
- Numpy…
```
-
### Describe the feature
Currently, according to Gemini's official description, we cannot do gradient accumulation manually. I hope the Colossal-AI team can add this feature to the project.
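For reference, this is the kind of manual accumulation loop that plain PyTorch allows and that, as I understand it, the Gemini plugin currently does not support; the model, data, and `accum_steps` below are placeholders, not Colossal-AI APIs.

```python
import torch
from torch import nn

# Placeholder model and data; only the accumulation pattern matters here.
model = nn.Linear(128, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 128), torch.randint(0, 10, (8,))) for _ in range(16)]
accum_steps = 4  # micro-batches per optimizer step

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y) / accum_steps  # scale so the accumulated gradient matches one big batch
    loss.backward()                            # gradients accumulate in param.grad across iterations
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```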
-
**Describe the bug**
![image](https://github.com/user-attachments/assets/bc125f23-b4e3-4786-a062-684944e42140)
**Additional context**
SIZE_FACTOR=8 MAX_PIXELS=602112 torchrun --nproc_per_node …
-
In gradient accumulation, we do not need to gather the gradients for the first N - 1 iterations. With PyTorch DDP, we can use `no_sync()` as follows. Is there an equivalent in Apex DDP?
http…
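Since the link above is truncated, here is roughly the `no_sync()` pattern I mean, sketched with standard PyTorch DDP (the model, data, and `accum_steps` are placeholders, and the process group is assumed to be initialized by `torchrun`); the gradient all-reduce fires only on the last micro-batch of each accumulation window.

```python
import contextlib
import torch
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Placeholder model/data on CPU (gloo backend); assumes init_process_group()
# has already been called, e.g. by launching with torchrun.
model = nn.Linear(128, 10)
ddp_model = DDP(model)
optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 128), torch.randint(0, 10, (8,))) for _ in range(16)]
accum_steps = 4

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    last = (step + 1) % accum_steps == 0
    # no_sync() suppresses the gradient all-reduce for the first N-1 micro-batches.
    ctx = contextlib.nullcontext() if last else ddp_model.no_sync()
    with ctx:
        loss = loss_fn(ddp_model(x), y) / accum_steps
        loss.backward()
    if last:
        optimizer.step()       # gradients are synchronized only on this iteration
        optimizer.zero_grad()
```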
-
### 🚀 The feature
It would be nice if gradient accumulation functionality could be added to the HuBERT recipe.
### Motivation, pitch
Using gradient accumulation can simulate a larger cluster / larg…
-
![W B Chart 3_27_2024, 9 55 33 AM](https://github.com/karpathy/nanoGPT/assets/153394752/400c926a-0443-4faa-b114-6a567420a988)
I am running on 2x 4090 and updated the GPU count to 2 instead of 8 in gradient_accum…
-
When I fine-tune Llama-2-7B:
```
# alpaca
torchrun --nproc_per_node=8 --master_port=29000 train.py \
--model_name_or_path .cache/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d…
```