-
Chat
Completion
Composition
Configs
Models
Downloads
Train
Settings
About
LoRA Fine-tuning
WSL
```
8 768 blocks.3.ffn.key.lora_A
3072 8 blocks.3.ffn.key.lora_B
768 768 blocks.3.ffn.receptance.weight
8 768 blocks.3.ffn.recep…
```
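For reference, a quick sanity check of the LoRA shapes in the log above (a minimal sketch, assuming rank r = 8 and the usual RWKV FFN dimensions): the low-rank pair reconstructs a full-size update while only storing two small matrices.

```python
import torch

r, n_embd, dim_ffn = 8, 768, 3072       # rank and dimensions taken from the log above
lora_A = torch.zeros(r, n_embd)         # 8 x 768,  as in blocks.3.ffn.key.lora_A
lora_B = torch.zeros(dim_ffn, r)        # 3072 x 8, as in blocks.3.ffn.key.lora_B
delta_W = lora_B @ lora_A               # 3072 x 768 update, matching the full ffn.key weight shape
print(delta_W.shape)                    # torch.Size([3072, 768])
```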
-
### 🐛 Describe the bug
Creating a TransformerEncoder causes a memory overflow, but the same config works with the Hugging Face `transformers` module.
```python
# config.py
from colossalai.amp import…
-
## 🚀 Description
Pipeline parallelism is a technique used in deep learning model training to improve efficiency and reduce the training time of large neural networks. Here we propose a pipeline paral…
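To make the idea concrete, here is a minimal single-process sketch of pipeline parallelism (the stages, layer sizes, and micro-batch count are made up for illustration): the model is split into sequential stages and each mini-batch is divided into micro-batches, so that in a real multi-device pipeline different stages can work on different micro-batches at the same time.

```python
import torch
import torch.nn as nn

# Two pipeline stages; in a real setup each would live on its own device/rank.
stage0 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
stage1 = nn.Sequential(nn.Linear(64, 10))

def pipelined_forward(x, n_micro_batches=4):
    micro_batches = x.chunk(n_micro_batches)
    # Stage 0 produces activations micro-batch by micro-batch; in an actual pipeline,
    # stage 1 starts consuming the first activation while stage 0 is still busy.
    activations = [stage0(mb) for mb in micro_batches]
    outputs = [stage1(a) for a in activations]
    return torch.cat(outputs)

print(pipelined_forward(torch.randn(16, 32)).shape)  # torch.Size([16, 10])
```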
-
Some people (even seasoned developers) get thrown off by our documentation, so let's try to fix that!
- [ ] Add descriptions of what each function does, aside from linking to the man page
- [ ] Add an examp…
-
### Description & Motivation
When training different model sizes on a different number of devices or different hardware, the batch size needs to be carefully tuned in order to achieve maximum GPU u…
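One common way to automate that tuning is a simple power-of-two search: keep doubling the batch size until a forward/backward pass runs out of GPU memory, then keep the last size that fit. A rough sketch of that heuristic (the `model` and `make_batch` callables and the starting size are assumptions, not part of this proposal):

```python
import torch

def find_max_batch_size(model, make_batch, start=2, max_trials=10):
    """Double the batch size until an OOM is hit, then return the last size that worked."""
    batch_size, best = start, start
    for _ in range(max_trials):
        try:
            x, y = make_batch(batch_size)                        # hypothetical data helper
            loss = torch.nn.functional.cross_entropy(model(x), y)
            loss.backward()
            model.zero_grad(set_to_none=True)
            best = batch_size
            batch_size *= 2
        except RuntimeError as err:                              # CUDA OOM surfaces as RuntimeError
            if "out of memory" in str(err):
                torch.cuda.empty_cache()
                break
            raise
    return best
```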
-
When running inference with CPU + fp32, I hit the following error:
> File "D:\gitpro\RWKV-LM-LoRA\RWKV-v4neo\src\model_run.py", line 67, in __init__
> w[k] += w[lora_B] @ w[lora_A] * (args.lora_alpha / args.lora_r)
> RuntimeError: "…
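The error text is cut off above, so the exact cause is unclear; a frequent pitfall with CPU inference, though, is merging fp16 LoRA matrices under fp32 compute. Below is a minimal sketch of that merge step with explicit float32 casts (an assumption about the failure mode, not a confirmed diagnosis; the names mirror the snippet in the traceback):

```python
import torch

def merge_lora(w, k, lora_A_key, lora_B_key, lora_alpha, lora_r):
    """Fold a LoRA update into the base weight w[k], casting to float32 for the CPU matmul."""
    scale = lora_alpha / lora_r
    delta = (w[lora_B_key].float() @ w[lora_A_key].float()) * scale
    w[k] = w[k].float() + delta
    return w
```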
-
After running the toy example, I ran it again to resume training, and I'm getting an error only when PP > 1.
Here's the config:
```yaml
checkpoints:
checkpoint_interval: 25
checkpoints_path: …
-
I changed the batch size to this:
```python
# batch_size = 128
batch_size = 6
micro_batch_size = 2
gradient_accumulation_steps = batch_size // micro_batch_size  # 6 // 2 = 3 accumulation steps per optimizer step
max_iters = 50000 * 3 // micro_batch_size
…
-
```python
checkpoint = (j < checkpoint_stop)
if checkpoint:
    chk = Checkpointing(partition, batch)
    task = Task(streams[i], compute=chk.chec…
```
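For context, a minimal sketch of what that `checkpoint` flag controls, using `torch.utils.checkpoint` rather than the library's own `Checkpointing`/`Task` classes: when enabled, the partition's activations are not kept during the forward pass and are recomputed during backward, trading compute for memory.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

stage = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
batch = torch.randn(4, 128, requires_grad=True)

use_checkpointing = True  # analogous to `checkpoint = (j < checkpoint_stop)` above
if use_checkpointing:
    out = checkpoint(stage, batch, use_reentrant=False)  # activations recomputed during backward
else:
    out = stage(batch)
out.sum().backward()
```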
-
# Introduction
As machine learning models continue to grow in size (e.g., OpenAI GPT-2 with 1.5B parameters, OpenAI GPT-3 with 175B parameters), traditional [Distributed DataParallel](https://pytorch…
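As a point of reference, "traditional" DDP keeps a full replica of the model on every rank and only synchronizes gradients, which is exactly what stops working once the model itself no longer fits on one device. A minimal sketch of that replication (process-group setup and launch details are simplified here):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank: int, world_size: int):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = nn.Linear(1024, 1024).to(rank)        # every rank holds a full copy of the model
    ddp_model = DDP(model, device_ids=[rank])
    out = ddp_model(torch.randn(8, 1024, device=rank))
    out.sum().backward()                          # gradients are all-reduced across ranks
    dist.destroy_process_group()
```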