charent / ChatLM-mini-Chinese

A 0.2B-parameter Chinese dialogue model (ChatLM-Chinese-0.2B), with fully open-sourced code for the entire pipeline: dataset sources, data cleaning, tokenizer training, model pre-training, SFT instruction fine-tuning, RLHF optimization, and more. Supports SFT fine-tuning for downstream tasks, with a worked example of triple (entity-relation) information extraction.
Apache License 2.0
1.12k stars · 132 forks

RTX 4080 can barely handle any data; training crashes with more than 10k samples #54

Open iissy opened 1 month ago

iissy commented 1 month ago

I've already reduced the model config:

```python
class T5ModelConfig:
    d_ff: int = 1024                # feed-forward layer dimension
    d_model: int = 512              # embedding dimension
    num_heads: int = 8              # number of attention heads; d_model // num_heads == d_kv
    d_kv: int = 64                  # d_model // num_heads
    num_decoder_layers: int = 6     # number of Transformer decoder layers
    num_layers: int = 6             # number of Transformer encoder layers
```
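With these dimensions the model is actually quite small. As a back-of-envelope check (a sketch, not the repo's code; `vocab_size=10000` is taken from the comment below, and real T5 layers add layer norms and relative-position biases, so this slightly undercounts):

```python
# Rough parameter-count estimate for the T5 config above.
def t5_param_estimate(d_model=512, d_ff=1024, num_heads=8, d_kv=64,
                      num_layers=6, num_decoder_layers=6, vocab_size=10000):
    attn = 4 * d_model * num_heads * d_kv        # Q, K, V, O projections
    ffn = 2 * d_model * d_ff                     # wi + wo (non-gated FFN)
    enc = num_layers * (attn + ffn)              # self-attn + FFN per encoder layer
    dec = num_decoder_layers * (2 * attn + ffn)  # self-attn + cross-attn + FFN
    emb = vocab_size * d_model                   # shared embedding / LM head
    return enc + dec + emb

print(f"{t5_param_estimate() / 1e6:.1f}M params")  # roughly 37M
```

At roughly 37M parameters, the weights themselves need well under 1 GB; the VRAM pressure during training comes mainly from activations, which scale with batch size and sequence length.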

The vocabulary is also only 10,000 entries. Out of the million-scale Baidu Baike dataset, I can only train on a few thousand samples; any more and it crashes.

Machine specs: GPU: RTX 4080 (12 GB VRAM), RAM: 32 GB, CPU: i9-14900HX (24 cores, 32 threads)

Is this hardware really not enough to train a large model?

iissy commented 1 month ago

```
Traceback (most recent call last):
  File "I:\ChatLM-mini-Chinese\pre_train.py", line 140, in <module>
    pre_train(config)
  File "I:\ChatLM-mini-Chinese\pre_train.py", line 123, in pre_train
    trainer.train(
  File "C:\Users\pinbo.conda\envs\chat\lib\site-packages\transformers\trainer.py", line 1932, in train
    return inner_training_loop(
  File "C:\Users\pinbo.conda\envs\chat\lib\site-packages\accelerate\utils\memory.py", line 142, in decorator
    raise RuntimeError("No executable batch size found, reached zero.")
RuntimeError: No executable batch size found, reached zero.
  0%|          | 8/1281856 [00:16<724:58:07, 2.04s/it]
```

iissy commented 1 month ago

The docs mention 16 GB RAM and 4 GB VRAM. Has anyone actually trained successfully with that? Any insight appreciated.

staxd commented 1 month ago

> The docs mention 16 GB RAM and 4 GB VRAM. Has anyone actually trained successfully with that? Any insight appreciated.

With just 3,000 samples, my 24 GB of VRAM is completely maxed out.

iissy commented 1 month ago

> > The docs mention 16 GB RAM and 4 GB VRAM. Has anyone actually trained successfully with that? Any insight appreciated.
>
> With just 3,000 samples, my 24 GB of VRAM is completely maxed out.

After reading the docs, I trained with train.py and set batch_size_per_gpu to 1; memory usage is indeed much lower now.
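Dropping `batch_size_per_gpu` to 1 trades throughput for memory. The usual way to recover training quality is gradient accumulation: sum gradients over N micro-batches and only then take an optimizer step, giving an effective batch size of N with the memory footprint of a single sample. A minimal pure-Python illustration (the names here are ours, not the repo's):

```python
# Gradient accumulation in miniature: average micro-batch "gradients"
# (represented as floats) in groups of accum_steps, yielding one
# optimizer step per group -- same effective batch, far less memory.
def accumulate(micro_grads, accum_steps):
    steps, buf = [], 0.0
    for i, g in enumerate(micro_grads, 1):
        buf += g
        if i % accum_steps == 0:
            steps.append(buf / accum_steps)  # one optimizer step
            buf = 0.0
    return steps

print(accumulate([1.0, 2.0, 3.0, 4.0], 2))  # [1.5, 3.5]
```

In the Hugging Face `Trainer`, this corresponds to raising `gradient_accumulation_steps` in `TrainingArguments` while keeping `per_device_train_batch_size` at 1.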