THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[Help] After P-Tuning fine-tuning, loading the model takes 5-6 minutes; is there any way to speed this up? #1460

Open 1042312930 opened 4 months ago

1042312930 commented 4 months ago

Is there an existing issue for this?

Current Behavior

The loading code is as follows:

```python
import os

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
two_model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
two_prefix_state_dict = torch.load(
    os.path.join(
        "ptuning/output/adgen-pt-128-2e-2-405sql_train/checkpoint-500",
        "pytorch_model.bin",
    )
)

# Keep only the prefix-encoder weights and strip the key prefix
two_new_prefix_state_dict = {}
for k, v in two_prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        two_new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
two_model.transformer.prefix_encoder.load_state_dict(two_new_prefix_state_dict)

# Comment out the following line if you don't use quantization
two_model = two_model.quantize(4)
two_model = two_model.half().cuda()
two_model.transformer.prefix_encoder.float()
two_model = two_model.eval()
```
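The key-filtering step above is pure dictionary manipulation and can be checked without loading the 6B model. Below is a minimal sketch with placeholder values standing in for tensors; `extract_prefix_encoder_weights` is a hypothetical helper name, not part of the repo:

```python
def extract_prefix_encoder_weights(state_dict, prefix="transformer.prefix_encoder."):
    """Keep only keys under `prefix` and strip the prefix from each key."""
    return {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}

# Toy state dict with strings in place of real tensors:
full = {
    "transformer.prefix_encoder.embedding.weight": "prefix-weights",
    "transformer.word_embeddings.weight": "other-weights",
}
print(extract_prefix_encoder_weights(full))
# {'embedding.weight': 'prefix-weights'}
```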

Expected Behavior

No response

Steps To Reproduce

After training completes, load the model (including the trained weights).
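To find out which stage dominates the 5-6 minutes (`from_pretrained`, `torch.load`, `quantize`, or the `.half().cuda()` transfer), each step can be timed individually. A minimal sketch, assuming nothing about the repo; `timed` is a helper introduced here for illustration:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    # Print how long the wrapped block took, to locate the slow step
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.1f}s")


# Wrap each loading stage, e.g.:
#   with timed("from_pretrained"):
#       two_model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
#   with timed("quantize"):
#       two_model = two_model.quantize(4)
with timed("demo"):
    time.sleep(0.1)
```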

Environment

- OS:
- Python: 3.9
- CUDA Support
- RAM: 16 GB
- GPU: 12 GB

Anything else?

No response