baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0

[Question] Multi-GPU deployment of Baichuan-7B #125

Closed · potong closed this 1 year ago

potong commented 1 year ago


Questions

The code below supports native Baichuan-7B deployment as well as int8 and int4 quantized deployment:

```python
import os
import platform

import torch
from accelerate import dispatch_model  # used below; missing from the original snippet
from transformers import AutoTokenizer, AutoModelForCausalLM


def auto_configure_device_map(num_gpus: int):
    """Spread Baichuan-7B's 32 transformer layers evenly across num_gpus GPUs."""
    num_trans_layers = 32
    per_gpu_layers = num_trans_layers / num_gpus
    # Embeddings go on the first GPU; the final norm and lm_head on the last.
    device_map = {
        'model.embed_tokens': 0,
        'model.norm': num_gpus - 1,
        'lm_head': num_gpus - 1,
    }
    for i in range(num_trans_layers):
        device_map[f'model.layers.{i}'] = int(i // per_gpu_layers)
    return device_map
```
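For intuition, with two GPUs the helper splits the stack down the middle; a quick check of the resulting map (illustrative, not from the original post):

```python
dm = auto_configure_device_map(2)  # hypothetical 2-GPU setup
assert dm['model.layers.0'] == 0 and dm['model.layers.15'] == 0   # first half -> cuda:0
assert dm['model.layers.16'] == 1 and dm['model.layers.31'] == 1  # second half -> cuda:1
assert dm['model.embed_tokens'] == 0 and dm['lm_head'] == 1
```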

```python
MODEL_NAME = "baichuan-inc/baichuan-7B"

# Fall back to 0 rather than None when CUDA is absent, so the > 0 checks below don't raise.
NUM_GPUS = torch.cuda.device_count() if torch.cuda.is_available() else 0
MAX_TOKENS = 512
device_map = auto_configure_device_map(NUM_GPUS) if NUM_GPUS > 0 else None
device = torch.device("cuda") if NUM_GPUS > 0 else torch.device("cpu")
device_dtype = torch.half if NUM_GPUS > 0 else torch.float

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# int8 quantization; change quantize(8) to quantize(4) for int4, or drop the
# .quantize(8) call entirely for native fp16 deployment.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True
).quantize(8)
model = dispatch_model(model, device_map=device_map)
model = model.eval()
```
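For completeness, a minimal generation sketch (not part of the original post; the prompt follows the repo README, and the max_new_tokens / repetition_penalty settings here are assumptions):

```python
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')  # embeddings were placed on the first GPU
with torch.no_grad():
    pred = model.generate(**inputs, max_new_tokens=MAX_TOKENS, repetition_penalty=1.1)
print(tokenizer.decode(pred[0], skip_special_tokens=True))
```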

Thanks to everyone in https://github.com/baichuan-inc/Baichuan-7B/issues/50 for contributing this valuable approach.
