baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0

[Question] Multi-GPU deployment of Baichuan-7B #125

Closed · potong closed this 1 year ago

potong commented 1 year ago


Questions

The code below supports native Baichuan-7B deployment as well as int8 and int4 quantized deployment:

```python
import os
import platform

import torch
from accelerate import dispatch_model  # used below; missing from the original snippet
from transformers import AutoTokenizer, AutoModelForCausalLM


def auto_configure_device_map(num_gpus: int):
    """Spread Baichuan-7B's 32 transformer layers evenly across num_gpus GPUs."""
    num_trans_layers = 32
    per_gpu_layers = num_trans_layers / num_gpus
    # Embeddings go on the first GPU; the final norm and lm_head on the last.
    device_map = {
        'model.embed_tokens': 0,
        'model.norm': num_gpus - 1,
        'lm_head': num_gpus - 1,
    }
    for i in range(num_trans_layers):
        device_map[f'model.layers.{i}'] = int(i // per_gpu_layers)
    return device_map
```
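For intuition, with two GPUs the helper splits the stack down the middle; a quick check of the resulting map (illustrative, not from the original post):

```python
dm = auto_configure_device_map(2)  # hypothetical 2-GPU setup
assert dm['model.layers.0'] == 0 and dm['model.layers.15'] == 0   # first half -> cuda:0
assert dm['model.layers.16'] == 1 and dm['model.layers.31'] == 1  # second half -> cuda:1
assert dm['model.embed_tokens'] == 0 and dm['lm_head'] == 1
```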

```python
MODEL_NAME = "baichuan-inc/baichuan-7B"

# Fall back to 0 rather than None when CUDA is absent, so the > 0 checks below don't raise.
NUM_GPUS = torch.cuda.device_count() if torch.cuda.is_available() else 0
MAX_TOKENS = 512
device_map = auto_configure_device_map(NUM_GPUS) if NUM_GPUS > 0 else None
device = torch.device("cuda") if NUM_GPUS > 0 else torch.device("cpu")
device_dtype = torch.half if NUM_GPUS > 0 else torch.float

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# int8 quantization; change quantize(8) to quantize(4) for int4, or drop the
# .quantize(8) call entirely for native fp16 deployment.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True
).quantize(8)
model = dispatch_model(model, device_map=device_map)
model = model.eval()
```
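For completeness, a minimal generation sketch (not part of the original post; the prompt follows the repo README, and the max_new_tokens / repetition_penalty settings here are assumptions):

```python
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')  # embeddings were placed on the first GPU
with torch.no_grad():
    pred = model.generate(**inputs, max_new_tokens=MAX_TOKENS, repetition_penalty=1.1)
print(tokenizer.decode(pred[0], skip_special_tokens=True))
```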

Thanks to everyone in https://github.com/baichuan-inc/Baichuan-7B/issues/50 for contributing this valuable approach.
