Error running Baichuan2-13B-Chat on two RTX 4090 GPUs

The demo runs fine on a single 4090, but with two GPUs it fails with the following error:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan2-13B-Chat", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-13B-Chat", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore gradient_checkpointing_kwargs in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method _set_gradient_checkpointing in your model.
Loading checkpoint shards: 100%|██████████| 3/3 [00:14<00:00, 4.77s/it]
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan2-13B-Chat")
messages = []
messages.append({"role": "user", "content": "解释一下“温故而知新”"})
response = model.chat(tokenizer, messages)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py", line 825, in chat
    outputs = self.generate(input_ids, generation_config=generation_config)
  File "/root/anaconda3/envs/baichuan-13b/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/baichuan-13b/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
  File "/root/anaconda3/envs/baichuan-13b/lib/python3.10/site-packages/transformers/generation/utils.py", line 2897, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
print(response)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'response' is not defined
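
The RuntimeError above is raised by torch.multinomial, which rejects a probability tensor containing inf, NaN, or negative entries. As a possible first debugging step, here is a minimal sketch that reports which of those three cases is present. It assumes the probabilities have already been copied out as a flat list of Python floats (for example with `probs.float().flatten().tolist()`). The name `diagnose_probs` is hypothetical and is not part of Baichuan2 or transformers.

```python
import math


def diagnose_probs(values):
    """Count the entries that torch.multinomial would reject.

    `values` is assumed to be a flat iterable of Python floats.
    """
    counts = {"nan": 0, "inf": 0, "negative": 0}
    for v in values:
        if math.isnan(v):
            counts["nan"] += 1
        elif math.isinf(v):
            # Both +inf and -inf are counted here.
            counts["inf"] += 1
        elif v < 0:
            counts["negative"] += 1
    return counts


# Made-up values for illustration only, not taken from the failing run.
sample = [0.25, float("nan"), float("inf"), -0.1, 0.5]
report = diagnose_probs(sample)
print(report)
```

If every count is zero on one GPU but not on two, that would suggest the bad values only appear once the model is sharded across devices. This helper only narrows the symptom and does not identify the cause.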

baichuan-inc/Baichuan2, issue #332