MooreThreads / torch_musa

torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics cards.

How do I run large models with the transformers library on the Moore Threads S80? #26

Open zhutao1221 opened 4 months ago

zhutao1221 commented 4 months ago

How do I run large models with the transformers library on the Moore Threads S80?

lms-mt commented 4 months ago

If you follow the tutorial, torch_musa can run transformers models on the S80.

liugangdao commented 4 months ago

Where is the tutorial?

lms-mt commented 4 months ago

> Where is the tutorial?

My mistake: the tutorial is not out yet. But you can just follow the usual pipeline usage from the transformers library and set the device to musa. Taking Gemma as an example:

device = "musa"
data_type  = torch.half
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained(
    gemma_2b_path, device_map=device, torch_dtype=data_type)
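
For completeness, a minimal generation call on top of that snippet might look like this (the prompt string and the max_new_tokens value are illustrative, not from this thread):

inputs = tokenizer("Why are GPUs fast at matrix math?", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))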
zhutao1221 commented 3 months ago

Do we need to import torch_musa first? Or has the transformers library already added support for musa?

> My mistake: the tutorial is not out yet. But you can just follow the usual pipeline usage from the transformers library and set the device to musa. […]
lms-mt commented 2 months ago

> Do we need to import torch_musa first? Or has the transformers library already added support for musa?

You need to import it first.
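
To make that concrete, here is a minimal sanity check; it assumes a working torch_musa install, which exposes torch.musa.is_available() once imported:

import torch
import torch_musa  # import before touching the "musa" device

print(torch.musa.is_available())        # True if the MUSA backend is usable
print(torch.ones(2, 2, device="musa"))  # allocates a small tensor on the card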

luowa-technology commented 2 months ago

> Where is the tutorial?

> My mistake: the tutorial is not out yet. But you can just follow the usual pipeline usage from the transformers library and set the device to musa. […]

Following this method I get the error below:

gemma-2b: /home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/generation/utils.py:1141: UserWarning: Using the model-agnostic default max_length (=20) to control the generation length. We recommend setting max_new_tokens to control the maximum length of the generation.
  warnings.warn(
Traceback (most recent call last):
  File "/home/jorden/Softwares/ml/llm/ChatGLM3/basic_demo/test-gemma.py", line 65, in <module>
    main()
  File "/home/jorden/Softwares/ml/llm/ChatGLM3/basic_demo/test-gemma.py", line 59, in main
    outputs = model.generate(inputs_ids)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._greedy_search(
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/generation/utils.py", line 2494, in _greedy_search
    outputs = self(
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 1118, in forward
    outputs = self.model(
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 926, in forward
    layer_outputs = decoder_layer(
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 646, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 268, in forward
    cos, sin = self.rotary_emb(value_states, position_ids, seq_len=None)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 122, in forward
    with torch.autocast(device_type=device_type, enabled=False):
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 201, in __init__
    raise RuntimeError('User specified autocast device_type must be \'cuda\' or \'cpu\'')
RuntimeError: User specified autocast device_type must be 'cuda' or 'cpu'

My environment is Ubuntu 20.04, MT S80, Python 3.10.13, mthreads-gmi 1.9.0, Driver Version 2.6.0.

foreverlms commented 2 months ago

> Following this method I get the error below: […] RuntimeError: User specified autocast device_type must be 'cuda' or 'cpu'

The latest Hugging Face Gemma code uses AMP, so in File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 122 (in forward), replace the autocast call with:

with torch.musa.amp.autocast(enabled=False):  # was: with torch.autocast(device_type=device_type, enabled=False):

and give it a try.
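
A file edited under site-packages will be lost on the next upgrade, so here is an untested alternative sketch that applies the same substitution from your own script by wrapping torch.autocast before the model runs. It assumes torch.musa.amp.autocast exists (as in the fix above); the wrapper name is made up for illustration.

import torch
import torch_musa

_original_autocast = torch.autocast

def _autocast_with_musa_fallback(device_type, *args, **kwargs):
    # Route "musa" to torch_musa's autocast; leave every other device alone.
    if device_type == "musa":
        return torch.musa.amp.autocast(*args, **kwargs)
    return _original_autocast(device_type, *args, **kwargs)

torch.autocast = _autocast_with_musa_fallback  # monkey-patch before generate() is called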