MooreThreads / torch_musa

torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics cards.

How do I run large models with the transformers library on the Moore Threads S80? #26

Open zhutao1221 opened 4 months ago

zhutao1221 commented 4 months ago

How do I run large models with the transformers library on the Moore Threads S80?

lms-mt commented 4 months ago

If you follow the tutorial, torch_musa can run transformers models on the S80.

liugangdao commented 4 months ago

Where is the tutorial?

lms-mt commented 4 months ago

> Where is the tutorial?

My mistake: the tutorial is not out yet. But you can just follow the usual pipeline usage from the transformers library and set the device to musa. Taking Gemma as an example:

device = "musa"
data_type  = torch.half
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained(
    gemma_2b_path, device_map=device, torch_dtype=data_type)
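
For completeness, a minimal generation call on top of that snippet might look like this (the prompt string and the max_new_tokens value are illustrative, not from this thread):

inputs = tokenizer("Why are GPUs fast at matrix math?", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))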
zhutao1221 commented 3 months ago

Do we need to import torch_musa first? Or has the transformers library already added support for musa?

> My mistake: the tutorial is not out yet. But you can just follow the usual pipeline usage from the transformers library and set the device to musa. […]
lms-mt commented 2 months ago

> Do we need to import torch_musa first? Or has the transformers library already added support for musa?

You need to import it first.
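
To make that concrete, here is a minimal sanity check; it assumes a working torch_musa install, which exposes torch.musa.is_available() once imported:

import torch
import torch_musa  # import before touching the "musa" device

print(torch.musa.is_available())        # True if the MUSA backend is usable
print(torch.ones(2, 2, device="musa"))  # allocates a small tensor on the card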

luowa-technology commented 2 months ago

> Where is the tutorial?

> My mistake: the tutorial is not out yet. But you can just follow the usual pipeline usage from the transformers library and set the device to musa. […]

Following this method I get the error below:

gemma-2b: /home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/generation/utils.py:1141: UserWarning: Using the model-agnostic default max_length (=20) to control the generation length. We recommend setting max_new_tokens to control the maximum length of the generation.
  warnings.warn(
Traceback (most recent call last):
  File "/home/jorden/Softwares/ml/llm/ChatGLM3/basic_demo/test-gemma.py", line 65, in <module>
    main()
  File "/home/jorden/Softwares/ml/llm/ChatGLM3/basic_demo/test-gemma.py", line 59, in main
    outputs = model.generate(inputs_ids)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._greedy_search(
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/generation/utils.py", line 2494, in _greedy_search
    outputs = self(
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 1118, in forward
    outputs = self.model(
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 926, in forward
    layer_outputs = decoder_layer(
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 646, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 268, in forward
    cos, sin = self.rotary_emb(value_states, position_ids, seq_len=None)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 122, in forward
    with torch.autocast(device_type=device_type, enabled=False):
  File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 201, in __init__
    raise RuntimeError('User specified autocast device_type must be \'cuda\' or \'cpu\'')
RuntimeError: User specified autocast device_type must be 'cuda' or 'cpu'

My environment is Ubuntu 20.04, MT S80, Python 3.10.13, mthreads-gmi 1.9.0, Driver Version 2.6.0.

foreverlms commented 2 months ago

> Following this method I get the error below: […] RuntimeError: User specified autocast device_type must be 'cuda' or 'cpu'

The latest Hugging Face Gemma code uses AMP, so in File "/home/jorden/Softwares/anaconda3/envs/mt/lib/python3.10/site-packages/transformers/models/gemma/modeling_gemma.py", line 122 (in forward), replace the autocast call with:

with torch.musa.amp.autocast(enabled=False):  # was: with torch.autocast(device_type=device_type, enabled=False):

and give it a try.
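
A file edited under site-packages will be lost on the next upgrade, so here is an untested alternative sketch that applies the same substitution from your own script by wrapping torch.autocast before the model runs. It assumes torch.musa.amp.autocast exists (as in the fix above); the wrapper name is made up for illustration.

import torch
import torch_musa

_original_autocast = torch.autocast

def _autocast_with_musa_fallback(device_type, *args, **kwargs):
    # Route "musa" to torch_musa's autocast; leave every other device alone.
    if device_type == "musa":
        return torch.musa.amp.autocast(*args, **kwargs)
    return _original_autocast(device_type, *args, **kwargs)

torch.autocast = _autocast_with_musa_fallback  # monkey-patch before generate() is called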