OpenBMB / BMInf

Efficient Inference for Big Models
Apache License 2.0

[BUG] Does BMInf support transformers models? I got an error when wrapping my model with BMInf for inference #61

Closed L-hongbin closed 1 year ago

L-hongbin commented 1 year ago

Model code:

self.model = MyBert.from_pretrained(pretrained_model_name_or_path=model_path,)
self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
self.model.to(self.device)
self.model = bminf.wrapper(self.model)

Error message:

input_embed = self.model.bert(**input_tokenized)["last_hidden_state"]
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 1022, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 611, in forward
    layer_outputs = layer_module(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 497, in forward
    self_attention_outputs = self.attention(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 427, in forward
    self_outputs = self.self(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 293, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/bminf/quantization/__init__.py", line 81, in forward
    out = OpLinear.apply(x, self.weight_quant, self.weight_scale)
  File "/usr/local/lib/python3.8/dist-packages/bminf/quantization/__init__.py", line 31, in forward
    gemm_int8(
  File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/gemm.py", line 139, in gemm_int8
    assert m % 4 == 0 and n % 4 == 0 and k % 4 == 0
AssertionError
a710128 commented 1 year ago
self.model = bminf.wrapper(self.model, quantization=False)

Try turning quantization off; with quantization enabled, the input shapes are required to be multiples of 4.
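The assertion in cpm_kernels' gemm_int8 (`m % 4 == 0 and n % 4 == 0 and k % 4 == 0`) fails because the tokenized sequence length is not a multiple of 4. Besides disabling quantization, a possible workaround (a sketch, not from the maintainers' reply) is to pad the batch so the sequence length is a multiple of 4; the helper below is hypothetical and works on plain token-id lists:

```python
def pad_to_multiple_of_4(input_ids, attention_mask, pad_token_id=0):
    """Right-pad each row so the sequence length satisfies seq_len % 4 == 0.

    This targets the gemm_int8 assertion in cpm_kernels, which requires
    every GEMM dimension to be divisible by 4 when quantization is on.
    """
    seq_len = len(input_ids[0])
    target = (seq_len + 3) // 4 * 4  # round up to the next multiple of 4
    pad = target - seq_len
    input_ids = [row + [pad_token_id] * pad for row in input_ids]
    attention_mask = [row + [0] * pad for row in attention_mask]  # mask out padding
    return input_ids, attention_mask
```

With a Hugging Face tokenizer the same effect can usually be achieved directly via `tokenizer(..., padding=True, pad_to_multiple_of=4)`.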