alibaba / MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
http://www.mnn.zone/

Failed to export phi-2 with llm_export.py #2962

Closed WenguoLi closed 2 months ago

WenguoLi commented 4 months ago

MNN Version: MNN-master

pip list


accelerate 0.32.1
aiohttp 3.9.5
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
certifi 2024.7.4
charset-normalizer 3.3.2
coloredlogs 15.0.1
datasets 2.20.0
dill 0.3.8
einops 0.8.0
evaluate 0.4.2
filelock 3.15.4
flatbuffers 24.3.25
frozenlist 1.4.1
fsspec 2024.5.0
huggingface-hub 0.23.4
humanfriendly 10.0
idna 3.7
Jinja2 3.1.4
MarkupSafe 2.1.5
MNN 2.8.3
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.3
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.82
nvidia-nvtx-cu12 12.1.105
onnx 1.16.1
onnxruntime 1.18.1
onnxslim 0.1.32
optimum 1.21.2
packaging 24.1
pandas 2.2.2
peft 0.11.1
pip 24.0
protobuf 5.27.2
psutil 6.0.0
pyarrow 17.0.0
pyarrow-hotfix 0.6
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
regex 2024.5.15
requests 2.32.3
safetensors 0.4.3
sentencepiece 0.2.0
setuptools 69.5.1
six 1.16.0
sympy 1.13.0
tokenizers 0.19.1
torch 2.3.1
tqdm 4.66.4
transformers 4.42.4
triton 2.3.1
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.2
wheel 0.43.0
xxhash 3.4.1
yarl 1.9.4

phi-2 model: https://modelscope.cn/models/AI-ModelScope/phi-2

Command:

python llm_export.py \
        --type phi-2 \
        --path /home/charles/wenguo.li/mnn/models/llm/phi-2 \
        --export \
        --export_token \
        --export_embed --embed_bf16 --embed_bin \
        --export_mnn
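
If it helps to separate the MNN conversion step from the ONNX tracing step, the sketch below is a hypothetical minimal reproduction of the combination that the traceback below ends in: torch.onnx.export tracing a module whose forward calls einops.rearrange for the first time in the process. The module name and shapes are made up and this has not been verified against this environment.

# Hypothetical minimal repro sketch (not from the issue, unverified): trace a module
# whose forward triggers einops' lazy torch-backend initialization, as modeling_phi.py
# does via rearrange(q, "... (h d) -> ... h d", d=self.head_dim).
import io

import torch
from einops import rearrange


class RearrangeOnly(torch.nn.Module):
    def forward(self, x):
        # Same pattern style as phi-2's attention reshape; d=4 is an arbitrary head size.
        return rearrange(x, "b s (h d) -> b s h d", d=4)


# In a fresh process, the first rearrange call happens under the JIT tracer here.
torch.onnx.export(RearrangeOnly(), (torch.randn(1, 3, 8),), io.BytesIO())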

Error:

The device support i8sdot:0, support fp16:0, support i8mm: 0
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|████████████████████| 2/2 [00:00<00:00, 2.53it/s]
export start ...
Traceback (most recent call last):
  File "/home/charles/wenguo.li/mnn/MNN/transformers/llm/export/llm_export.py", line 1414, in <module>
    llm_exporter.export()
  File "/home/charles/wenguo.li/mnn/MNN/transformers/llm/export/llm_export.py", line 378, in export
    torch.onnx.export(
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/onnx/utils.py", line 1612, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/onnx/utils.py", line 1134, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/onnx/utils.py", line 1010, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/onnx/utils.py", line 914, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/jit/_trace.py", line 1310, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/jit/_trace.py", line 138, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/jit/_trace.py", line 129, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/charles/wenguo.li/mnn/MNN/transformers/llm/export/llm_export.py", line 175, in forward
    return self.decode(input_ids, attention_mask, position_ids, past_key_values)
  File "/home/charles/wenguo.li/mnn/MNN/transformers/llm/export/llm_export.py", line 165, in decode
    hidden_states, kv = self.blocks[i](hidden_states, attention_mask, position_ids, past_key_values[i])
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/charles/wenguo.li/mnn/MNN/transformers/llm/export/llm_export.py", line 1127, in forward
    hidden_states, presents = self.block(hidden_states,
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/charles/.cache/huggingface/modules/transformers_modules/phi-2-pri/modeling_phi.py", line 798, in forward
    attn_outputs, kv = self.mixer(
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1522, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/charles/.cache/huggingface/modules/transformers_modules/phi-2-pri/modeling_phi.py", line 751, in forward
    attn_output, kv = self._forward_cross_attn(x, past_key_values, attention_mask, rotary_pos_emb, causal_mask)
  File "/home/charles/.cache/huggingface/modules/transformers_modules/phi-2-pri/modeling_phi.py", line 643, in _forward_cross_attn
    q = rearrange(q, "... (h d) -> ... h d", d=self.head_dim)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/einops/einops.py", line 591, in rearrange
    return reduce(tensor, pattern, reduction="rearrange", **axes_lengths)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/einops/einops.py", line 518, in reduce
    backend = get_backend(tensor)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/einops/_backends.py", line 53, in get_backend
    backend = BackendSubclass()
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/einops/_backends.py", line 221, in __init__
    from . import _torch_specific  # noqa
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/einops/_torch_specific.py", line 128, in <module>
    allow_ops_in_compiled_graph()
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/einops/_torch_specific.py", line 107, in allow_ops_in_compiled_graph
    from torch._dynamo import allow_in_graph
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/_dynamo/__init__.py", line 64, in <module>
    torch.manual_seed = disable(torch.manual_seed)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/_dynamo/decorators.py", line 50, in disable
    return DisableContext()(fn)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 410, in __call__
    (filename is None or trace_rules.check(fn))
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/_dynamo/trace_rules.py", line 3378, in check
    return check_verbose(obj, is_inlined_call).skipped
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/_dynamo/trace_rules.py", line 3361, in check_verbose
    rule = torch._dynamo.trace_rules.lookup_inner(
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/_dynamo/trace_rules.py", line 3442, in lookup_inner
    rule = get_torch_obj_rule_map().get(obj, None)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/_dynamo/trace_rules.py", line 2782, in get_torch_obj_rule_map
    obj = load_object(k)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/_dynamo/trace_rules.py", line 2811, in load_object
    val = _load_obj_from_str(x[0])
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/_dynamo/trace_rules.py", line 2795, in _load_obj_from_str
    return getattr(importlib.import_module(module), obj_name)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nested/_internal/nested_tensor.py", line 419, in <module>
    ).detach()
  File "/home/charles/anaconda3/envs/aic/lib/python3.10/site-packages/torch/nested/_internal/nested_tensor.py", line 232, in __torch_function__
    return func(*args, **kwargs)
RuntimeError: Unsupported value kind: Tensor
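
For what it is worth, the stack suggests the crash is not in MNN itself: torch.onnx.export is still tracing when phi-2's custom modeling_phi.py calls einops.rearrange for the first time, einops then lazily imports torch._dynamo, and module-level tensor code executed during that import is what the tracer rejects with "RuntimeError: Unsupported value kind: Tensor". A hedged workaround sketch, assuming the lazy import is the trigger (placement and names below are mine, not a confirmed fix), is to warm einops up once before the export so the import happens outside the tracer; alternatively, the rearrange calls in a local copy of modeling_phi.py could be rewritten as plain reshape/transpose.

# Hypothetical workaround sketch (assumption, unverified): initialize einops' torch
# backend, letting it finish its lazy `import torch._dynamo`, before tracing starts,
# e.g. right before torch.onnx.export(...) is reached in llm_export.py.
import torch
from einops import rearrange

# A throwaway call is enough to build einops' TorchBackend outside the tracing context.
_ = rearrange(torch.zeros(1, 2, 4), "b s (h d) -> b s h d", d=2)

# ... then run the existing export path (llm_exporter.export() / torch.onnx.export(...)).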

github-actions[bot] commented 2 months ago

Marking as stale. No activity in 60 days.