Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model, a low-resource Chinese LLaMA + LoRA recipe with a structure modeled on Alpaca
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0

RuntimeError: mat1 and mat2 shapes cannot be multiplied (164x4096 and 1x8388608) #228

Open adaaaaaa opened 1 year ago

adaaaaaa commented 1 year ago

python generate_4bit.py --model_path decapoda-research/llama-7b-hf --lora_path Facico/Chinese-Vicuna-lora-7b-3epoch-belle-and-guanaco --use_local 0

The command above fails with the following error...

/home/nano/.local/lib/python3.10/site-packages/gradio/inputs.py:27: UserWarning: Usage of gradio.inputs is deprecated, and will not be supported in the future, please import your component from gradio.components
  warnings.warn(
/home/nano/.local/lib/python3.10/site-packages/gradio/inputs.py:30: UserWarning: `optional` parameter is deprecated, and it has no effect
  super().__init__(
/home/nano/.local/lib/python3.10/site-packages/gradio/inputs.py:30: UserWarning: `numeric` parameter is deprecated, and it has no effect
  super().__init__(
Running on local URL: http://127.0.0.1:7860
Traceback (most recent call last):
  File "/home/nano/.local/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/nano/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1346, in process_api
    result = await self.call_function(
  File "/home/nano/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1090, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/nano/.local/lib/python3.10/site-packages/gradio/utils.py", line 341, in async_iteration
    return await iterator.__anext__()
  File "/home/nano/.local/lib/python3.10/site-packages/gradio/interface.py", line 633, in fn
    async for output in iterator:
  File "/home/nano/.local/lib/python3.10/site-packages/gradio/utils.py", line 334, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/nano/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/nano/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/nano/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/nano/.local/lib/python3.10/site-packages/gradio/utils.py", line 317, in run_sync_iterator_async
    return next(iterator)
  File "/data/Chinese-Vicuna/generate_4bit.py", line 152, in evaluate
    for generation_output in model.stream_generate(
  File "/data/Chinese-Vicuna/utils.py", line 657, in stream_beam_search
    outputs = self(
  File "/home/nano/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nano/.local/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/home/nano/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nano/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/nano/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/home/nano/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nano/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/nano/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 578, in forward
    layer_outputs = decoder_layer(
  File "/home/nano/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nano/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/nano/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/nano/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nano/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/nano/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/nano/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nano/.local/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (164x4096 and 1x8388608)
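The shapes themselves hint at the cause: mat1 (164x4096) is the flattened activation batch, while 1x8388608 is exactly the size of a 4096x4096 q_proj weight packed two 4-bit values per byte. A minimal sketch of that arithmetic, assuming LLaMA-7B's hidden size of 4096 (the names below are illustrative, not from the repo's code):

```python
# Minimal sketch, assuming LLaMA-7B dimensions: shows that mat2's 1x8388608
# matches the byte count of a 4-bit-packed 4096x4096 projection weight.
hidden_size = 4096                          # LLaMA-7B hidden dimension
q_proj_params = hidden_size * hidden_size   # 16,777,216 weights in q_proj
packed_bytes = q_proj_params // 2           # two 4-bit values per byte

assert packed_bytes == 8_388_608            # matches the error's mat2 shape
print(f"4-bit q_proj packs into {packed_bytes} bytes")
```

If that reading holds, the plain F.linear call in peft/tuners/lora.py is multiplying activations against the still-packed quantized buffer instead of an unpacked 4096x4096 matrix, which points at mismatched peft/quantization versions rather than at the weights themselves.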

LGAG commented 1 year ago

I ran into the same problem, but in my case it happened when running finetune_4bit.py.

Facico commented 1 year ago

Align your dependencies with this file: https://github.com/Facico/Chinese-Vicuna/blob/master/requirements_4bit.txt
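A quick way to see which versions are actually installed before re-aligning (the package list below is an assumption drawn from the traceback; requirements_4bit.txt is authoritative):

```python
# Print installed versions of the packages the traceback touches, to compare
# against the pins in requirements_4bit.txt (package list is an assumption).
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "transformers", "peft", "accelerate", "bitsandbytes", "gradio"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

Then `pip install -r requirements_4bit.txt` should bring anything that differs back in line.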

18065013 commented 1 year ago

I hit the same problem on dual 3090s during a normal (non-4-bit) finetune, though my two 3090s are from different vendors. Any ideas?
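One way to narrow that down (a debugging sketch, not a confirmed fix): pin the run to a single card before torch initializes CUDA, so cross-device weight sharding is ruled out.

```python
# Debugging sketch: restrict training to one GPU to check whether the shape
# error is tied to multi-GPU weight placement. The environment variable must
# be set before torch touches CUDA.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # keep only the first 3090 visible

import torch
assert torch.cuda.device_count() == 1
```

If the single-card finetune runs cleanly, the problem lies in how the weights are split across the two 3090s rather than in the model or data.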