LlamaFamily / Llama-Chinese

Llama Chinese community. The Llama3 online demo and fine-tuned models are now open, with the latest Llama3 learning resources collected in real time. All code has been updated for Llama3. Building the best Chinese Llama LLM, fully open source and commercially usable.
https://llama.family

Error when trying to run Llama2-Chinese-7b-Chat #153

Open fenfenyangyangmate opened 10 months ago

fenfenyangyangmate commented 10 months ago

Today I solved the `python -m bitsandbytes` problem, but it was immediately followed by a new error:

```
PS F:\新建文件夹> python .\Llama2-Chinese\examples\chat_gradio.py --model_name_or_path .\Llama2-Chinese-7b-Chat\
bin C:\Users\46045\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118.dll
Traceback (most recent call last):
  File "F:\新建文件夹\Llama2-Chinese\examples\chat_gradio.py", line 86, in <module>
    model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path, device_map='auto', torch_dtype=torch.float16, load_in_8bit=True)
  File "C:\Users\46045\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\auto\auto_factory.py", line 516, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\46045\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 3030, in from_pretrained
    raise ValueError(
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
```
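The ValueError itself points at the two options: either free up enough VRAM to hold the whole 8-bit model on the GPU, or explicitly allow the overflowing modules to be offloaded to the CPU. A minimal sketch of the offload route, assuming a transformers version with `BitsAndBytesConfig` and the same local checkpoint path as above:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Allow modules that do not fit on the GPU to run in fp32 on the CPU
# instead of raising the ValueError above.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "./Llama2-Chinese-7b-Chat",   # local checkpoint path from the report
    device_map="auto",            # let accelerate split layers across GPU/CPU
    torch_dtype=torch.float16,
    quantization_config=quant_config,
)
```

Note that the CPU-offloaded layers run in fp32 on the CPU, so generation will be noticeably slower than an all-GPU load.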

ZHangZHengEric commented 10 months ago

It's probably that the GPU doesn't have enough VRAM.

fenfenyangyangmate commented 10 months ago

> It's probably that the GPU doesn't have enough VRAM.

Seems like it, (;´༎ຶД༎ຶ`)
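A quick way to confirm this diagnosis before loading: a 7B model needs on the order of 7-8 GB of VRAM for the weights alone even in 8-bit, plus activations and CUDA overhead. A small sketch for checking free device memory, assuming a CUDA build of PyTorch:

```python
import torch

# Returns (free, total) memory in bytes for the current CUDA device.
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"free: {free_bytes / 1024**3:.1f} GiB, total: {total_bytes / 1024**3:.1f} GiB")
```

If the free figure is well under ~8 GiB, the offload config above (or a smaller model) is the way to go.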

ikun52099 commented 6 months ago

Do you have the vocab.txt file for Llama2-Chinese-7b-Chat?

ikun52099 commented 6 months ago

Or vocab.json together with tokenizer_config.json and merges.txt; any of those files would work.
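One caveat: Llama-family checkpoints use a SentencePiece tokenizer, so they ship tokenizer.model (and/or tokenizer.json) rather than a GPT-2-style vocab.txt or vocab.json + merges.txt. If what you actually need are the tokenizer files, a sketch for re-exporting whatever the checkpoint contains, assuming you have the model directory from this thread:

```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with the checkpoint and re-export its
# files (tokenizer.model / tokenizer.json, tokenizer_config.json, ...).
tokenizer = AutoTokenizer.from_pretrained("./Llama2-Chinese-7b-Chat")
tokenizer.save_pretrained("./llama2-chinese-tokenizer")
```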