My graphics card does not have enough memory. Can I use both system RAM and GPU memory to run it?
My computer:
Windows 10
GTX 1060 3GB
24 GB DDR4 RAM
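For reference, the card and its total VRAM can be confirmed from the same conda environment with standard PyTorch calls:

```python
import torch

# Print whether CUDA works at all, the visible GPU, and its total memory.
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))
print(f"{torch.cuda.get_device_properties(0).total_memory / 2**30:.1f} GiB")
```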
Error message when running:
python chatbot.py --path V:\codellama-7b-instruct-pad
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Parallel'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Series'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\AI\Llama2-Code-Interpreter\chatbot.py:104: GradioUnusedKwargWarning: You have unused kwarg parameters in Chatbot, please remove them: {'avatar_images': './assets/logo2.png'}
  chatbot = gr.Chatbot(height=820, avatar_images="./assets/logo2.png")
Traceback (most recent call last):
File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 238, in <module>
gradio_launch(model_path=args.path, load_in_4bit=True)
File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 108, in gradio_launch
interpreter = StreamingLlamaCodeInterpreter(
File "D:\AI\Llama2-Code-Interpreter\code_interpreter\LlamaCodeInterpreter.py", line 79, in __init__
self.model = LlamaForCausalLM.from_pretrained(
File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\modeling_utils.py", line 3119, in from_pretrained
raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
`device_map` to `from_pretrained`. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
model is Seungyoun/codellama-7b-instruct-pad
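For context, if my math is right, a 7B-parameter model quantized to 4 bits needs roughly 7e9 × 0.5 bytes ≈ 3.5 GB for the weights alone, before the KV cache and activations, so it cannot fit entirely on my card; accelerate then dispatches the remaining modules to the CPU, which is what trips this check.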
I have tried changing LlamaCodeInterpreter.py:79 to pass load_in_8bit_fp32_cpu_offload=True to from_pretrained, as the error message suggests, but that run fails with a different error:
TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_8bit_fp32_cpu_offload'
python chatbot.py --path V:\codellama-7b-instruct-pad
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Parallel'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Series'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\AI\Llama2-Code-Interpreter\chatbot.py:104: GradioUnusedKwargWarning: You have unused kwarg parameters in Chatbot, please remove them: {'avatar_images': './assets/logo2.png'}
  chatbot = gr.Chatbot(height=820, avatar_images="./assets/logo2.png")
Traceback (most recent call last):
File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 238, in <module>
gradio_launch(model_path=args.path, load_in_4bit=True)
File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 108, in gradio_launch
interpreter = StreamingLlamaCodeInterpreter(
File "D:\AI\Llama2-Code-Interpreter\code_interpreter\LlamaCodeInterpreter.py", line 79, in __init__
self.model = LlamaForCausalLM.from_pretrained(
File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\modeling_utils.py", line 2959, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_8bit_fp32_cpu_offload'
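My guess from this second traceback is that from_pretrained does not actually accept load_in_8bit_fp32_cpu_offload in this transformers version and just forwards it to the model constructor, even though the error message names that flag. If I understand the linked quantization docs, the supported spelling is the llm_int8_enable_fp32_cpu_offload field of BitsAndBytesConfig, combined with a device_map. Is something like this untested sketch the right way to adapt LlamaCodeInterpreter.py:79? (The max_memory values are guesses for my hardware, and model_path stands for the path the script already receives.)

```python
from transformers import BitsAndBytesConfig, LlamaForCausalLM

# Untested sketch: quantize the layers that fit on the GPU and keep the
# offloaded modules in fp32 on the CPU instead of quantizing them.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

model = LlamaForCausalLM.from_pretrained(
    model_path,  # e.g. V:\codellama-7b-instruct-pad
    quantization_config=quantization_config,
    device_map="auto",                       # let accelerate split layers GPU/CPU
    max_memory={0: "2GiB", "cpu": "20GiB"},  # guesses for a 3 GB card + 24 GB RAM
)
```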