SeungyounShin / Llama2-Code-Interpreter

Make Llama2 use Code Execution, Debug, Save Code, Reuse it, Access to Internet

My graphics card has insufficient memory. Can I use system RAM together with graphics memory to run it? #32

Open Jackxwb opened 2 months ago

Jackxwb commented 2 months ago

My graphics card has insufficient memory. Can I run the model using both system RAM and graphics memory? My computer:

Error message when running:

python chatbot.py --path V:\codellama-7b-instruct-pad
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Parallel'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Series'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\AI\Llama2-Code-Interpreter\chatbot.py:104: GradioUnusedKwargWarning: You have unused kwarg parameters in Chatbot, please remove them: {'avatar_images': './assets/logo2.png'}
  chatbot = gr.Chatbot(height=820, avatar_images="./assets/logo2.png")
Traceback (most recent call last):
  File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 238, in <module>
    gradio_launch(model_path=args.path, load_in_4bit=True)
  File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 108, in gradio_launch
    interpreter = StreamingLlamaCodeInterpreter(
  File "D:\AI\Llama2-Code-Interpreter\code_interpreter\LlamaCodeInterpreter.py", line 79, in __init__
    self.model = LlamaForCausalLM.from_pretrained(
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\modeling_utils.py", line 3119, in from_pretrained
    raise ValueError(
ValueError:
                        Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
                        the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
                        these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
                        `device_map` to `from_pretrained`. Check
                        https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
                        for more details.

The model is Seungyoun/codellama-7b-instruct-pad.

I tried changing LlamaCodeInterpreter.py:79 to the following code, but running it produced another error: TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_8bit_fp32_cpu_offload'

self.model = LlamaForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    torch_dtype=torch.float16,
    load_in_8bit_fp32_cpu_offload=True,
)
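For what it's worth, that flag only appears in the wording of the ValueError; transformers expects quantization options to be passed through a BitsAndBytesConfig object rather than as bare from_pretrained() keywords, with CPU offload enabled via its llm_int8_enable_fp32_cpu_offload option. A minimal sketch of what line 79 might look like instead (untested here; the max_memory budgets are placeholders to adjust for your hardware):

import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM

# Route quantization options through BitsAndBytesConfig; modules that do not
# fit on the GPU are dispatched to the CPU and kept there in fp32.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    llm_int8_enable_fp32_cpu_offload=True,
)

self.model = LlamaForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
    # Placeholder budgets: cap GPU 0 and let the remainder spill to system RAM.
    max_memory={0: "7GiB", "cpu": "24GiB"},
)

The Hugging Face page linked in the error message shows the same idea with an explicit hand-written device_map; max_memory with device_map="auto" is the less manual route. Either way, expect the CPU-offloaded layers to run noticeably slower than the ones on the GPU.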

Complete log of the run after that change:

python chatbot.py --path V:\codellama-7b-instruct-pad
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Parallel'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\gradio_client\documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Series'>: No known documentation group for module 'gradio.mix'
  warnings.warn(f"Could not get documentation group for {cls}: {exc}")
D:\AI\Llama2-Code-Interpreter\chatbot.py:104: GradioUnusedKwargWarning: You have unused kwarg parameters in Chatbot, please remove them: {'avatar_images': './assets/logo2.png'}
  chatbot = gr.Chatbot(height=820, avatar_images="./assets/logo2.png")
Traceback (most recent call last):
  File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 238, in <module>
    gradio_launch(model_path=args.path, load_in_4bit=True)
  File "D:\AI\Llama2-Code-Interpreter\chatbot.py", line 108, in gradio_launch
    interpreter = StreamingLlamaCodeInterpreter(
  File "D:\AI\Llama2-Code-Interpreter\code_interpreter\LlamaCodeInterpreter.py", line 79, in __init__
    self.model = LlamaForCausalLM.from_pretrained(
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\modeling_utils.py", line 2959, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_8bit_fp32_cpu_offload'
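For context on the TypeError: from_pretrained() forwards keyword arguments it does not recognize to the model constructor (the model = cls(config, *model_args, **model_kwargs) frame in the traceback above), so load_in_8bit_fp32_cpu_offload was never a valid argument at that call site even though the ValueError text names it. The offload switch belongs in BitsAndBytesConfig, as in the sketch above.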