Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

NonDynamicallyQuantizableLinear object has no attribute 'weight' #97

Closed · Keeo closed this 9 months ago

Keeo commented 10 months ago

Hi Team, I wanted to try SPHINX out but got the following error: AttributeError: 'NonDynamicallyQuantizableLinear' object has no attribute 'weight'. Any idea what could be wrong?

I am running this on a 4090 on an up-to-date Manjaro install, and I have checked the hashes of the downloaded files.

# I started by cloning the repo
git clone https://github.com/Alpha-VLLM/LLaMA2-Accessory.git

# Then created conda environment
conda create -n accessory python=3.10 -y
conda activate accessory
conda install -c conda-forge cudatoolkit=11.7.0 -y

# Modified requirements.txt to pin gradio==3.48.0
# Installed dependencies
pip install -r requirements.txt
pip install scipy
pip install "git+https://github.com/facebookresearch/segment-anything.git"

# Downloaded sam_vit_h_4b8939.pth
https://huggingface.co/spaces/abhishek/StableSAM/blob/main/sam_vit_h_4b8939.pth

# Downloaded tokenizer
https://huggingface.co/Alpha-VLLM/LLaMA2-Accessory/blob/main/config/tokenizer.model

# Downloaded the SPHINX checkpoints
https://huggingface.co/Alpha-VLLM/LLaMA2-Accessory/tree/main/finetune/mm/SPHINX/SPHINX
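
# (For these, something along the following lines should work; the exact
#  flags and target paths here are illustrative, not the commands I ran)
wget https://huggingface.co/spaces/abhishek/StableSAM/resolve/main/sam_vit_h_4b8939.pth
wget https://huggingface.co/Alpha-VLLM/LLaMA2-Accessory/resolve/main/config/tokenizer.model
# the SPHINX folder holds several files, so fetch the whole subtree
# (needs a recent huggingface_hub with the `download` subcommand)
huggingface-cli download Alpha-VLLM/LLaMA2-Accessory --include "finetune/mm/SPHINX/SPHINX/*" --local-dir ~/Downloads/Sphinx
# (with --local-dir the files keep the repo's folder structure, so point
#  --pretrained_path at the nested SPHINX directory afterwards)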

# And finally run the whole thing
python demos/multi_turn_mm_box.py --n_gpus=1 --tokenizer_path=~/Downloads/tokenizer.model --llama_type=llama_ens --pretrained_path ~/Downloads/Sphinx/ --quant

This boots up and starts Gradio; there I select an image, provide a prompt, and press Enter.

###Human: What's in the image?
###Assistant:
Process Process-1:
Traceback (most recent call last):
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/keo/projects/sphinx/accessory/demos/multi_turn_mm_box.py", line 112, in model_worker
    for stream_response in model.stream_generate(
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/keo/projects/sphinx/accessory/model/meta.py", line 202, in stream_generate
    logits = self.llma.forward_inference(tokens[None, prev_pos:cur_pos], prev_pos, images if prev_pos == 0 else None)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/keo/projects/sphinx/accessory/model/LLM/llama_ens.py", line 481, in forward_inference
    image_tokens = self.encode_image(image)
  File "/home/keo/projects/sphinx/accessory/model/LLM/llama_ens.py", line 395, in encode_image
    local_clip_image_feats = self.clip_encode_image(local_image)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/keo/projects/sphinx/accessory/model/LLM/llama_ens.py", line 363, in clip_encode_image
    x = self.clip.visual.transformer(x)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/open_clip/transformer.py", line 324, in forward
    x = r(x, attn_mask=attn_mask)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/open_clip/transformer.py", line 241, in forward
    x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/open_clip/transformer.py", line 227, in attention
    return self.attn(
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1209, in forward
    self.dropout, self.out_proj.weight, self.out_proj.bias,
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'NonDynamicallyQuantizableLinear' object has no attribute 'weight'

Here is the full log from the terminal:

```
python demos/multi_turn_mm_box.py --n_gpus=1 --tokenizer_path=/home/keo/Downloads/tokenizer.model --llama_type=llama_ens --pretrained_path ~/Downloads/model/ --quant
/home/keo/projects/sphinx/accessory/model/components.py:8: UserWarning: Cannot import apex RMSNorm, switch to vanilla implementation
  warnings.warn("Cannot import apex RMSNorm, switch to vanilla implementation")
/home/keo/projects/sphinx/accessory/configs/global_configs.py:7: UserWarning: Cannot import flash_attn, switch to vanilla implementation.
  warnings.warn("Cannot import flash_attn, switch to vanilla implementation. ")
/home/keo/projects/sphinx/accessory/model/components.py:8: UserWarning: Cannot import apex RMSNorm, switch to vanilla implementation
  warnings.warn("Cannot import apex RMSNorm, switch to vanilla implementation")
/home/keo/projects/sphinx/accessory/configs/global_configs.py:7: UserWarning: Cannot import flash_attn, switch to vanilla implementation.
  warnings.warn("Cannot import flash_attn, switch to vanilla implementation. ")
| distributed init on worker 0/1. using gpu: 0
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
[21:08:27.287164] Model Args: ModelArgs(dim=5120, n_layers=40, n_heads=40, n_kv_heads=None, vocab_size=32000, multiple_of=256, ffn_dim_multiplier=None, norm_eps=1e-05, rope_theta=10000, max_batch_size=32, max_seq_len=4096, rope_scaling=None, load_pretrained_visual_encoder=False)
[21:11:43.729310] rope theta: 10000
[21:11:43.735656] build llama model with qformerv2
[21:12:36.867326] build llama model with clip
[21:12:39.818165] build llama model with openclip
[21:12:52.348541] build llama model with dinov2
Using cache found in /home/keo/.cache/torch/hub/facebookresearch_dinov2_main
/home/keo/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/swiglu_ffn.py:51: UserWarning: xFormers is not available (SwiGLU)
  warnings.warn("xFormers is not available (SwiGLU)")
/home/keo/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/attention.py:33: UserWarning: xFormers is not available (Attention)
  warnings.warn("xFormers is not available (Attention)")
/home/keo/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/block.py:40: UserWarning: xFormers is not available (Block)
  warnings.warn("xFormers is not available (Block)")
[21:13:06.106184] Model is Peft: False
[21:13:06.129312] Trainable parameter count : 13048673280 (local rank), 13048673280 (all).
[21:13:06.129449] Loading pretrained weights ...
[21:13:06.152651] Loading from checkpoint at: /home/keo/Downloads/model/ (1 of 1, format is "consolidated)"
[21:17:11.328490] load result: {'missing_keys': [], 'unexpected_keys': []}
[21:17:14.839748] Quantizing model to 4bit!
Qunatization Process: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 847/847 [02:39<00:00, 5.32it/s]
[21:21:21.908395] Model = MetaModel(...)
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://bca47c76de06629fdd.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
[21:22:38.613525] A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
###Human: What's in the image?
###Assistant:
Process Process-1:
Traceback (most recent call last):
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/keo/projects/sphinx/accessory/demos/multi_turn_mm_box.py", line 112, in model_worker
    for stream_response in model.stream_generate(
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/keo/projects/sphinx/accessory/model/meta.py", line 202, in stream_generate
    logits = self.llma.forward_inference(tokens[None, prev_pos:cur_pos], prev_pos, images if prev_pos == 0 else None)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/keo/projects/sphinx/accessory/model/LLM/llama_ens.py", line 481, in forward_inference
    image_tokens = self.encode_image(image)
  File "/home/keo/projects/sphinx/accessory/model/LLM/llama_ens.py", line 395, in encode_image
    local_clip_image_feats = self.clip_encode_image(local_image)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/keo/projects/sphinx/accessory/model/LLM/llama_ens.py", line 363, in clip_encode_image
    x = self.clip.visual.transformer(x)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/open_clip/transformer.py", line 324, in forward
    x = r(x, attn_mask=attn_mask)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/open_clip/transformer.py", line 241, in forward
    x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/open_clip/transformer.py", line 227, in attention
    return self.attn(
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1209, in forward
    self.dropout, self.out_proj.weight, self.out_proj.bias,
  File "/home/keo/anaconda3/envs/accessory_v2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'NonDynamicallyQuantizableLinear' object has no attribute 'weight'
```
gaopengpjlab commented 10 months ago

Please turn off --quant for SPHINX. We are going to support quantized SPHINX in the future.
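
For example, the launch command from the report, just without `--quant` (same paths as above):

```
python demos/multi_turn_mm_box.py --n_gpus=1 --tokenizer_path=~/Downloads/tokenizer.model --llama_type=llama_ens --pretrained_path ~/Downloads/Sphinx/
```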

Keeo commented 10 months ago

Thanks. In the meantime, is there anything else I can toggle to fit it into 24 GB of memory?

gaopengpjlab commented 9 months ago

@Keeo Please refer to the following issue for quantized SPHINX: https://github.com/Alpha-VLLM/LLaMA2-Accessory/issues/114