TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/
MIT License

[Bug Report] convert_llama_weights fails if I already quantized the weights to 4 bits #470

Open abdurraheemali opened 9 months ago

abdurraheemali commented 9 months ago

https://github.com/neelnanda-io/TransformerLens/blob/ce82675a8e89b6d5e6229a89620c843c794f3b04/transformer_lens/loading_from_pretrained.py#L1395C7-L1395C69

runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

!pip install torch einops transformer_lens plotly circuitsvis numpy transformers==4.30.0 sentencepiece scikit-learn jaxtyping pytest bitsandbytes accelerate

import gc
import os
import torch
import numpy as np
import einops
import transformer_lens
import functools
import plotly.graph_objects as go
import plotly.express as px
import circuitsvis as cv
import tqdm
import json

import transformers
from transformers import BitsAndBytesConfig
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformer_lens import ActivationCache, HookedTransformer
from transformer_lens import utils as tl_utils
from transformer_lens.hook_points import HookPoint
from torch import Tensor, bfloat16
from torch.utils.data import Dataset
from jaxtyping import Int, Float
from typing import Union, Tuple, List
from sklearn.decomposition import PCA

from instruction_dataset import InstructionDataset, PairedInstructionDataset

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name_path = "meta-llama/Llama-2-13b-chat-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name_path, token=os.environ["HF_TOKEN"], use_fast=False
)

tokenizer.pad_token = tokenizer.unk_token

tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(
    model_name_path,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
)
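
The call that actually triggers the failure (reconstructed from the failing notebook cell in the traceback below):

tl_model = HookedTransformer.from_pretrained(
    model_name_path,
    hf_model=model,
    device="cuda",
    fold_ln=False,
    center_writing_weights=False,
    center_unembed=False,
    tokenizer=tokenizer,
    default_padding_side="left",
    dtype="bfloat16",
).to(device)

torch.set_grad_enabled(False)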


EinopsError                               Traceback (most recent call last)
File /usr/local/lib/python3.10/dist-packages/einops/einops.py:523, in reduce(tensor, pattern, reduction, **axes_lengths)
    522     recipe = _prepare_transformation_recipe(pattern, reduction, axes_names=tuple(axes_lengths), ndim=len(shape))
--> 523     return _apply_recipe(
    524         backend, recipe, cast(Tensor, tensor), reduction_type=reduction, axes_lengths=hashable_axes_lengths
    525     )
    526 except EinopsError as e:

File /usr/local/lib/python3.10/dist-packages/einops/einops.py:234, in _apply_recipe(backend, recipe, tensor, reduction_type, axes_lengths)
    233 try:
--> 234     init_shapes, axes_reordering, reduced_axes, added_axes, final_shapes, n_axes_w_added = _reconstruct_from_shape(
    235         recipe, backend.shape(tensor), axes_lengths
    236     )
    237 except TypeError:
    238     # shape or one of passed axes lengths is not hashable (i.e. they are symbols)

File /usr/local/lib/python3.10/dist-packages/einops/einops.py:187, in _reconstruct_from_shape_uncached(self, shape, axes_dims)
    186 if isinstance(length, int) and isinstance(known_product, int) and length % known_product != 0:
--> 187     raise EinopsError(f"Shape mismatch, can't divide axis of length {length} in chunks of {known_product}")
    189 unknown_axis = unknown_axes[0]

EinopsError: Shape mismatch, can't divide axis of length 1 in chunks of 40

During handling of the above exception, another exception occurred:

EinopsError                               Traceback (most recent call last)
Cell In[18], line 1
----> 1 tl_model = HookedTransformer.from_pretrained(
      2
      3     model_name_path,
      4
      5     hf_model=model,
      6
      7     device="cuda",
      8
      9     fold_ln=False,
     10
     11     center_writing_weights=False,
     12
     13     center_unembed=False,
     14
     15     tokenizer=tokenizer,
     16
     17     default_padding_side="left",
     18
     19     dtype="bfloat16",
     20
     21 ).to(device)
     24 torch.set_grad_enabled(False)

File /usr/local/lib/python3.10/dist-packages/transformer_lens/HookedTransformer.py:1282, in HookedTransformer.from_pretrained(cls, model_name, fold_ln, center_writing_weights, center_unembed, refactor_factored_attn_matrices, checkpoint_index, checkpoint_value, hf_model, device, n_devices, tokenizer, move_to_device, fold_value_biases, default_prepend_bos, default_padding_side, dtype, **from_pretrained_kwargs)
   1278     center_writing_weights = False
   1280 # Get the state dict of the model (ie a mapping of parameter names to tensors), processed to
   1281 # match the HookedTransformer parameter names.
-> 1282 state_dict = loading.get_pretrained_state_dict(
   1283     official_model_name, cfg, hf_model, dtype=dtype, **from_pretrained_kwargs
   1284 )
   1286 # Create the HookedTransformer object
   1287 model = cls(
   1288     cfg,
   1289     tokenizer,
   1290     move_to_device=False,
   1291     default_padding_side=default_padding_side,
   1292 )

File /usr/local/lib/python3.10/dist-packages/transformer_lens/loading_from_pretrained.py:1087, in get_pretrained_state_dict(official_model_name, cfg, hf_model, dtype, **kwargs)
   1085     state_dict = convert_neox_weights(hf_model, cfg)
   1086 elif cfg.original_architecture == "LlamaForCausalLM":
-> 1087     state_dict = convert_llama_weights(hf_model, cfg)
   1088 elif cfg.original_architecture == "BertForMaskedLM":
   1089     state_dict = convert_bert_weights(hf_model, cfg)

File /usr/local/lib/python3.10/dist-packages/transformer_lens/loading_from_pretrained.py:1395, in convert_llama_weights(llama, cfg)
   1390 state_dict[f"blocks.{l}.attn.b_V"] = torch.zeros(
   1391     cfg.n_heads, cfg.d_head, dtype=cfg.dtype
   1392 )
   1394 W_O = llama.model.layers[l].self_attn.o_proj.weight
-> 1395 W_O = einops.rearrange(W_O, "m (n h)->n h m", n=cfg.n_heads)
   1396 state_dict[f"blocks.{l}.attn.W_O"] = W_O
   1398 state_dict[f"blocks.{l}.attn.b_O"] = torch.zeros(cfg.d_model, dtype=cfg.dtype)

File /usr/local/lib/python3.10/dist-packages/einops/einops.py:591, in rearrange(tensor, pattern, **axes_lengths)
    536 def rearrange(tensor: Union[Tensor, List[Tensor]], pattern: str, **axes_lengths) -> Tensor:
    537     """
    538     einops.rearrange is a reader-friendly smart element reordering for multidimensional tensors.
    539     This operation includes functionality of transpose (axes permutation), reshape (view), squeeze, unsqueeze,
   (...)
    589
    590     """
--> 591     return reduce(tensor, pattern, reduction="rearrange", **axes_lengths)

File /usr/local/lib/python3.10/dist-packages/einops/einops.py:533, in reduce(tensor, pattern, reduction, **axes_lengths)
    531     message += "\n Input is list. "
    532     message += "Additional info: {}.".format(axes_lengths)
--> 533     raise EinopsError(message + "\n {}".format(e))

EinopsError: Error while processing rearrange-reduction pattern "m (n h)->n h m". Input tensor shape: torch.Size([13107200, 1]). Additional info: {'n': 40}. Shape mismatch, can't divide axis of length 1 in chunks of 40

abdurraheemali commented 9 months ago

The error comes from the einops.rearrange call in the convert_llama_weights function of the transformer_lens package. The message "Shape mismatch, can't divide axis of length 1 in chunks of 40" means the tensor being rearranged does not have the shape the pattern expects.

In the line:

W_O = einops.rearrange(W_O, "m (n h)->n h m", n=cfg.n_heads)

The pattern expects W_O to be a 2-D weight matrix whose second axis factors into n * h (n being the number of heads from cfg.n_heads). After 4-bit quantization, however, bitsandbytes stores the layer's weight as a packed buffer, so W_O arrives with shape torch.Size([13107200, 1]), and an axis of length 1 cannot be split into 40 chunks.
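
As a rough illustration of the mismatch (assuming Llama-2-13B dimensions, d_model=5120, n_heads=40, d_head=128; the tensor names below are only for the sketch):

import torch
import einops

n_heads, d_head, d_model = 40, 128, 5120

# What convert_llama_weights expects: an unquantized [d_model, n_heads * d_head] matrix.
w_fp = torch.randn(d_model, n_heads * d_head)
print(einops.rearrange(w_fp, "m (n h)->n h m", n=n_heads).shape)  # torch.Size([40, 128, 5120])

# What arrives after 4-bit quantization: a packed uint8 buffer.
# 5120 * 5120 weights at two 4-bit values per byte is 13,107,200 bytes,
# which matches the torch.Size([13107200, 1]) in the error above.
w_packed = torch.zeros(13_107_200, 1, dtype=torch.uint8)
try:
    einops.rearrange(w_packed, "m (n h)->n h m", n=n_heads)
except einops.EinopsError as e:
    print(e)  # Shape mismatch, can't divide axis of length 1 in chunks of 40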

To resolve this issue:

1. Verify that W_O has the correct shape before the rearrange operation; it should be compatible with the pattern "m (n h)->n h m".
2. Ensure that cfg.n_heads is correctly set and matches the expected architecture of the model.
3. If W_O is supposed to have a different shape, adjust how the weights are loaded or transformed so that they match the expected dimensions, e.g. by loading them unquantized, as sketched below.
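
If memory allows, a straightforward way around the error is to give HookedTransformer an unquantized copy of the weights. A sketch based on the setup above, where the only change is that no BitsAndBytesConfig is passed:

hf_model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_name_path,
    torch_dtype=torch.bfloat16,
    token=os.environ["HF_TOKEN"],
)

tl_model = HookedTransformer.from_pretrained(
    model_name_path,
    hf_model=hf_model_bf16,
    device="cuda",
    fold_ln=False,
    center_writing_weights=False,
    center_unembed=False,
    tokenizer=tokenizer,
    default_padding_side="left",
    dtype="bfloat16",
)

Without a quantization_config, o_proj.weight keeps its [d_model, n_heads * d_head] shape and the rearrange pattern can split it into heads.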

abdurraheemali commented 8 months ago

This may be addressed by https://github.com/neelnanda-io/TransformerLens/pull/486; I'll check it out.