AniZpZ / AutoSmoothQuant

An easy-to-use package for implementing SmoothQuant for LLMs
MIT License

Cannot quantize to int8 - torch TypeError #11

Open AlpinDale opened 9 months ago

AlpinDale commented 9 months ago

I'm trying to quantize Llama2 7b using the instructions in the readme, but get this:

start trans into int8, this might take a while
Instantiating Int8LlamaAttention without passing `layer_idx` is not recommended and will lead to errors during the forward call if caching is used. Please make sure to provide a `layer_idx` when creating this class.
Traceback (most recent call last):
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/examples/smoothquant_model.py", line 117, in <module>
    main()
  File "/home/anon/micromamba/envs/testing/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/examples/smoothquant_model.py", line 112, in main
    int8_model = quant_model_class.from_float(model, decoder_layer_scales, quant_config)
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 245, in from_float
    int8_module.model = Int8LlamaModel.from_float(
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 216, in from_float
    int8_module.layers[i] = Int8LlamaDecoderLayer.from_float(
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 174, in from_float
    int8_module.input_layernorm = Int8LlamaRMSNorm.from_float(
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 27, in from_float
    int8_module.weight = module.weight / output_scale
  File "/home/anon/micromamba/envs/testing/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1708, in __setattr__
    raise TypeError(f"cannot assign '{torch.typename(value)}' as parameter '{name}' "
TypeError: cannot assign 'torch.cuda.HalfTensor' as parameter 'weight' (torch.nn.Parameter or None expected)

The scales are generated correctly.
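For context, the exception is raised by PyTorch itself, not by the quantization logic: `nn.Module.__setattr__` refuses to bind a plain tensor to an attribute that was registered as a parameter, and `module.weight / output_scale` returns a plain tensor. A minimal standalone sketch (using `LayerNorm` as an illustrative stand-in, not AutoSmoothQuant code) reproduces the same TypeError:

```python
import torch

norm = torch.nn.LayerNorm(4)
scaled = norm.weight / 2.0  # division on a Parameter returns a plain Tensor
# `weight` is a registered parameter, so __setattr__ only accepts
# nn.Parameter (or None) here and raises:
# TypeError: cannot assign 'torch.FloatTensor' as parameter 'weight'
norm.weight = scaled
```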

AniZpZ commented 8 months ago

It might be caused by version differences in the dependencies. You can try the following code at line 27; it will probably solve the problem.

int8_module.weight = torch.nn.Parameter(module.weight / output_scale)

We will work on this problem and fix it.
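Applied to the standalone sketch above, the wrap makes the assignment legal (again an illustrative stand-in, not the actual `Int8LlamaRMSNorm` code; the `no_grad` guard mirrors the `decorate_context` frame in the traceback):

```python
import torch

norm = torch.nn.LayerNorm(4)
with torch.no_grad():
    # Wrapping the result in nn.Parameter satisfies nn.Module.__setattr__
    # and keeps `weight` registered as a parameter of the module.
    norm.weight = torch.nn.Parameter(norm.weight / 2.0)
print(type(norm.weight))  # <class 'torch.nn.parameter.Parameter'>
```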

Hongbosherlock commented 8 months ago

> It might be caused by version differences in the dependencies. You can try the following code at line 27; it will probably solve the problem.
>
> int8_module.weight = torch.nn.Parameter(module.weight / output_scale)
>
> We will work on this problem and fix it.

Which versions of PyTorch, CUDA, and Transformers are required?