AttributeError: '_OpNamespace' 'aten' object has no attribute '_kai_weight_pack_int4'

Hi,

I am following the article at https://learn.arm.com/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama/, but at the following step

python torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so --quantize config/data/aarch64_cpu_channelwise.json --device cpu --max-seq-length 1024

I get the following error:

Converting meta-llama/Meta-Llama-3.1-8B-Instruct to torchchat format...
known configs: ['stories15M', '13B', 'CodeLlama-7b-Python-hf', 'stories42M', 'Mistral-7B', '34B', 'Meta-Llama-3-8B', 'Meta-Llama-3.1-8B', '30B', '7B', 'stories110M', 'Meta-Llama-3-70B', 'Meta-Llama-3.1-70B', '70B']
Model config {'block_size': 2048, 'vocab_size': 128256, 'n_layers': 32, 'n_heads': 32, 'dim': 4096, 'hidden_dim': 14336, 'n_local_heads': 8, 'head_dim': 128, 'rope_base': 500000.0, 'norm_eps': 1e-05, 'multiple_of': 1024, 'ffn_dim_multiplier': 1.3, 'use_tiktoken': True, 'max_seq_length': 8192, 'use_scaled_rope': True}
Moving checkpoint to /home/torch/.torchchat/model-cache/downloads/meta-llama/Meta-Llama-3.1-8B-Instruct/model.pth.
Done.
Moving model to /home/torch/.torchchat/model-cache/meta-llama/Meta-Llama-3.1-8B-Instruct.
Note: NumExpr detected 32 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
NumExpr defaulting to 16 threads.
PyTorch version 2.5.0.dev20240814 available.
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
lm_eval is not installed, GPTQ may not be usable
Using device=cpu
Loading model...
Time to load model: 1.88 seconds
Quantizing the model with: {'executor': {'accelerator': 'cpu'}, 'precision': {'dtype': 'fp32'}, 'linear:int4': {'groupsize': 0, 'scheme': 'symmetric_channelwise'}}
linear: layers.0.attention.wq, in=4096, out=4096
Time to quantize model: 0.05 seconds
Traceback (most recent call last):
  File "/home/torch/pytorch/torchchat/torchchat.py", line 97, in <module>
    export_main(args)
  File "/home/torch/pytorch/torchchat/export.py", line 124, in main
    model = _initialize_model(
            ^^^^^^^^^^^^^^^^^^
  File "/home/torch/pytorch/torchchat/build/builder.py", line 514, in _initialize_model
    quantize_model(
  File "/home/torch/pytorch/torchchat/quantization/quantize.py", line 109, in quantize_model
    model = quant_handler.quantize(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/torch/miniforge3/envs/torch_env/lib/python3.11/site-packages/torchao-0.4.0+git174e630a-py3.11-linux-aarch64.egg/torchao/quantization/GPTQ.py", line 809, in quantize
    state_dict = self._create_quantized_state_dict(model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/torch/miniforge3/envs/torch_env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/torch/miniforge3/envs/torch_env/lib/python3.11/site-packages/torchao-0.4.0+git174e630a-py3.11-linux-aarch64.egg/torchao/quantization/GPTQ.py", line 774, in _create_quantized_state_dict
    weight_int4pack = torch.ops.aten._kai_weight_pack_int4(w_int4x8.to(self.device),scales_and_zeros,mod.out_features,mod.in_features,0)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/torch/miniforge3/envs/torch_env/lib/python3.11/site-packages/torch/_ops.py", line 1222, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'aten' object has no attribute '_kai_weight_pack_int4'

As far as I can tell, this op is added by https://github.com/ArmDeveloperEcosystem/PyTorch-arm-patches/blob/main/0001-Feat-Add-support-for-kleidiai-quantization-schemes.patch

conda list | grep -i torch
# packages in environment at /home/torch/miniforge3/envs/torch_env:
torch                     2.5.0.dev20240814          pypi_0    pypi
torchao                   0.4.0+git174e630a          pypi_0    pypi

Any ideas?

Thanks!

Hi @martin-g, it appears you haven't installed the torch wheel provided in the learning path.

You will need to override the PyTorch version that gets installed with a specific version to take advantage of the KleidiAI optimizations:

wget https://github.com/ArmDeveloperEcosystem/PyTorch-arm-patches/raw/main/torch-2.5.0.dev20240828+cpu-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
pip install --force-reinstall torch-2.5.0.dev20240828+cpu-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

Then you can validate:

pip list | grep -i torch
torch                     2.5.0.dev20240828+cpu
torchao                   0.4.0+git174e630a
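Beyond the version string, a more direct check is to ask the installed torch whether it registers the op that failed in the traceback. A minimal sketch in Python; it only relies on `torch.ops.aten` raising `AttributeError` for unknown ops, which is what the traceback above shows, and the helper name `has_kai_pack_op` is hypothetical:

```python
import importlib.util


def has_kai_pack_op() -> bool:
    """Return True if the installed torch build registers aten._kai_weight_pack_int4."""
    if importlib.util.find_spec("torch") is None:
        return False  # torch is not installed in the active environment
    import torch

    # Unknown ops make torch.ops.aten raise AttributeError, so hasattr() returns False.
    return hasattr(torch.ops.aten, "_kai_weight_pack_int4")


if __name__ == "__main__":
    if has_kai_pack_op():
        print("KleidiAI int4 packing op is available")
    else:
        print("op missing: install the patched torch wheel from the learning path")
```

With the stock nightly wheel this prints the "op missing" line; with the patched wheel it should report the op as available.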

Hope this resolves your issue.

Hi @pareenaverma !

Thank you for helping me!

torch@host-192-168-1-5 ~> pip install --force-reinstall torch-2.5.0.dev20240828+cpu-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
DEPRECATION: Loading egg at /home/torch/miniforge3/envs/torch_env/lib/python3.11/site-packages/torchao-0.4.0+git174e630a-py3.11-linux-aarch64.egg is deprecated. pip 24.3 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330
ERROR: torch-2.5.0.dev20240828+cpu-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl is not a supported wheel on this platform.
torch@host-192-168-1-5 ~ [1]> uname -m
aarch64
torch@host-192-168-1-5 ~> arch
aarch64
torch@host-192-168-1-5 ~> lscpu
Architecture:           aarch64
  CPU op-mode(s):       64-bit
  Byte Order:           Little Endian
CPU(s):                 32
  On-line CPU(s) list:  0-31
Vendor ID:              HiSilicon
  Model name:           Kunpeng-920
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 16
    Socket(s):          2
    Stepping:           0x1
    Frequency boost:    disabled
    CPU max MHz:        2400.0000
    CPU min MHz:        2400.0000
    BogoMIPS:           200.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
Caches (sum of all):    
  L1d:                  2 MiB (32 instances)
  L1i:                  2 MiB (32 instances)
  L2:                   16 MiB (32 instances)
  L3:                   64 MiB (2 instances)
NUMA:                   
  NUMA node(s):         2
  NUMA node0 CPU(s):    0-15
  NUMA node1 CPU(s):    16-31
Vulnerabilities:        
  Gather data sampling: Not affected
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec rstack overflow: Not affected
  Spec store bypass:    Not affected
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Not affected
  Srbds:                Not affected
  Tsx async abort:      Not affected

torch@host-192-168-1-5 ~ [1]> which pip
/home/torch/miniforge3/envs/torch_env/bin/pip

torch@host-192-168-1-5 ~> pip --version
pip 24.2 from /home/torch/miniforge3/envs/torch_env/lib/python3.11/site-packages/pip (python 3.11)

I think I see the problem! The wheel is tagged cp310, so it needs Python 3.10 exactly, while my environment runs Python 3.11.
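The rejection comes from the tags in the wheel filename: `cp310` means the wheel was built for CPython 3.10, while `pip --version` shows this environment runs Python 3.11, so pip refuses it even though the `aarch64` platform tag matches. A minimal sketch of that comparison, splitting the filename per the wheel naming convention (PEP 427); it deliberately ignores the ABI and glibc (manylinux) checks pip also performs, so treat it as a rough diagnostic, and the helper names are hypothetical:

```python
import platform
import sys


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into name, version and its (possibly dotted) tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    name, version, py_tags, abi_tags, plat_tags = parts
    return {
        "name": name,
        "version": version,
        "python_tags": py_tags.split("."),
        "abi_tags": abi_tags.split("."),
        "platform_tags": plat_tags.split("."),
    }


def interpreter_tag() -> str:
    """CPython tag of the running interpreter, e.g. 'cp311' for Python 3.11."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def compatibility_problems(filename: str, py_tag: str = "", machine: str = "") -> list:
    """List reasons the wheel cannot be installed (Python tag and CPU architecture only)."""
    wheel = parse_wheel_filename(filename)
    py_tag = py_tag or interpreter_tag()
    machine = machine or platform.machine()
    problems = []
    if py_tag not in wheel["python_tags"]:
        problems.append(f"wheel is for {wheel['python_tags']}, interpreter is {py_tag}")
    if not any(tag.endswith("_" + machine) for tag in wheel["platform_tags"]):
        problems.append(f"wheel is for {wheel['platform_tags']}, machine is {machine}")
    return problems


WHEEL = ("torch-2.5.0.dev20240828+cpu-cp310-cp310-"
         "manylinux_2_17_aarch64.manylinux2014_aarch64.whl")

if __name__ == "__main__":
    print(compatibility_problems(WHEEL) or "tags match")
```

On the machine from this thread, `compatibility_problems(WHEEL)` would report only the Python tag mismatch; a fresh environment created with Python 3.10 (for example `conda create -n <env> python=3.10`) clears it.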

Thanks! It works now! I was able to build the model and ask it questions!

Note to future me: on my Linux distro (openEuler 22.03) I had to install gperftools-libs instead of google-perftools to get /usr/lib64/libtcmalloc.so.4.
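The exact library path matters, presumably because the learning path loads tcmalloc by absolute path and that path differs between distributions. A small Python helper can report where, if anywhere, `libtcmalloc.so.4` landed. The candidate list is an assumption: `/usr/lib64/...` is the openEuler location from this note, and `/usr/lib/aarch64-linux-gnu/...` is the usual Debian/Ubuntu multiarch location; `find_tcmalloc` is a hypothetical name:

```python
import ctypes.util
from pathlib import Path

# Assumed candidate locations; extend for other distributions.
CANDIDATES = (
    "/usr/lib64/libtcmalloc.so.4",                  # openEuler 22.03 (gperftools-libs)
    "/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4",  # Debian/Ubuntu multiarch layout
)


def find_tcmalloc(candidates=CANDIDATES):
    """Return the first existing candidate path, else the result of ctypes' library search."""
    for path in candidates:
        if Path(path).is_file():
            return path
    # Falls back to a bare soname such as 'libtcmalloc.so.4', or None if nothing is found.
    return ctypes.util.find_library("tcmalloc")


if __name__ == "__main__":
    print(find_tcmalloc() or "tcmalloc not found; install gperftools-libs or google-perftools")
```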

One further question: in https://gitlab.arm.com/kleidi/kleidiai/-/issues/2 we needed to make some improvements to KleidiAI so that it can be used on CPUs that lack some CPU features. How can I make sure these improvements are being used? And how can I use the latest KleidiAI (built locally)?
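Before digging into which KleidiAI kernels get picked, it helps to confirm which of the relevant ISA extensions the machine reports. The `lscpu` output above shows `asimddp` (dot product) but no `i8mm` on the Kunpeng-920. The sketch below reads the `Features` lines of `/proc/cpuinfo` on aarch64 Linux; treating dotprod and i8mm as the features of interest is an assumption, since the thread does not say which features the KleidiAI issue concerned, and the helper names are hypothetical:

```python
from pathlib import Path

# Assumption: the extensions of interest are dotprod ("asimddp") and i8mm.
FEATURES_OF_INTEREST = ("asimddp", "i8mm")


def cpu_features(cpuinfo_text: str) -> set:
    """Collect every flag listed on 'Features' lines of /proc/cpuinfo (aarch64 Linux)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags.update(value.split())
    return flags


def missing_features(flags: set, wanted=FEATURES_OF_INTEREST) -> list:
    """Return the wanted flags that the CPU does not report, in the order given."""
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = cpu_features(cpuinfo.read_text())
        print("missing:", missing_features(flags) or "none")
```

Fed the Kunpeng-920 flag list from `lscpu`, `missing_features` returns `['i8mm']`, so any kernel that requires i8mm would need a fallback on this machine.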

ArmDeveloperEcosystem / PyTorch-arm-patches

AttributeError: '_OpNamespace' 'aten' object has no attribute '_kai_weight_pack_int4' #1