Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
I am trying to run a fine-tuned version of llama 2 on inf2 , but keep getting an AssertionError: Try to load with neff bytes as None, might due to compilation failure
I Used an instance is inf2.x8large, Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20230817, all upgraded as described here.
Here's the full log:
(aws_neuron_venv_pytorch) $ python
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import LlamaForCausalLM
>>>
>>> model = LlamaForCausalLM.from_pretrained('llama-2-7b-hf')
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.62it/s]
>>> import torch
>>> from transformers_neuronx.module import save_pretrained_split
>>> save_pretrained_split(model, './llama-2-7b-split')
>>> quit()
(aws_neuron_venv_pytorch) $ python
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import time
>>> import torch
>>> from transformers import AutoTokenizer
>>> from transformers_neuronx.llama.model import LlamaForSampling
>>> os.environ["NEURON_CC_FLAGS"] = "--model-type=transformer-inference"
>>> neuron_model = LlamaForSampling.from_pretrained('./llama-2-7b-split', batch_size=1, tp_degree=2, amp='f16')
>>> neuron_model.to_neuron()
2023-Sep-08 23:03:12.0613 10779:10833 [0] nccl_net_ofi_init:1415 CCOM WARN NET/OFI aws-ofi-nccl initialization failed
2023-Sep-08 23:03:12.0613 10779:10833 [0] [init.cc:138](http://init.cc:138/) CCOM WARN OFI plugin initNet() failed is EFA enabled?
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/[opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/llama/model.py](http://opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/llama/model.py)", line 117, in to_neuron
model = self.decoder_lm_head.build_weight_shared(
File "/[opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py](http://opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py)", line 157, in build_weight_shared
new.program.setup(new.layers, ln_lm_head_params)
File "/[opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py](http://opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py)", line 983, in setup
super().setup(layers, ln_lm_head_params)
File "/[opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py](http://opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py)", line 879, in setup
kernel.load()
File "/[opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/compiler.py](http://opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/compiler.py)", line 375, in load
assert self.neff_bytes is not None, f"Try to load with neff bytes as None, might due to compilation failure"
AssertionError: Try to load with neff bytes as None, might due to compilation failure
I am trying to run a fine-tuned version of llama 2 on inf2 , but keep getting an
AssertionError: Try to load with neff bytes as None, might due to compilation failure
I Used an instance is inf2.x8large, Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20230817, all upgraded as described here.
Here's the full log:
Thanks!