AssertionError When Running Fine-Tuned LLaMA 2

eladspi commented 12 months ago

I am trying to run a fine-tuned version of LLaMA 2 on inf2, but I keep getting the following error: AssertionError: Try to load with neff bytes as None, might due to compilation failure

I used an inf2.8xlarge instance with the Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20230817, with everything upgraded as described here.

Here's the full log:

(aws_neuron_venv_pytorch) $ python
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import LlamaForCausalLM
>>>
>>> model = LlamaForCausalLM.from_pretrained('llama-2-7b-hf')
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.62it/s]
>>> import torch
>>> from transformers_neuronx.module import save_pretrained_split
>>> save_pretrained_split(model, './llama-2-7b-split')
>>> quit()

(aws_neuron_venv_pytorch) $ python
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import time
>>> import torch
>>> from transformers import AutoTokenizer
>>> from transformers_neuronx.llama.model import LlamaForSampling
>>> os.environ["NEURON_CC_FLAGS"] = "--model-type=transformer-inference"
>>> neuron_model = LlamaForSampling.from_pretrained('./llama-2-7b-split', batch_size=1, tp_degree=2, amp='f16')
>>> neuron_model.to_neuron()
2023-Sep-08 23:03:12.0613 10779:10833 [0] nccl_net_ofi_init:1415 CCOM WARN NET/OFI aws-ofi-nccl initialization failed
2023-Sep-08 23:03:12.0613 10779:10833 [0] init.cc:138 CCOM WARN OFI plugin initNet() failed is EFA enabled?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/llama/model.py", line 117, in to_neuron
    model = self.decoder_lm_head.build_weight_shared(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 157, in build_weight_shared
    new.program.setup(new.layers, ln_lm_head_params)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 983, in setup
    super().setup(layers, ln_lm_head_params)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 879, in setup
    kernel.load()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/compiler.py", line 375, in load
    assert self.neff_bytes is not None, f"Try to load with neff bytes as None, might due to compilation failure"
AssertionError: Try to load with neff bytes as None, might due to compilation failure
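
For readers of the traceback: the last frame is a guard in load() that requires neff_bytes to be set. A minimal stand-in sketch of that pattern is below. FakeKernel, its build method, and the helper functions are hypothetical names used for illustration and are not the transformers_neuronx implementation; the assumption that a failed compile leaves neff_bytes as None comes only from the wording of the error message.

```python
class FakeKernel:
    """Hypothetical stand-in for a compiled-kernel wrapper (not the real class)."""

    def __init__(self):
        # Filled in only when compilation succeeds.
        self.neff_bytes = None

    def build(self, compile_fn):
        # Assumption: a failed compile is swallowed here and leaves neff_bytes as None.
        try:
            self.neff_bytes = compile_fn()
        except RuntimeError:
            self.neff_bytes = None

    def load(self):
        # Same guard as the last frame of the traceback.
        assert self.neff_bytes is not None, (
            "Try to load with neff bytes as None, might due to compilation failure"
        )
        return len(self.neff_bytes)


def failing_compile():
    """Simulates a compiler run that fails."""
    raise RuntimeError("compiler exited with an error")


def try_load(compile_fn):
    """Build then load; return the byte count, or the assertion message on failure."""
    kernel = FakeKernel()
    kernel.build(compile_fn)
    try:
        return kernel.load()
    except AssertionError as exc:
        return str(exc)
```

In this sketch the original compiler error is already gone by the time load() runs, so the assertion alone does not say what went wrong. If the real library behaves similarly, the underlying cause would have to be found in earlier compiler output rather than in this traceback.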

Thanks!

aws-rhsoln commented 11 months ago

Thank you for reporting the issue. We are able to reproduce the issue and are working on a fix.

micwade-aws commented 10 months ago

Duplicate of https://github.com/aws-neuron/transformers-neuronx/issues/40

aws-neuron / aws-neuron-sdk

AssertionError When Running Fine-Tuned LLaMA 2 #738