aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

Error when executing `neuron_model.to_neuron()` #46

Closed · massi-ang closed this 8 months ago

massi-ang commented 9 months ago

I'm running the notebook https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb on an inf2.48x instance.

I get the following error when executing the last cell of the notebook:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 neuron_model.to_neuron()

File ~/aws_neuron_venv_pytorch/lib/python3.10/site-packages/transformers_neuronx/llama/model.py:117, in LlamaForSampling.to_neuron(self)
    115 self.decoder_lm_head_for_context = {}
    116 for context_length_estimate in self.context_buckets:
--> 117     model = self.decoder_lm_head.build_weight_shared(
    118         n_positions_list=[context_length_estimate],
    119         n_active_tokens=context_length_estimate,
    120         unroll=self.context_unroll,
    121         share_caches=True,
    122     )
    123     # PERF: No latency improvement seen in multi-layer models from executor
    124     if self.context_unroll == self.config.num_hidden_layers:

File ~/aws_neuron_venv_pytorch/lib/python3.10/site-packages/transformers_neuronx/decoder.py:157, in DecoderLmHeadForSamplingNoEmbedding.build_weight_shared(self, n_positions_list, n_active_tokens, batch_size, unroll, share_caches)
    155     ln_lm_head_params.append(new.lm_head_bias)
    156 new.program = new._build_program()
--> 157 new.program.setup(new.layers, ln_lm_head_params)
    158 return new

File ~/aws_neuron_venv_pytorch/lib/python3.10/site-packages/transformers_neuronx/decoder.py:983, in DecoderProgramFullyUnrolled.setup(self, layers, ln_lm_head_params)
    982 def setup(self, layers, ln_lm_head_params):
--> 983     super().setup(layers, ln_lm_head_params)
    984     for npos, memory in zip(self.n_positions_list, self.memories):
    985         input_tensors = [*self.input_buffers]

File ~/aws_neuron_venv_pytorch/lib/python3.10/site-packages/transformers_neuronx/decoder.py:879, in DecoderProgram.setup(self, layers, ln_lm_head_params)
    876         kernel.neff_bytes = future.result()
    878 for kernel in self.kernels:
--> 879     kernel.load()

File ~/aws_neuron_venv_pytorch/lib/python3.10/site-packages/transformers_neuronx/compiler.py:375, in ParallelKernel.load(self)
    374 def load(self):
--> 375     assert self.neff_bytes is not None, f"Try to load with neff bytes as None, might due to compilation failure"
    376     self.model = torch.classes.neuron.ParallelModel(self.neff_bytes, self.tp_degree, self.g_start_device_id, self.g_device_count)
    377     self.model.load()

AssertionError: Try to load with neff bytes as None, might due to compilation failure
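For context, the failing cell boils down to a `LlamaForSampling` load followed by `to_neuron()`. The sketch below is an assumption-based reconstruction of that cell, not a copy of it: the checkpoint path, `batch_size`, `tp_degree`, and `amp` values are placeholders modeled on the notebook's defaults.

```python
from transformers_neuronx.llama.model import LlamaForSampling

# Load the locally split Llama-2-13B checkpoint prepared earlier in the notebook.
# 'Llama-2-13b-split' is a hypothetical path; tp_degree=24 matches the 24 NeuronCores
# on an inf2.48xlarge, and batch_size/amp are typical notebook defaults (assumptions).
neuron_model = LlamaForSampling.from_pretrained(
    'Llama-2-13b-split',
    batch_size=1,
    tp_degree=24,
    amp='f16',
)

# This call compiles the model to NEFFs and loads them onto the NeuronCores;
# it is the step that raises the AssertionError in the traceback above when
# a NEFF fails to compile (neff_bytes ends up None).
neuron_model.to_neuron()
```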
awsilya commented 9 months ago

@massi-ang thank you for the report. I'm trying to reproduce the failure.

awsilya commented 9 months ago

@massi-ang we confirmed that there is a bug in the currently released version of Neuron. The fix will be available shortly.

awsilya commented 9 months ago

@massi-ang could you try with the just-released 2.14.1? https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html#neuron-2-14-1-09-26-2023
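For anyone retrying, upgrading the relevant packages inside the notebook's virtual environment would look roughly like the cell below. This is a sketch based on the standard Neuron pip repository setup, not a command taken from this thread; pin exact versions according to the linked release notes if needed.

```python
# Notebook cell: upgrade to the 2.14.1 compiler/runtime packages from the Neuron pip repository.
# The package set here is the usual one for transformers-neuronx inference (an assumption).
%pip install --upgrade neuronx-cc torch-neuronx transformers-neuronx --extra-index-url https://pip.repos.neuron.amazonaws.com
```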

mrnikwaws commented 8 months ago

Hi @massi-ang,

Since it has been over a month since the last comments, I am closing this issue. Please open a new one if the problem persists with the current release.

massi-ang commented 8 months ago

I have not had time to retest.