aws-neuron / neuronx-distributed

MIT No Attribution

The Llama inference example needs to be updated to maintain parity with transformers==4.36 #14

Open sol0invictus opened 5 months ago

sol0invictus commented 5 months ago

https://github.com/aws-neuron/neuronx-distributed/blob/a80091de6c9d8eb75f96a7367e143a81d586fbbc/examples/inference/llama2/neuron_modeling_llama.py#L36

The Llama inference example needs to be updated because transformers==4.36 introduced a required layer_idx argument in the LlamaDecoderLayer constructor. https://github.com/huggingface/transformers/blob/v4.37.0/src/transformers/models/llama/modeling_llama.py#L754

jluntamazon commented 5 months ago

Hi @sol0invictus,

Thank you for the code reference! We reproduced the problem here and intend to include a fix for this example in the upcoming release.

As an immediate workaround, you can either downgrade your transformers version or update the code to pass the layer_idx argument.
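For the second workaround, one way to stay compatible with both old and new transformers versions is to inspect the constructor signature and pass layer_idx only when it is accepted. The sketch below uses a hypothetical stub class in place of transformers' LlamaDecoderLayer so it runs standalone; in real code you would pass the imported LlamaDecoderLayer class instead.

```python
import inspect

# Hypothetical stand-in for transformers' LlamaDecoderLayer, whose
# __init__ gained a layer_idx parameter in recent versions.
class LlamaDecoderLayerStub:
    def __init__(self, config, layer_idx):
        self.config = config
        self.layer_idx = layer_idx

def build_decoder_layer(cls, config, layer_idx):
    """Construct a decoder layer, forwarding layer_idx only if the
    installed transformers version's constructor accepts it."""
    params = inspect.signature(cls.__init__).parameters
    if "layer_idx" in params:
        return cls(config, layer_idx=layer_idx)
    # Older transformers releases take only the config argument.
    return cls(config)

# Build one layer per position, as modeling code does with enumerate().
layers = [build_decoder_layer(LlamaDecoderLayerStub, config={}, layer_idx=i)
          for i in range(4)]
```

This keeps the example working across transformers releases without pinning a version, at the cost of a small amount of signature introspection at construction time.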