aws-neuron / neuronx-distributed

MIT No Attribution

The Llama inference example needs to be updated to maintain parity with transformers==4.36 #14

Open sol0invictus opened 5 months ago

sol0invictus commented 5 months ago

https://github.com/aws-neuron/neuronx-distributed/blob/a80091de6c9d8eb75f96a7367e143a81d586fbbc/examples/inference/llama2/neuron_modeling_llama.py#L36

The Llama inference example needs to be updated because transformers==4.36 introduced a required layer_idx argument in the LlamaDecoderLayer constructor. https://github.com/huggingface/transformers/blob/v4.37.0/src/transformers/models/llama/modeling_llama.py#L754

jluntamazon commented 5 months ago

Hi @sol0invictus,

Thank you for the code reference! We reproduced the problem here and intend to include a fix for this example in the upcoming release.

As an immediate workaround, you can either downgrade your transformers version or update the code to pass the layer_idx argument.
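For the second workaround, one way to stay compatible with both old and new transformers versions is to inspect the constructor signature and pass layer_idx only when it is accepted. The sketch below uses a hypothetical stub class in place of transformers' LlamaDecoderLayer so it runs standalone; in real code you would pass the imported LlamaDecoderLayer class instead.

```python
import inspect

# Hypothetical stand-in for transformers' LlamaDecoderLayer, whose
# __init__ gained a layer_idx parameter in recent versions.
class LlamaDecoderLayerStub:
    def __init__(self, config, layer_idx):
        self.config = config
        self.layer_idx = layer_idx

def build_decoder_layer(cls, config, layer_idx):
    """Construct a decoder layer, forwarding layer_idx only if the
    installed transformers version's constructor accepts it."""
    params = inspect.signature(cls.__init__).parameters
    if "layer_idx" in params:
        return cls(config, layer_idx=layer_idx)
    # Older transformers releases take only the config argument.
    return cls(config)

# Build one layer per position, as modeling code does with enumerate().
layers = [build_decoder_layer(LlamaDecoderLayerStub, config={}, layer_idx=i)
          for i in range(4)]
```

This keeps the example working across transformers releases without pinning a version, at the cost of a small amount of signature introspection at construction time.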