aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

Downloaded model structure does not correspond to the notebook instructions #38

Closed massi-ang closed 1 month ago

massi-ang commented 9 months ago

Trying to execute: https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb

After cloning the LLama-13-b repo from Huggingface, I get the following content

(screenshot: listing of the downloaded model directory)

config.json is missing and the code is complaining about it.
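The missing-file situation can be reproduced with a small sketch (helper name and file list are my own; `config.json` and the tokenizer config are among the files `transformers` expects in an HF-format checkpoint):

```python
import os

# Hypothetical helper: report which HF-format files are absent from a cloned
# checkpoint directory. The raw Meta download ships consolidated .pth shards
# and params.json instead, so config.json shows up as missing.
REQUIRED_FILES = ["config.json", "tokenizer_config.json"]

def missing_files(model_dir):
    return [f for f in REQUIRED_FILES
            if not os.path.isfile(os.path.join(model_dir, f))]
```

Running it against the raw Meta checkpoint directory would list `config.json` as missing, which matches the error above.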

massi-ang commented 9 months ago

The notebook instructions state:

After gaining access to the model checkpoints, you should be able to use the already converted checkpoints.

This does not seem to be the case. I tried running python src/transformers/models/llama/convert_llama_weights_to_hf.py on the downloaded folder, but the process quits after a few seconds with

Killed

Running on inf2.xlarge with 256 GB of EBS storage.

massi-ang commented 9 months ago

The instructions in the notebook state:

Follow the steps described in meta-llama/Llama-2-13b to get access to the Llama 2 model from Meta and download the weights and tokenizer.

To get the ready-to-use models, one needs to download the -hf versions, i.e. Llama-2-13b-hf.

Please update the notebook instructions accordingly.
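For reference, fetching the already-converted checkpoint might look like this (a sketch, not from the notebook; it assumes git-lfs is installed and that you have accepted Meta's license and authenticated against the gated meta-llama repos on the Hub):

```shell
# Clone the HF-format (-hf) checkpoint instead of the raw Meta one.
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-13b-hf

# The -hf repo includes config.json, so this listing should succeed.
ls Llama-2-13b-hf/config.json
```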

aws-rhsoln commented 9 months ago

Thank you for reporting the issue. After looking into it, there are two things that need attention:

  1. You are trying to load a 13B model on an inf2.xlarge, which does not have enough host/device memory to load/run the model. Hence the program gets killed. This tutorial requires one to run on an inf2.48xlarge or a trn1.32xlarge.
  2. Regarding the config.json, it looks like you may have downloaded the wrong artifacts. Please check the download model section in the notebook where it points to the HF repo from which the model needs to be downloaded. For more information about accessing the LLaMA V2 checkpoints, please see https://huggingface.co/docs/transformers/main/model_doc/llama#overview
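A back-of-the-envelope check of point 1 (the 16 GiB host-RAM figure for inf2.xlarge is an assumption taken from AWS's published instance specs, and fp16 weights are assumed):

```python
# Rough memory estimate for a 13B-parameter model in fp16.
params = 13e9
bytes_per_param = 2  # fp16/bf16 (assumption)
weights_gib = params * bytes_per_param / 2**30  # ~24 GiB

host_ram_gib = 16  # inf2.xlarge host memory per AWS specs (assumption)

print(f"fp16 weights ~{weights_gib:.1f} GiB vs {host_ram_gib} GiB host RAM")
# The weights alone exceed host RAM, so the conversion script is OOM-killed.
```

This is consistent with the bare "Killed" message: the Linux OOM killer terminates the process with no Python traceback.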
massi-ang commented 9 months ago

You are trying to load a 13B model on an inf2.xlarge, which does not have enough host/device memory to load/run the model. Hence the program gets killed. This tutorial requires one to run on an inf2.48xlarge or a trn1.32xlarge.

Where is that written?

Regarding the config.json, it looks like you may have downloaded the wrong artifacts. Please check the download model section in the notebook where it points to the HF repo from which the model needs to be downloaded

The notebook points to the normal version and not the HF version. Can you point to the line of the notebook where this is mentioned?

aws-rhsoln commented 1 month ago
  1. There is a line in the tutorial now that specifies the instance to use

    This Jupyter Notebook can be run on an Inf2 instance (inf2.48xlarge) or Trn1 instance (trn1.32xlarge).
  2. It points to the HF link where there are instructions on how to download the model: https://huggingface.co/meta-llama/Llama-2-13b

Closing the issue now. Please re-open if the issue still exists.