Closed — sayli-ds closed this issue 6 months ago
Hello, Llama-2 70B is not well supported by transformers-neuronx at the moment. We expect to release an example notebook for this model in a future release.
https://aws.amazon.com/about-aws/whats-new/2023/10/aws-neuron-support-llama-pytorch/
The above page states that AWS Neuron adds support for the Llama-2 70B model.
We support Llama-2 70B for training here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama2_70b.html#llama2-70b-tp-pp-tutorial . For inference, we are still working on onboarding the model and will ship it in a future release.
Is there an approximate release date for inference support?
Hi sayli-ds - just an update: we are still working on 70B inference (along with many other models) and expect it to be supported within the next 1-2 releases. When available, it will be posted on our inference samples/tutorials page at https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/models/inference-inf2-trn1-samples.html#model-samples-inference-inf2-trn1
I could successfully compile Llama-2 7B on neuronx, referring to this notebook: https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb
But for Llama-2 70B, I get this error at the LlamaForSampling.from_pretrained step:
ValueError: Weight with shape torch.Size([8192, 1024]) cannot be sharded along dimension 1. This results in 21 weight partitions which cannot be distributed to 20 NeuronCores evenly. To fix this issue either the model parameters or the tp_degree must be changed to allow the weight to be evenly split.
I downloaded the "meta-llama/Llama-2-70b-hf" model from Hugging Face and am pointing to the 'models--meta-llama--Llama-2-70b-hf/snapshots/' directory for the above step.
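For what it's worth, the error suggests the tp_degree passed when loading the model does not evenly divide the size of the sharded weight dimension (1024 here). A minimal sketch of that divisibility constraint, assuming the helper name is hypothetical and not part of transformers-neuronx:

```python
# Hypothetical helper: report which tensor-parallel degrees split a
# weight dimension into equal partitions, one per NeuronCore.
def valid_tp_degrees(dim_size, candidates):
    """Return the tp_degree candidates that divide dim_size evenly."""
    return [tp for tp in candidates if dim_size % tp == 0]

# The failing weight is sharded along dimension 1, which has size 1024.
# A tp_degree of 20 does not divide 1024, which triggers the ValueError
# above; powers of two such as 8, 16, or 32 would split it evenly.
print(valid_tp_degrees(1024, [2, 4, 8, 16, 20, 24, 32]))
# → [2, 4, 8, 16, 32]
```

So, under that assumption, picking a tp_degree that divides the model's sharded dimensions (rather than 20) may avoid this particular ValueError, though 70B support in transformers-neuronx was still pending at the time of this thread.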