Closed — sayli-ds closed this issue 6 months ago
Hello, Llama-2 70B is not well supported by transformers-neuronx at the moment. We expect to release an example notebook for this model in a future release.
https://aws.amazon.com/about-aws/whats-new/2023/10/aws-neuron-support-llama-pytorch/
The above page states that AWS Neuron adds support for the Llama-2 70B model.
We support Llama-2 70B for training here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama2_70b.html#llama2-70b-tp-pp-tutorial . For inference, we are still working on onboarding the model and will ship it in a future release.
Is there an approximate release date for inference support?
Hi sayli-ds - just an update: we are still working on 70B inference (along with many other models) and expect it to be supported within the next 1-2 releases. When available, it will be posted on our inference samples/tutorials page at https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/models/inference-inf2-trn1-samples.html#model-samples-inference-inf2-trn1
I could successfully compile Llama-2 7B on neuronx, referring to this notebook: https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb
But for Llama-2 70B, I get this error at the LlamaForSampling.from_pretrained step:
ValueError: Weight with shape torch.Size([8192, 1024]) cannot be sharded along dimension 1. This results in 21 weight partitions which cannot be distributed to 20 NeuronCores evenly. To fix this issue either the model parameters or the tp_degree must be changed to allow the weight to be evenly split.
I downloaded the "meta-llama/Llama-2-70b-hf" model from Hugging Face and am pointing to the 'models--meta-llama--Llama-2-70b-hf/snapshots/' directory for the above step.
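For what it's worth, the error suggests the tp_degree passed when loading the model does not evenly divide the size of the sharded weight dimension (1024 here). A minimal sketch of that divisibility constraint, assuming the helper name is hypothetical and not part of transformers-neuronx:

```python
# Hypothetical helper: report which tensor-parallel degrees split a
# weight dimension into equal partitions, one per NeuronCore.
def valid_tp_degrees(dim_size, candidates):
    """Return the tp_degree candidates that divide dim_size evenly."""
    return [tp for tp in candidates if dim_size % tp == 0]

# The failing weight is sharded along dimension 1, which has size 1024.
# A tp_degree of 20 does not divide 1024, which triggers the ValueError
# above; powers of two such as 8, 16, or 32 would split it evenly.
print(valid_tp_degrees(1024, [2, 4, 8, 16, 20, 24, 32]))
# → [2, 4, 8, 16, 32]
```

So, under that assumption, picking a tp_degree that divides the model's sharded dimensions (rather than 20) may avoid this particular ValueError, though 70B support in transformers-neuronx was still pending at the time of this thread.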