Open mahendra-paranjpe opened 9 months ago
@mahendra-paranjpe the most common reason for this error:
tdrv_get_dev_info No neuron device available
is running on the wrong instance type. Are you running on inf2?
Yes, it is inf2.48xlarge.
Hi @mahendra-paranjpe,
This can indicate that the installation is incomplete, e.g. missing drivers. Please check the setup guide: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/torch-neuronx.html. Make sure the system packages (rpm/dpkg) are installed (e.g. the "Drivers and Tools" section of https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html#setup-torch-neuronx-ubuntu22) and that you are running one of the supported OS versions.
If you think everything is installed correctly, it is possible the driver did not load for some reason. Try:
sudo modprobe neuron
... then retry your test. If neither of those works, please post back here.
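To confirm whether the driver is actually loaded before retrying, a quick check along these lines may help. This is only a sketch: it assumes the kernel module is listed as `neuron` in /proc/modules and that the driver exposes device nodes named /dev/neuron0, /dev/neuron1, and so on. The function names are made up for illustration and are not part of any Neuron tool.

```python
import glob
import os


def neuron_module_loaded(proc_modules_path="/proc/modules"):
    """Return True if a kernel module named 'neuron' is listed as loaded.

    Assumption: the driver registers under the module name 'neuron'.
    """
    try:
        with open(proc_modules_path) as f:
            # Each line of /proc/modules starts with the module name.
            return any(line.split()[:1] == ["neuron"] for line in f)
    except OSError:
        # File missing or unreadable: treat as "not loaded".
        return False


def neuron_device_nodes(dev_dir="/dev"):
    """Return paths matching <dev_dir>/neuron<N>, sorted by name.

    Assumption: device nodes are named neuron0, neuron1, ...
    """
    return sorted(glob.glob(os.path.join(dev_dir, "neuron[0-9]*")))


if __name__ == "__main__":
    print("neuron module loaded:", neuron_module_loaded())
    print("device nodes:", neuron_device_nodes() or "none found")
```

If the module is not listed or no device nodes show up, that would be consistent with the "No neuron device available" error and points at the driver install rather than the Python packages.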
Hi @mahendra-paranjpe - haven't heard back on whether mrnikwaws's comments solved your issue. Closing this out for now. If you are still encountering a problem, please reopen or create a new ticket.
Running the notebook https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb on inf2.48xlarge.
Error while running the last block, at line 4: from transformers_neuronx.llama.model import LlamaForSampling
results in:
https://github.com/huggingface/optimum-neuron/issues/213 suggests updating to the latest version of torch-neuronx, while https://github.com/aws-neuron/transformers-neuronx/issues/33 suggests a specific version, torch-neuronx-1.13.1.1.10.0.
When I tried installing the specific version, it failed with the following exception.
Additional info on the different versions available as of now.
The following packages are installed
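For anyone hitting the same problem, the installed versions can also be collected from Python and pasted alongside the error. A minimal sketch, assuming the relevant distributions are named torch, torch-neuronx, neuronx-cc and transformers-neuronx (adjust the list to your environment). The helper names are hypothetical.

```python
from importlib import metadata

# Assumption: these are the pip distribution names relevant to this thread.
PACKAGES = ("torch", "torch-neuronx", "neuronx-cc", "transformers-neuronx")


def installed_version(name):
    """Return the installed version string for a distribution, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def report(names=PACKAGES):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the same virtual environment or kernel the notebook uses, since a different environment can report different versions.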