Open · marina-pchelina opened this issue 5 months ago
Hi @marina-pchelina
Since NeuronCores are reserved per process, it's possible that you have an old process which is holding onto the NeuronCores but has not been properly terminated. One thing to try is to forcefully stop all running processes: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/training-troubleshooting.html#neuroncore-s-not-available-requested-1-available-0
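If it's easier to check from inside the container, a rough way to look for leftover owners is something like the snippet below (just a sketch; adjust the grep pattern to your serving process, and <pid> is a placeholder):

import subprocess

# List processes that might still be holding the NeuronCores.
print(subprocess.run(['ps aux | grep [p]ython'], stdout=subprocess.PIPE,
                     shell=True).stdout.decode('utf-8'))

# After confirming a stale PID in the output above, stop it forcefully.
# subprocess.run(['kill -9 <pid>'], shell=True)  # <pid> is a placeholder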
Hi, thanks for getting back to me!

What I don't understand is how an old process could be holding onto the NeuronCores if a new inference instance is initialized each time I deploy. Anyway, I tried including some commands from the troubleshooting doc, and some others I found, at the top of my inference.py script, like so:
import subprocess

# Run each diagnostic command inside the container and print its output.
commands = [
    'apt-get install kmod',
    'lsmod | grep neuron',
    'ps aux | grep python',
    'neuron-ls',
    'modinfo neuron',
]
for cmd in commands:
    print(f"Running {cmd}")
    print(subprocess.run([cmd], stdout=subprocess.PIPE, shell=True).stdout.decode('utf-8'))
I can see the 2 cores are there with neuron-ls:

There are no significant python processes that could be using the cores, and killing them all explicitly didn't help either.

However, it seems like I'm not able to use lsmod or modinfo, which I am able to use and get output from inside an EC2 instance (same inf2) directly. I tried installing them with apt-get install kmod, but that didn't help either.
Could that possibly have something to do with the image that's used in the tutorial? It's currently this one: ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuronx:1.13.1-neuronx-py310-sdk2.13.2-ubuntu20.04"
@marina-pchelina, we were able to reproduce the problem in the tutorial and are looking into a fix.
The root cause appears to be 2 separate misconfigurations:

1. The model server starts 4 workers per model by default. Since the ml.inf2.xlarge instance has only 2 NeuronCores, this is an invalid number of workers. You can observe the configuration at the beginning of the logs: Default workers per model: 4
2. Each worker needs to be configured with NEURON_RT_NUM_CORES=1 so that it only takes ownership of a single NeuronCore for the model that it loads. You can see this in the logs because 4 warnings are issued for each model load, followed by only 3 nrt_allocate_neuron_cores errors showing that the NeuronCores have already been allocated to another process.
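For reference, the intended configuration would look roughly like this when creating the SageMaker model (an untested sketch; model_data, role, and the entry point name are placeholders for the values from the tutorial):

from sagemaker.pytorch import PyTorchModel

pytorch_model = PyTorchModel(
    model_data="s3://<your-bucket>/model.tar.gz",   # placeholder
    role="<your-sagemaker-execution-role>",         # placeholder
    entry_point="inference.py",
    image_uri=(
        "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
        "pytorch-inference-neuronx:1.13.1-neuronx-py310-sdk2.13.2-ubuntu20.04"
    ),
    model_server_workers=2,            # one worker per NeuronCore on ml.inf2.xlarge
    env={"NEURON_RT_NUM_CORES": "1"},  # each worker claims a single NeuronCore
)

predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",
)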
Thanks for looking into this!

I tried to re-compile the model with --target inf2 on the off chance it might help configure the number of workers, but it still showed Default workers per model: 4 in the logs.
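For reference, the re-compile looked roughly like this (a sketch; the model and example input below are placeholders standing in for the ones from the sample):

import torch
import torch_neuronx

# Placeholder model and input standing in for the ones from the sample.
model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU()).eval()
example = torch.rand(1, 128)

# Re-compile with the --target flag mentioned above.
neuron_model = torch_neuronx.trace(model, example, compiler_args=["--target", "inf2"])
torch.jit.save(neuron_model, "model.pt")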
If it's any help, I can deploy and use models through the HuggingFace integration; the problem is that I want to use both cores with DataParallel, which the HuggingFace class doesn't seem to allow.
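For context, what I'm hoping to run in inference.py is roughly this (a sketch; the model path and input shape are placeholders for my actual artifacts):

import torch
import torch_neuronx

# Load the compiled Neuron model ("model.pt" is a placeholder path).
model = torch.jit.load("model.pt")

# Shard incoming batches across both NeuronCores on the inf2.xlarge.
model_parallel = torch_neuronx.DataParallel(model)

batch = torch.rand(8, 3, 224, 224)  # placeholder input shape
with torch.no_grad():
    output = model_parallel(batch)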
Let me know if there's anything I can do myself to work around that; otherwise, I'll wait for a fix.
Hi, I'm following the sample here to try to compile a model to Neuron and deploy it on SageMaker.

Following the steps in the sample exactly, I am able to deploy the model, but when I try to use it I get a 500 error, and my CloudWatch traceback shows the following:

I have only seen this error before when trying to use a second model on the same instance while another one is running, but that should not be the case here.
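I'm invoking the endpoint roughly like this (whether through the sample's predictor or directly; the endpoint name and payload below are placeholders):

import json
import boto3

# Placeholder endpoint name and payload standing in for my actual values.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-neuron-endpoint",          # placeholder
    ContentType="application/json",
    Body=json.dumps({"inputs": "placeholder"}),
)
print(response["Body"].read())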